Transformers need to know token positions because self-attention alone is permutation-invariant. RoPE (Rotary Position Embedding) injects positional information by rotating query and key vectors by angles that depend on their position.

For the token at position i and query/key dimension pair (2k-1, 2k), rotate by the angle:

\theta_{i,k} = i \cdot b^{-2(k-1)/d}

where i is the token position, b is a constant base (typically 10000), and d is the dimension of the query/key vectors.

The rotation matrix for position i and dimension pair k is:

R_{i,k} = \begin{pmatrix} \cos\theta_{i,k} & -\sin\theta_{i,k} \\ \sin\theta_{i,k} & \cos\theta_{i,k} \end{pmatrix}

applied to the two components (x_{2k-1}, x_{2k}) of the query (and likewise of the key).
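The rotation above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation; the function name `rope` is mine, and it assumes an input of shape (seq_len, d) with even d, pairing adjacent dimensions as described:

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply rotary position embedding to x of shape (seq_len, d).

    Each adjacent dimension pair is rotated at position i by
    theta_{i,k} = i * base^(-2k/d), using 0-based pair index k.
    """
    seq_len, d = x.shape
    assert d % 2 == 0, "dimension must be even to form pairs"
    k = np.arange(d // 2)                 # pair indices 0 .. d/2 - 1
    inv_freq = base ** (-2.0 * k / d)     # per-pair rotation frequency
    pos = np.arange(seq_len)[:, None]     # token positions i
    theta = pos * inv_freq[None, :]       # angles, shape (seq_len, d/2)
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[:, 0::2], x[:, 1::2]       # first/second element of each pair
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin    # 2x2 rotation matrix per pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

Because each 2x2 block is a pure rotation, position 0 (angle zero) leaves the vector unchanged, vector norms are preserved, and the dot product between a rotated query and a rotated key depends only on the relative offset between their positions, which is the key property RoPE is designed for.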

9/15/25