Transformers need to know token positions because self-attention alone is permutation-invariant. RoPE (Rotary Position Embedding) injects positional information by rotating query and key vectors by angles that depend on their position.

For the token at position i and query/key dimension pair (2k-1, 2k), rotate by the angle:

\theta_{i,k} = i \cdot b^{-2(k-1)/d}

where i is the token position, b is a constant base (typically 10000), and d is the dimension of the query/key vectors.

The rotation matrix for position i and dimension pair k is:

R_{i,k} = \begin{pmatrix} \cos\theta_{i,k} & -\sin\theta_{i,k} \\ \sin\theta_{i,k} & \cos\theta_{i,k} \end{pmatrix}

applied to the two components (x_{2k-1}, x_{2k}) of the query (and likewise of the key).
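The rotation above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation; the function name `rope` is mine, and it assumes an input of shape (seq_len, d) with even d, pairing adjacent dimensions as described:

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply rotary position embedding to x of shape (seq_len, d).

    Each adjacent dimension pair is rotated at position i by
    theta_{i,k} = i * base^(-2k/d), using 0-based pair index k.
    """
    seq_len, d = x.shape
    assert d % 2 == 0, "dimension must be even to form pairs"
    k = np.arange(d // 2)                 # pair indices 0 .. d/2 - 1
    inv_freq = base ** (-2.0 * k / d)     # per-pair rotation frequency
    pos = np.arange(seq_len)[:, None]     # token positions i
    theta = pos * inv_freq[None, :]       # angles, shape (seq_len, d/2)
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[:, 0::2], x[:, 1::2]       # first/second element of each pair
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin    # 2x2 rotation matrix per pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

Because each 2x2 block is a pure rotation, position 0 (angle zero) leaves the vector unchanged, vector norms are preserved, and the dot product between a rotated query and a rotated key depends only on the relative offset between their positions, which is the key property RoPE is designed for.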

9/15/25