#positional-encoding

1 results

tech Explainer

How Does a Transformer Know Word Order? From Absolute Encoding to RoPE

Transformer self-attention is inherently orderless — positional encoding is the fix. From sinusoidal absolute encoding, to learnable absolute encoding, to relative positional encoding, to RoPE (Rotary Position Embedding): modern LLMs almost universally use RoPE because it requires no parameters, naturally encodes relative distances, and can be extended to longer sequences.