#transformer

3 results

tech Deep Dive

May 13, 2026

Designing a Sora-Scale Text-to-Video System

Sora's core architecture is a Diffusion Transformer (DiT): video is compressed into spatiotemporal patch tokens, a diffusion model is trained to denoise them, and the Transformer handles global coherence. The main engineering challenges are temporal consistency, support for variable length and resolution, and training scale.

#sora #text-to-video #diffusion-models #transformer #ai-generation #system-design
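As a rough illustration of the "spatiotemporal patch tokens" idea in the summary above, the sketch below cuts a tiny single-channel video into non-overlapping patches and flattens each one into a token. The `patchify` helper, the patch sizes, and the nested-list video layout are all assumptions made for this example; Sora's actual patch sizes and latent compression are not public.

```python
def patchify(video, pt, ph, pw):
    """Split a [T][H][W] nested-list video into flattened spatiotemporal patches.

    pt, ph, pw are hypothetical patch sizes along time, height, and width.
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    if T % pt or H % ph or W % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    tokens = []
    for t0 in range(0, T, pt):
        for h0 in range(0, H, ph):
            for w0 in range(0, W, pw):
                # One token = every value inside one pt x ph x pw block.
                tokens.append([
                    video[t][h][w]
                    for t in range(t0, t0 + pt)
                    for h in range(h0, h0 + ph)
                    for w in range(w0, w0 + pw)
                ])
    return tokens


# A 2-frame 4x4 clip whose values encode their own (t, h, w) position.
video = [[[t * 100 + h * 10 + w for w in range(4)] for h in range(4)] for t in range(2)]
tokens = patchify(video, 2, 2, 2)

# A longer clip with a different aspect ratio yields a different token count.
long_clip = [[[0.0] * 6 for _ in range(2)] for _ in range(4)]
long_tokens = patchify(long_clip, 2, 2, 2)

print(len(tokens), len(long_tokens))
```

Because the Transformer consumes a flat token sequence, clips of different length or resolution simply produce sequences of different length, which is one way to read the "variable-length/resolution support" point.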

tech Deep Dive

May 10, 2026

KV Cache: The Most Critical Optimization in LLM Inference

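KV Cache cuts the attention cost of each autoregressive decoding step from O(n²), where keys and values for the full sequence are recomputed for every new token, to O(n), where only the new token is projected and attends to cached keys and values. This is a core reason modern LLM inference is fast enough to be usable.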

#kv-cache #llm #inference-optimization #transformer #ai #machine-learning
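To make the KV cache summary above concrete, here is a minimal single-head sketch in plain Python. It is not taken from the article: the `KVCache` class, the toy weight matrices, and the projection counters are assumptions chosen for illustration. It checks that cached and uncached decoding produce the same attention output while the cached path projects each token into key/value space only once.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x.
    return [dot(row, x) for row in W]


def attend(q, keys, values):
    # Scaled dot-product attention of one query over all given positions.
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]


class KVCache:
    """Hypothetical cache: stores keys and values of all previous tokens."""

    def __init__(self):
        self.keys, self.values = [], []
        self.projections = 0  # number of tokens projected to K/V so far

    def step(self, x, Wq, Wk, Wv):
        # Project only the new token, then attend over everything cached.
        self.keys.append(matvec(Wk, x))
        self.values.append(matvec(Wv, x))
        self.projections += 1
        return attend(matvec(Wq, x), self.keys, self.values)


def step_without_cache(xs, Wq, Wk, Wv):
    # Recompute K and V for the whole prefix on every step.
    keys = [matvec(Wk, x) for x in xs]
    values = [matvec(Wv, x) for x in xs]
    return attend(matvec(Wq, xs[-1]), keys, values), len(xs)


# Toy weights and token embeddings (arbitrary values for the demo).
Wq = [[1.0, 0.0], [0.0, 1.0]]
Wk = [[0.5, 0.1], [0.2, 0.3]]
Wv = [[1.0, 2.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]

cache = KVCache()
naive_projections = 0
outputs_match = True
for t in range(len(tokens)):
    cached_out = cache.step(tokens[t], Wq, Wk, Wv)
    naive_out, n = step_without_cache(tokens[: t + 1], Wq, Wk, Wv)
    naive_projections += n
    outputs_match = outputs_match and all(
        abs(a - b) < 1e-12 for a, b in zip(cached_out, naive_out)
    )

print(outputs_match, cache.projections, naive_projections)
```

For four tokens the cached path performs 4 key/value projections while the uncached path performs 1 + 2 + 3 + 4 = 10, and the gap grows quadratically with sequence length. The price is memory: the cache holds keys and values for every previous token.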

tech Explainer

May 10, 2026

How Does a Transformer Know Word Order? From Absolute Encoding to RoPE

Transformer self-attention is inherently orderless, and positional encoding is the fix. The article traces the progression from sinusoidal absolute encoding, to learnable absolute encoding, to relative positional encoding, to RoPE (Rotary Position Embedding). Modern LLMs almost universally use RoPE because it adds no learned parameters, encodes relative distance directly in the attention dot product, and can be extended to longer sequences.

#transformer #rope #positional-encoding #nlp #machine-learning #deep-learning
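The relative-distance property mentioned in the RoPE summary above can be checked numerically. The sketch below is an illustrative plain-Python version, not code from the article: it rotates adjacent pairs of dimensions (some implementations instead pair the first and second halves of the vector) and uses the commonly cited base of 10000. The `rope` and `score` helpers are names chosen for this example.

```python
import math


def rope(x, pos, base=10000.0):
    """Rotate each adjacent pair (x[i], x[i+1]) by an angle proportional to pos."""
    d = len(x)
    if d % 2:
        raise ValueError("RoPE needs an even number of dimensions")
    out = []
    for i in range(0, d, 2):
        # Pair j = i / 2 uses frequency base ** (-2j / d) = base ** (-i / d).
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out.append(x[i] * c - x[i + 1] * s)
        out.append(x[i] * s + x[i + 1] * c)
    return out


def score(q, k, m, n):
    # Attention logit between a query at position m and a key at position n.
    return sum(a * b for a, b in zip(rope(q, m), rope(k, n)))


q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.0]

near = score(q, k, 3, 1)   # offset 2
far = score(q, k, 10, 8)   # offset 2 again, different absolute positions
other = score(q, k, 3, 2)  # offset 1

print(abs(near - far) < 1e-9, abs(near - other) > 1e-6)
```

The two pairs with the same offset give the same logit up to floating-point error, while a different offset gives a different one. The function has no trainable weights; position enters only through the rotation angles.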