The uncompromising intro to KV caching
A basic optimization for autoregressive generation with Transformers
By Francesco Cariaggi
In this blog post, I will try my best to explain key-value caching, or KV caching for short, in a beginner-friendly way, but with enough technical depth to make it enjoyable for non-beginners too. The “uncompromising” part of the title refers to the intention of avoiding shortcuts or simplifications that...