Context Caching: How to Slash Your LLM Bill by 50%
As multimodal models expand their context windows (up to 2 million tokens in Gemini 1.5 Pro), it has become possible to pass entire documents or hour-long videos in a single prompt. However, re-sending the same massive file with every API call is financially unsustainable.
Enter Context Caching.
What is Context Caching?
Context caching allows you to upload a large payload (like a PDF, a video, or a massive codebase) to the API provider's servers once. The provider processes the tokens and stores them in memory.
For subsequent API calls, you simply reference the cached context.
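The upload-once, reference-many pattern can be sketched with a hypothetical in-memory provider (real SDKs differ in naming and detail; `MockProvider`, `create_cache`, and `generate` are illustrative, not a real API):

```python
class MockProvider:
    """Stands in for the API provider: processes a payload once,
    then serves subsequent requests by cache handle."""

    def __init__(self):
        self._caches = {}
        self.tokenize_calls = 0  # counts how often the payload was processed

    def _tokenize(self, payload: str) -> list[str]:
        self.tokenize_calls += 1
        return payload.split()

    def create_cache(self, payload: str) -> str:
        """Upload and process the large payload once; return a handle."""
        handle = f"cache/{len(self._caches)}"
        self._caches[handle] = self._tokenize(payload)
        return handle

    def generate(self, cache_handle: str, question: str) -> str:
        tokens = self._caches[cache_handle]  # reused, never re-processed
        return f"answered {question!r} over {len(tokens)} cached tokens"


provider = MockProvider()
handle = provider.create_cache("some very large document " * 1000)
for q in ("summary?", "key dates?", "action items?"):
    provider.generate(handle, q)
print(provider.tokenize_calls)  # payload was processed exactly once
```

Three questions were answered, but the expensive processing step ran once; every later call paid only for the question and the discounted cached tokens.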
The Cost Benefit
Providers typically charge a fraction of the standard input token rate for cached tokens.
- Standard Input: ~$5.00 per 1M tokens.
- Cached Input: ~$1.25 per 1M tokens.
You also pay a small hourly storage fee to keep the cache alive, so caching pays off only for high-frequency, repetitive queries against static data.
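A back-of-envelope comparison makes the trade-off concrete. The token rates come from the list above; the storage fee is an assumed placeholder (check your provider's pricing page for the real number):

```python
STD_PER_M = 5.00           # $ per 1M standard input tokens (from above)
CACHED_PER_M = 1.25        # $ per 1M cached input tokens (from above)
STORAGE_PER_M_HOUR = 1.00  # $ per 1M tokens stored per hour -- ASSUMED


def cost_uncached(calls: int, ctx_m_tokens: float) -> float:
    """Re-send the full context on every call."""
    return calls * ctx_m_tokens * STD_PER_M


def cost_cached(calls: int, ctx_m_tokens: float, hours: float) -> float:
    """Pay the discounted rate per call plus hourly storage."""
    return (calls * ctx_m_tokens * CACHED_PER_M
            + ctx_m_tokens * hours * STORAGE_PER_M_HOUR)


# 100 queries against a 1M-token context, cache kept alive for one hour:
print(cost_uncached(100, 1.0))   # 500.0
print(cost_cached(100, 1.0, 1))  # 126.0
```

Under these assumed numbers, the cache wins by roughly 4x at 100 queries; at very low query volumes the storage fee dominates and re-sending is cheaper.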
If you are building RAG systems or analyzing long videos at scale, caching is often the difference between a viable and an unviable cost structure. Check your baseline un-cached costs on our Pricing Calculator.