[Preview] v1.80.7-stable - RAG API
Deploy this version
- Docker:
  docker run \
    -e STORE_MODEL_IN_DB=True \
    -p 4000:4000 \
    ghcr.io/berriai/litellm:main-v1.80.7
- Pip:
  pip install litellm==1.80.7
Key Highlights
- New RAG API - Unified RAG API with support for the Vertex AI RAG Engine and OpenAI Vector Stores
- Claude Skills API - Support for Anthropic's new Skills API with extended context and tool calling
- Organization Usage - Filter and track usage analytics at the organization level
- Claude Opus 4.5 - Support for Anthropic's Claude Opus 4.5 via Anthropic, Bedrock, and Vertex AI
- Guardrails for Passthrough - Guardrails support for pass-through endpoints
- Public AI Provider - Support for publicai.co provider
RAG API
Introducing a new RAG API on the LiteLLM AI Gateway. You can provide documents (TXT, PDF, or DOCX files) to LiteLLM's all-in-one document ingestion pipeline, and it will handle OCR, chunking, embedding, and storing the data in your vector store of choice (OpenAI, Bedrock, Vertex AI, etc.).
Example usage for ingestion
curl -X POST "http://localhost:4000/v1/rag/ingest" \
-H "Authorization: Bearer sk-1234" \
-H "Content-Type: application/json" \
-d "{
\"file\": {
\"filename\": \"document.txt\",
\"content\": \"$(base64 -i document.txt)\",
\"content_type\": \"text/plain\"
},
\"ingest_options\": {
\"vector_store\": {
\"custom_llm_provider\": \"bedrock\"
}
}
}"
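The same ingest request can be built in Python. This is a minimal sketch that mirrors the curl body above: it base64-encodes the file content and assembles the JSON payload. The sample document text is a stand-in, and actually sending the request assumes a LiteLLM proxy running on localhost:4000 with the key sk-1234, so the POST is left commented out.

```python
import base64
import json

# Stand-in document content; in practice, read this from document.txt.
content = b"LiteLLM is an AI gateway."

payload = {
    "file": {
        "filename": "document.txt",
        # The RAG API expects file content as a base64 string.
        "content": base64.b64encode(content).decode("ascii"),
        "content_type": "text/plain",
    },
    "ingest_options": {
        # Which vector store backend should receive the embedded chunks.
        "vector_store": {"custom_llm_provider": "bedrock"},
    },
}
body = json.dumps(payload)

# To actually send it against a running proxy:
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:4000/v1/rag/ingest",
#     data=body.encode(),
#     headers={"Authorization": "Bearer sk-1234",
#              "Content-Type": "application/json"},
# )
# urllib.request.urlopen(req)
print(body)
```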
Example usage for querying the vector store
curl -X POST "http://localhost:4000/v1/vector_stores/vs_692658d337c4819183f2ad8488d12fc9/search" \
-H "Authorization: Bearer sk-1234" \
-H "Content-Type: application/json" \
-d '{
"query": "What is LiteLLM?",
"custom_llm_provider": "bedrock"
}'
Organization Usage
Users can now filter usage statistics by organization, providing the same granular filtering capabilities available for teams.
Details:
- Filter usage analytics, spend logs, and activity metrics by organization ID
- View organization-level breakdowns alongside existing team and user-level filters
- Consistent filtering experience across all usage and analytics views
New Providers and Endpoints
New Providers
| Provider | Supported Endpoints | Description |
|---|---|---|
| Public AI | Chat completions | Support for publicai.co provider |
| Eleven Labs | Text-to-speech | Text-to-speech provider integration |
New LLM API Endpoints
| Endpoint | Method | Description | Documentation |
|---|---|---|---|
| /v1/skills | POST | Anthropic Skills API for extended context and tool calling | Skills API |
| /rag/ingest | POST | Unified RAG API with Vertex AI RAG Engine and Vector Stores | RAG API |
New Models / Updated Models
New Model Support
| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features |
|---|---|---|---|---|---|
| Anthropic | claude-opus-4-5-20251101 | 200K | $5.00 | $25.00 | Chat, reasoning, vision, function calling, prompt caching |
| Bedrock | anthropic.claude-opus-4-5-20251101-v1:0 | 200K | $5.00 | $25.00 | Chat, reasoning, vision, function calling, prompt caching |
| Bedrock | us.anthropic.claude-opus-4-5-20251101-v1:0 | 200K | $5.00 | $25.00 | Chat, reasoning, vision, function calling, prompt caching |
| Bedrock | amazon.nova-canvas-v1:0 | - | - | $0.06/image | Image generation |
| OpenRouter | openrouter/anthropic/claude-opus-4.5 | 200K | $5.00 | $25.00 | Chat, reasoning, vision, function calling, prompt caching |
| Vertex AI | vertex_ai/claude-opus-4-5 | 200K | $5.00 | $25.00 | Chat, reasoning, vision, function calling, prompt caching |
| Vertex AI | vertex_ai/claude-opus-4-5@20251101 | 200K | $5.00 | $25.00 | Chat, reasoning, vision, function calling, prompt caching |
| Azure | azure_ai/claude-opus-4-1 | 200K | $15.00 | $75.00 | Chat, reasoning, vision, function calling, prompt caching |
| Azure | azure_ai/claude-sonnet-4-5 | 200K | $3.00 | $15.00 | Chat, reasoning, vision, function calling, prompt caching |
| Azure | azure_ai/claude-haiku-4-5 | 200K | $1.00 | $5.00 | Chat, reasoning, vision, function calling, prompt caching |
| Fireworks AI | fireworks_ai/accounts/fireworks/models/glm-4p6 | 202K | $0.55 | $2.19 | Chat, function calling |
| Public AI | publicai/swiss-ai/apertus-8b-instruct | 8K | Free | Free | Chat, function calling |
| Public AI | publicai/swiss-ai/apertus-70b-instruct | 8K | Free | Free | Chat, function calling |
| Public AI | publicai/aisingapore/Gemma-SEA-LION-v4-27B-IT | 8K | Free | Free | Chat, function calling |
| Public AI | publicai/BSC-LT/salamandra-7b-instruct-tools-16k | 16K | Free | Free | Chat, function calling |
| Public AI | publicai/BSC-LT/ALIA-40b-instruct_Q8_0 | 8K | Free | Free | Chat, function calling |
| Public AI | publicai/allenai/Olmo-3-7B-Instruct | 32K | Free | Free | Chat, function calling |
| Public AI | publicai/aisingapore/Qwen-SEA-LION-v4-32B-IT | 32K | Free | Free | Chat, function calling |
| Public AI | publicai/allenai/Olmo-3-7B-Think | 32K | Free | Free | Chat, function calling, reasoning |
| Public AI | publicai/allenai/Olmo-3-32B-Think | 32K | Free | Free | Chat, function calling, reasoning |
| Cohere | embed-multilingual-light-v3.0 | 1K | $0.10 | - | Embeddings, supports images |
| WatsonX | watsonx/whisper-large-v3-turbo | - | $0.0001/sec | - | Audio transcription |
Features
- OpenRouter
  - Add OpenRouter Opus 4.5 - PR #17144
- Fireworks AI
  - Add fireworks_ai/accounts/fireworks/models/glm-4p6 - PR #17154
- Vertex AI
  - Add Vertex AI image generation support for both Gemini and Imagen models - PR #17070
  - Handle global location in context caching - PR #16997
  - Fix CreateCachedContentRequest enum error - PR #16965
  - Use the correct domain for the global location when counting tokens - PR #17116
  - Support Vertex AI batch listing in LiteLLM proxy - PR #17079
  - Fix default sample count for image generation - PR #16403
- WatsonX
  - Add audio transcriptions for WatsonX - PR #17160
- OpenAI
  - Fix gpt-5.1 temperature support when reasoning_effort is "none" or not specified - PR #17011
- PublicAI
  - Add provider publicai.co - PR #17230
- Cohere
  - Add cost tracking for the Cohere embed passthrough endpoint - PR #17029
- ElevenLabs
  - Integrate ElevenLabs text-to-speech - PR #16573
LLM API Endpoints
Features
- Search API
  - Add search API logging and cost tracking in LiteLLM Proxy - PR #17078
- Responses API
  - Prevent duplicate spend logs in the Responses API for non-OpenAI providers - PR #16992
  - Support the response_format parameter in the completion -> responses bridge - PR #16844
  - Fix MCP tool call response logging and remove the unmapped-param error mid-stream, allowing gpt-5 web search to work via the Responses API - PR #16946
  - Add header passing support for MCP tools in the Responses API - PR #16877
- Images
  - Fix image edit endpoint - PR #17046
- Embeddings
  - Add header forwarding in embeddings - PR #16869
- Vector Stores
  - Add method for extracting vector store IDs from path params - PR #16566
Management Endpoints / UI
Features
Virtual Keys
- Fix Create Key Duration - PR #17170
Models + Endpoints
- Allow adding Bedrock API Key when adding models - PR #17153
- Add aws_bedrock_runtime_endpoint into Credential Types - PR #17053
- Change provider create fields to JSON - PR #16985
- Change model_hub_table to call getUiConfig before Fetching Public Data - PR #17166
- Improve Wording for Config Models in Model Table - PR #17100
Teams & Users
- Deleting a User From Team Deletes key User Created for Team - PR #17057
- Hide Default Team Settings From Proxy Admin Viewers - PR #16900
- Add No Default Models for Team and User Settings - PR #17037
- User Table Sort by All - PR #17108
- Org Admin Team Permissions Fix - PR #17110
- Better Loading State for Internal User Page - PR #17168
General UI Improvements
- Ensure Unique Keys in Navbar Menu Items - PR #16987
- Minor Cosmetic Changes for Buttons, Add Notification for Delete Team - PR #16984
- Change Delete Modals to Common Component - PR #17068
- Disable edit, delete, info for dynamically generated spend tags - PR #17098
- Migrate modelInfoCall to ReactQuery - PR #17123
- Migrate Provider Fields to React Query - PR #17177
- Fix Flaky Test - PR #17161
- Change Add Fallback Modal to use Antd Select - PR #17223
Infrastructure
Helm
- Enhancement: ServiceMonitor template rendering - PR #17038
Bugs
- Database
- Distinguish permission errors from idempotent errors in Prisma migrations - PR #17064
AI Integrations
Guardrails
- Add Presidio PII masking tutorial with LiteLLM - PR #16969
Prompt Management
- AI Gateway prompt management documentation - PR #16990
Performance / Loadbalancing / Reliability improvements
Memory Optimization
- Lazy-load cost_calculator & logging to reduce memory and import time - PR #17089
Database Performance
- Optimize date filtering for spend logs queries - PR #17073
Request Handling
- Add automatic LiteLLM context headers (Pillar integration) - PR #17076
Generic API Support
- Make generic API OSS and support multiple generic APIs - PR #17152
Documentation Updates
General Documentation
- AI gateway prompt management - PR #16990
- Cleanup README and improve agent guides - PR #17003
- Update broken documentation links in README - PR #17002
- Update version and add preview tag - PR #17032
- Document model pricing contribution process - PR #17031
- Document event hook usage - PR #17035
- Link to logging spec in callback docs - PR #17049
- Add OpenAI Agents SDK to projects - PR #17203
New Contributors
- @prawaan made their first contribution in PR #16997
- @lior-ps made their first contribution in PR #16365
- @HaiyiMei made their first contribution in PR #17020
- @yuya2017 made their first contribution in PR #17064
- @saar-win made their first contribution in PR #17038
- @sdip15fa made their first contribution in PR #16965
- @KeremTurgutlu made their first contribution in PR #16826
- @choigawoon made their first contribution in PR #17019
- @SamAcctX made their first contribution in PR #17144
- @naaa760 made their first contribution in PR #17079
- @abi-jey made their first contribution in PR #17096
- @hxyannay made their first contribution in PR #16734

