LlamaIndex provides a framework for building LLM applications with your data. Add Portkey to get production-grade features: full observability, automatic fallbacks, semantic caching, and cost controls—all without changing your LlamaIndex code.
Quick Start
Add Portkey to any LlamaIndex app with 3 parameters:
from llama_index.llms.openai import OpenAI
llm = OpenAI(
    model="@openai-prod/gpt-4o",        # Provider slug from the Model Catalog
    api_base="https://api.portkey.ai",
    api_key="PORTKEY_API_KEY"           # Your Portkey API key
)
response = llm.complete("Tell me a joke")
print(response.text)
All requests now appear in Portkey logs
That’s it! You now get:
✅ Full observability (costs, latency, logs)
✅ Dynamic model selection per request
✅ Automatic fallbacks and retries (via configs)
✅ Budget controls per team/project
Why Add Portkey to LlamaIndex?
LlamaIndex handles data indexing and querying. Portkey adds production features:
Enterprise Observability: Every request is logged with cost, latency, and token counts, plus team-level analytics and debugging.
Dynamic Model Selection: Switch models per request. Route simple queries to cheap models and complex ones to advanced models, with everything automatically tracked.
Production Reliability: Automatic fallbacks, smart retries, and load balancing, configured once and applied everywhere.
Cost & Access Control: Budget limits per team or project, rate limiting, and centralized credential management.
Setup
1. Install Packages
pip install llama-index-llms-openai portkey-ai
2. Add Provider in Model Catalog
Go to Model Catalog → Add Provider
Select your provider (OpenAI, Anthropic, Google, etc.)
Choose existing credentials or create new ones by entering your provider API key
Name your provider (e.g., openai-prod)
Your provider slug will be @openai-prod (or whatever you named it).
Complete Model Catalog Guide → Set up budgets, rate limits, and manage credentials
3. Get Portkey API Key
Create your Portkey API key at app.portkey.ai/api-keys
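The examples in this guide hardcode the key for brevity. In practice you would usually load it from an environment variable; a minimal sketch (the PORTKEY_API_KEY variable name is just a convention used here, not something Portkey requires):
import os
PORTKEY_API_KEY = os.environ["PORTKEY_API_KEY"]   # e.g. set beforehand with: export PORTKEY_API_KEY=<your key>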
4. Use in Your Code
Replace your existing LLM initialization:
# Before (direct to OpenAI)
from llama_index.llms.openai import OpenAI
llm = OpenAI(
    model="gpt-4o",
    api_key="OPENAI_API_KEY"
)
# After (via Portkey)
llm = OpenAI(
    model="@openai-prod/gpt-4o",
    api_base="https://api.portkey.ai",
    api_key="PORTKEY_API_KEY"
)
That’s the only change needed! All your existing LlamaIndex code (indexes, query engines, agents) works exactly the same.
Switching Between Providers
Just change the model string—everything else stays the same:
from llama_index.llms.openai import OpenAI
# OpenAI
llm = OpenAI(
    model="@openai-prod/gpt-4o",
    api_base="https://api.portkey.ai",
    api_key="PORTKEY_API_KEY"
)
# Anthropic
llm = OpenAI(
    model="@anthropic-prod/claude-sonnet-4",
    api_base="https://api.portkey.ai",
    api_key="PORTKEY_API_KEY"
)
# Google Gemini
llm = OpenAI(
    model="@google-prod/gemini-2.0-flash",
    api_base="https://api.portkey.ai",
    api_key="PORTKEY_API_KEY"
)
Portkey implements OpenAI-compatible APIs for all providers, so you always use llama_index.llms.openai.OpenAI regardless of which model you’re calling.
Using with LlamaIndex Chat
LlamaIndex’s chat interface works seamlessly:
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
llm = OpenAI(
    model="@openai-prod/gpt-4o",
    api_base="https://api.portkey.ai",
    api_key="PORTKEY_API_KEY"
)
messages = [
    ChatMessage(role="system", content="You are a helpful assistant"),
    ChatMessage(role="user", content="What is the capital of France?")
]
response = llm.chat(messages)
print(response.message.content)
Works With All LlamaIndex Features
✅ Query Engines - All query types supported
✅ Chat Engines - Conversational interfaces
✅ Agents - Full agent compatibility
✅ Streaming - Token-by-token streaming
✅ RAG Pipelines - Retrieval-augmented generation
✅ Workflows - Complex LLM workflows
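For example, a chat engine built over an index picks up the Portkey-routed LLM the same way a query engine does. A minimal sketch, assuming a local data/ directory like the RAG example later in this guide:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI
llm = OpenAI(
    model="@openai-prod/gpt-4o",
    api_base="https://api.portkey.ai",
    api_key="PORTKEY_API_KEY"
)
# Conversational interface over your documents; every underlying LLM call
# is routed through Portkey and shows up in its logs
index = VectorStoreIndex.from_documents(SimpleDirectoryReader("data").load_data())
chat_engine = index.as_chat_engine(llm=llm)
response = chat_engine.chat("Summarize the documents")
print(response)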
Streaming
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
llm = OpenAI(
    model="@openai-prod/gpt-4o",
    api_base="https://api.portkey.ai",
    api_key="PORTKEY_API_KEY"
)
# Stream completions
for chunk in llm.stream_complete("Write a short story"):
    print(chunk.delta, end="", flush=True)
# Stream chat
messages = [ChatMessage(role="user", content="Tell me a joke")]
for chunk in llm.stream_chat(messages):
    print(chunk.delta, end="", flush=True)
Async Support
import asyncio
from llama_index.llms.openai import OpenAI
async def main():
    llm = OpenAI(
        model="@openai-prod/gpt-4o",
        api_base="https://api.portkey.ai",
        api_key="PORTKEY_API_KEY"
    )
    # Async completion
    response = await llm.acomplete("What is 2+2?")
    print(response.text)
    # Async streaming
    async for chunk in await llm.astream_complete("Write a haiku"):
        print(chunk.delta, end="", flush=True)
asyncio.run(main())
RAG with Query Engine
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI
# Set up the LLM with Portkey
llm = OpenAI(
    model="@openai-prod/gpt-4o",
    api_base="https://api.portkey.ai",
    api_key="PORTKEY_API_KEY"
)
# Load and index documents
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
# Query with the Portkey-enabled LLM
query_engine = index.as_query_engine(llm=llm)
response = query_engine.query("What is the main topic?")
print(response)
Advanced Features via Configs
For production features like fallbacks, caching, and load balancing, use Portkey Configs:
from llama_index.llms.openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
llm = OpenAI(
    model="gpt-4o",                      # Default model
    api_base=PORTKEY_GATEWAY_URL,
    api_key="placeholder",
    default_headers=createHeaders(
        api_key="PORTKEY_API_KEY",
        config="pc_your_config_id"       # Created in the Portkey dashboard
    )
)
Example: Fallbacks
from llama_index.llms.openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
config = {
    "strategy": {"mode": "fallback"},
    "targets": [
        {"override_params": {"model": "@openai-prod/gpt-4o"}},
        {"override_params": {"model": "@anthropic-prod/claude-sonnet-4"}}
    ]
}
llm = OpenAI(
    model="gpt-4o",
    api_base=PORTKEY_GATEWAY_URL,
    api_key="placeholder",
    default_headers=createHeaders(
        api_key="PORTKEY_API_KEY",
        config=config
    )
)
# Automatically falls back to Anthropic if OpenAI fails
response = llm.complete("Hello!")
Example: Load Balancing
config = {
    "strategy": {"mode": "loadbalance"},
    "targets": [
        {"override_params": {"model": "@openai-prod/gpt-4o"}, "weight": 0.5},
        {"override_params": {"model": "@anthropic-prod/claude-sonnet-4"}, "weight": 0.5}
    ]
}
llm = OpenAI(
    model="gpt-4o",
    api_base=PORTKEY_GATEWAY_URL,
    api_key="placeholder",
    default_headers=createHeaders(
        api_key="PORTKEY_API_KEY",
        config=config
    )
)
# Requests are distributed 50/50 between OpenAI and Anthropic
response = llm.complete("Hello!")
Example: Caching
config = {
    "cache": {
        "mode": "semantic",   # or "simple" for exact matches
        "max_age": 3600       # Cache for 1 hour
    },
    "override_params": {"model": "@openai-prod/gpt-4o"}
}
llm = OpenAI(
    model="gpt-4o",
    api_base=PORTKEY_GATEWAY_URL,
    api_key="placeholder",
    default_headers=createHeaders(
        api_key="PORTKEY_API_KEY",
        config=config
    )
)
# Responses are cached and reused for similar queries
response = llm.complete("What is machine learning?")
Learn About Configs → Set up fallbacks, retries, caching, load balancing, and more
Observability
Portkey automatically logs all requests. Add custom metadata for better analytics:
from llama_index.llms.openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
llm = OpenAI(
    model="@openai-prod/gpt-4o",
    api_base=PORTKEY_GATEWAY_URL,
    api_key="placeholder",
    default_headers=createHeaders(
        api_key="PORTKEY_API_KEY",
        metadata={
            "_user": "user_123",
            "environment": "production",
            "feature": "rag_query"
        },
        trace_id="unique_trace_id"
    )
)
Filter and analyze logs by metadata in the Portkey dashboard.
Observability Guide → Track costs, performance, and debug issues
Prompt Management
Use prompts from Portkey’s Prompt Library:
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders, Portkey
# Render the prompt template from Portkey
client = Portkey(api_key="PORTKEY_API_KEY")
prompt_template = client.prompts.render(
    prompt_id="pp-your-prompt-id",
    variables={"topic": "AI"}
).data.dict()
# Use it with LlamaIndex
llm = OpenAI(
    model="@openai-prod/gpt-4o",
    api_base=PORTKEY_GATEWAY_URL,
    api_key="placeholder",
    default_headers=createHeaders(api_key="PORTKEY_API_KEY")
)
messages = [
    ChatMessage(content=msg["content"], role=msg["role"])
    for msg in prompt_template["messages"]
]
response = llm.chat(messages)
print(response.message.content)
Prompt Library → Manage, version, and test prompts in Portkey
Migration from Direct OpenAI
Already using LlamaIndex with OpenAI? Just update 3 parameters:
# Before
from llama_index.llms.openai import OpenAI
import os
llm = OpenAI(
    model="gpt-4o",
    api_key=os.getenv("OPENAI_API_KEY"),
    temperature=0.7
)
# After (add 2 parameters, change 1)
llm = OpenAI(
    model="@openai-prod/gpt-4o",         # Add the provider slug
    api_base="https://api.portkey.ai",   # Add this
    api_key="PORTKEY_API_KEY",           # Change to your Portkey key
    temperature=0.7                      # Keep existing params
)
Benefits:
Zero code changes to your existing LlamaIndex logic
Instant observability for all requests
Production-grade reliability features
Cost controls and budgets
Next Steps
For complete SDK documentation:
SDK Reference Complete Portkey SDK documentation