LlamaIndex provides a framework for building LLM applications with your data. Add Portkey to get production-grade features: full observability, automatic fallbacks, semantic caching, and cost controls—all without changing your LlamaIndex code.

Quick Start

Add Portkey to any LlamaIndex app with 3 parameters:
from llama_index.llms.openai import OpenAI

llm = OpenAI(
    model="@openai-prod/gpt-4o",        # Provider slug from Model Catalog
    api_base="https://api.portkey.ai",
    api_key="PORTKEY_API_KEY"           # Your Portkey API key
)

response = llm.complete("Tell me a joke")
print(response.text)

All requests now appear in Portkey logs

That’s it! You now get:
  • ✅ Full observability (costs, latency, logs)
  • ✅ Dynamic model selection per request
  • ✅ Automatic fallbacks and retries (via configs)
  • ✅ Budget controls per team/project

Why Add Portkey to LlamaIndex?

LlamaIndex handles data indexing and querying. Portkey adds production features:

Enterprise Observability

Every request logged with costs, latency, tokens. Team-level analytics and debugging.

Dynamic Model Selection

Switch models per request. Route simple queries to cheap models and complex ones to advanced models—every request automatically tracked.

Production Reliability

Automatic fallbacks, smart retries, load balancing—configured once, works everywhere.

Cost & Access Control

Budget limits per team/project. Rate limiting. Centralized credential management.

Setup

1. Install Packages

pip install llama-index-llms-openai portkey-ai

2. Add Provider in Model Catalog

  1. Go to Model Catalog → Add Provider
  2. Select your provider (OpenAI, Anthropic, Google, etc.)
  3. Choose existing credentials or create new ones by entering your provider API key
  4. Name your provider (e.g., openai-prod)
Your provider slug will be @openai-prod (or whatever you named it).

Complete Model Catalog Guide →

Set up budgets, rate limits, and manage credentials

3. Get Portkey API Key

Create your Portkey API key at app.portkey.ai/api-keys
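
The snippets on this page hardcode the key for brevity; in practice you would typically read it from an environment variable. A minimal sketch, assuming the key is exported as PORTKEY_API_KEY:
import os
from llama_index.llms.openai import OpenAI

# Read the Portkey API key from the environment instead of hardcoding it
llm = OpenAI(
    model="@openai-prod/gpt-4o",
    api_base="https://api.portkey.ai",
    api_key=os.environ["PORTKEY_API_KEY"]
)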

4. Use in Your Code

Replace your existing LLM initialization:
# Before (direct to OpenAI)
from llama_index.llms.openai import OpenAI

llm = OpenAI(
    model="gpt-4o",
    api_key="OPENAI_API_KEY"
)

# After (via Portkey)
llm = OpenAI(
    model="@openai-prod/gpt-4o",
    api_base="https://api.portkey.ai",
    api_key="PORTKEY_API_KEY"
)
That’s the only change needed! All your existing LlamaIndex code (indexes, query engines, agents) works exactly the same.
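
If you'd rather not pass the LLM into every index or engine explicitly, you can also set it once as the global default via LlamaIndex's Settings object (a short sketch, using the provider slug created above):
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI

# Make the Portkey-routed LLM the default for all indexes, query engines, and agents
Settings.llm = OpenAI(
    model="@openai-prod/gpt-4o",
    api_base="https://api.portkey.ai",
    api_key="PORTKEY_API_KEY"
)
Anything you build afterwards without an explicit llm argument will then route through Portkey by default.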

Switching Between Providers

Just change the model string—everything else stays the same:
from llama_index.llms.openai import OpenAI

# OpenAI
llm = OpenAI(
    model="@openai-prod/gpt-4o",
    api_base="https://api.portkey.ai",
    api_key="PORTKEY_API_KEY"
)

# Anthropic
llm = OpenAI(
    model="@anthropic-prod/claude-sonnet-4",
    api_base="https://api.portkey.ai",
    api_key="PORTKEY_API_KEY"
)

# Google Gemini
llm = OpenAI(
    model="@google-prod/gemini-2.0-flash",
    api_base="https://api.portkey.ai",
    api_key="PORTKEY_API_KEY"
)
Portkey implements OpenAI-compatible APIs for all providers, so you always use llama_index.llms.openai.OpenAI regardless of which model you’re calling.

Using with LlamaIndex Chat

LlamaIndex’s chat interface works seamlessly:
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage

llm = OpenAI(
    model="@openai-prod/gpt-4o",
    api_base="https://api.portkey.ai",
    api_key="PORTKEY_API_KEY"
)

messages = [
    ChatMessage(role="system", content="You are a helpful assistant"),
    ChatMessage(role="user", content="What is the capital of France?")
]

response = llm.chat(messages)
print(response.message.content)

Works With All LlamaIndex Features

Query Engines - All query types supported
Chat Engines - Conversational interfaces (see the sketch below)
Agents - Full agent compatibility
Streaming - Token-by-token streaming
RAG Pipelines - Retrieval-augmented generation
Workflows - Complex LLM workflows
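
As one example from this list, a chat engine built over an index picks up the Portkey-routed LLM like any other. A minimal sketch, assuming a local data/ directory and the default embedding configuration; condense_plus_context is one of LlamaIndex's built-in chat modes:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI

llm = OpenAI(
    model="@openai-prod/gpt-4o",
    api_base="https://api.portkey.ai",
    api_key="PORTKEY_API_KEY"
)

# Build an index over local documents, then chat over it with conversation memory
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

chat_engine = index.as_chat_engine(llm=llm, chat_mode="condense_plus_context")
print(chat_engine.chat("What are these documents about?"))
print(chat_engine.chat("Summarize that in one sentence."))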

Streaming

from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage

llm = OpenAI(
    model="@openai-prod/gpt-4o",
    api_base="https://api.portkey.ai",
    api_key="PORTKEY_API_KEY"
)

# Stream completions
for chunk in llm.stream_complete("Write a short story"):
    print(chunk.delta, end="", flush=True)

# Stream chat
messages = [ChatMessage(role="user", content="Tell me a joke")]
for chunk in llm.stream_chat(messages):
    print(chunk.delta, end="", flush=True)

Async Support

import asyncio
from llama_index.llms.openai import OpenAI

async def main():
    llm = OpenAI(
        model="@openai-prod/gpt-4o",
        api_base="https://api.portkey.ai",
        api_key="PORTKEY_API_KEY"
    )
    
    # Async completion
    response = await llm.acomplete("What is 2+2?")
    print(response.text)

    # Async streaming
    async for chunk in await llm.astream_complete("Write a haiku"):
        print(chunk.delta, end="", flush=True)

asyncio.run(main())
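
Because the async methods return coroutines, you can also fan out several requests concurrently with asyncio.gather; a small sketch (the prompts are illustrative):
import asyncio
from llama_index.llms.openai import OpenAI

async def main():
    llm = OpenAI(
        model="@openai-prod/gpt-4o",
        api_base="https://api.portkey.ai",
        api_key="PORTKEY_API_KEY"
    )

    # Run several completions concurrently; each one is logged separately in Portkey
    prompts = ["Define RAG in one line", "Define embeddings in one line"]
    responses = await asyncio.gather(*(llm.acomplete(p) for p in prompts))
    for r in responses:
        print(r.text)

asyncio.run(main())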

RAG with Query Engine

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI

# Set up LLM with Portkey
llm = OpenAI(
    model="@openai-prod/gpt-4o",
    api_base="https://api.portkey.ai",
    api_key="PORTKEY_API_KEY"
)

# Load and index documents
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Query with Portkey-enabled LLM
query_engine = index.as_query_engine(llm=llm)
response = query_engine.query("What is the main topic?")
print(response)

Advanced Features via Configs

For production features like fallbacks, caching, and load balancing, use Portkey Configs:
from llama_index.llms.openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

llm = OpenAI(
    model="gpt-4o",  # Default model
    api_base=PORTKEY_GATEWAY_URL,
    api_key="placeholder",              # dummy value; auth is handled by the Portkey headers below
    default_headers=createHeaders(
        api_key="PORTKEY_API_KEY",
        config="pc_your_config_id"  # Created in Portkey dashboard
    )
)

Example: Fallbacks

from llama_index.llms.openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

config = {
    "strategy": {"mode": "fallback"},
    "targets": [
        {"override_params": {"model": "@openai-prod/gpt-4o"}},
        {"override_params": {"model": "@anthropic-prod/claude-sonnet-4"}}
    ]
}

llm = OpenAI(
    model="gpt-4o",
    api_base=PORTKEY_GATEWAY_URL,
    api_key="placeholder",
    default_headers=createHeaders(
        api_key="PORTKEY_API_KEY",
        config=config
    )
)

# Automatically falls back to Anthropic if OpenAI fails
response = llm.complete("Hello!")

Example: Load Balancing

config = {
    "strategy": {"mode": "loadbalance"},
    "targets": [
        {"override_params": {"model": "@openai-prod/gpt-4o"}, "weight": 0.5},
        {"override_params": {"model": "@anthropic-prod/claude-sonnet-4"}, "weight": 0.5}
    ]
}

llm = OpenAI(
    model="gpt-4o",
    api_base=PORTKEY_GATEWAY_URL,
    api_key="placeholder",
    default_headers=createHeaders(
        api_key="PORTKEY_API_KEY",
        config=config
    )
)

# Requests distributed 50/50 between OpenAI and Anthropic
response = llm.complete("Hello!")

Example: Caching

config = {
    "cache": {
        "mode": "semantic",  # or "simple" for exact matches
        "max_age": 3600      # Cache for 1 hour
    },
    "override_params": {"model": "@openai-prod/gpt-4o"}
}

llm = OpenAI(
    model="gpt-4o",
    api_base=PORTKEY_GATEWAY_URL,
    api_key="placeholder",
    default_headers=createHeaders(
        api_key="PORTKEY_API_KEY",
        config=config
    )
)

# Responses cached for similar queries
response = llm.complete("What is machine learning?")

Learn About Configs →

Set up fallbacks, retries, caching, load balancing, and more

Observability

Portkey automatically logs all requests. Add custom metadata for better analytics:
from llama_index.llms.openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

llm = OpenAI(
    model="@openai-prod/gpt-4o",
    api_base=PORTKEY_GATEWAY_URL,
    api_key="placeholder",
    default_headers=createHeaders(
        api_key="PORTKEY_API_KEY",
        metadata={
            "_user": "user_123",
            "environment": "production",
            "feature": "rag_query"
        },
        trace_id="unique_trace_id"
    )
)
Filter and analyze logs by metadata in the Portkey dashboard.

Observability Guide →

Track costs, performance, and debug issues

Prompt Management

Use prompts from Portkey’s Prompt Library:
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders, Portkey

# Render prompt from Portkey
client = Portkey(api_key="PORTKEY_API_KEY")
prompt_template = client.prompts.render(
    prompt_id="pp-your-prompt-id",
    variables={"topic": "AI"}
).data.dict()

# Use with LlamaIndex
llm = OpenAI(
    model="@openai-prod/gpt-4o",
    api_base=PORTKEY_GATEWAY_URL,
    api_key="placeholder",
    default_headers=createHeaders(api_key="PORTKEY_API_KEY")
)

messages = [
    ChatMessage(content=msg["content"], role=msg["role"]) 
    for msg in prompt_template["messages"]
]

response = llm.chat(messages)
print(response.message.content)

Prompt Library →

Manage, version, and test prompts in Portkey

Migration from Direct OpenAI

Already using LlamaIndex with OpenAI? Just update 3 parameters:
# Before
from llama_index.llms.openai import OpenAI
import os

llm = OpenAI(
    model="gpt-4o",
    api_key=os.getenv("OPENAI_API_KEY"),
    temperature=0.7
)

# After (add 2 parameters, change 1)
llm = OpenAI(
    model="@openai-prod/gpt-4o",          # Add provider slug
    api_base="https://api.portkey.ai",     # Add this
    api_key="PORTKEY_API_KEY",             # Change to Portkey key
    temperature=0.7                         # Keep existing params
)
Benefits:
  • Zero code changes to your existing LlamaIndex logic
  • Instant observability for all requests
  • Production-grade reliability features
  • Cost controls and budgets

Next Steps

For complete SDK documentation:

SDK Reference

Complete Portkey SDK documentation