LlamaIndex provides a framework for building LLM applications with your data. Add Portkey to get production-grade features: full observability, automatic fallbacks, semantic caching, and cost controls—all without changing your LlamaIndex code.

Quick Start

Add Portkey to any LlamaIndex app with 3 parameters:
from llama_index.llms.openai import OpenAI

llm = OpenAI(
    model="@openai-prod/gpt-4o",        # Provider slug from Model Catalog
    api_base="https://api.portkey.ai",
    api_key="PORTKEY_API_KEY"           # Your Portkey API key
)

response = llm.complete("Tell me a joke")
print(response.text)

All requests now appear in Portkey logs

That’s it! You now get:
  • ✅ Full observability (costs, latency, logs)
  • ✅ Dynamic model selection per request
  • ✅ Automatic fallbacks and retries (via configs)
  • ✅ Budget controls per team/project

Why Add Portkey to LlamaIndex?

LlamaIndex handles data indexing and querying. Portkey adds production features:

Enterprise Observability

Every request logged with costs, latency, tokens. Team-level analytics and debugging.

Dynamic Model Selection

Switch models per request. Route simple queries to cheap models and complex ones to advanced models—every request automatically tracked.

Production Reliability

Automatic fallbacks, smart retries, load balancing—configured once, works everywhere.

Cost & Access Control

Budget limits per team/project. Rate limiting. Centralized credential management.

Setup

1. Install Packages

pip install llama-index-llms-openai portkey-ai

2. Add Provider in Model Catalog

  1. Go to Model Catalog → Add Provider
  2. Select your provider (OpenAI, Anthropic, Google, etc.)
  3. Choose existing credentials or create new ones by entering your provider API key
  4. Name your provider (e.g., openai-prod)
Your provider slug will be @openai-prod (or whatever you named it).

Complete Model Catalog Guide →

Set up budgets, rate limits, and manage credentials

3. Get Portkey API Key

Create your Portkey API key at app.portkey.ai/api-keys
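
The snippets on this page hardcode the key for brevity; in practice you would typically read it from an environment variable. A minimal sketch, assuming the key is exported as PORTKEY_API_KEY:
import os
from llama_index.llms.openai import OpenAI

# Read the Portkey API key from the environment instead of hardcoding it
llm = OpenAI(
    model="@openai-prod/gpt-4o",
    api_base="https://api.portkey.ai",
    api_key=os.environ["PORTKEY_API_KEY"]
)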

4. Use in Your Code

Replace your existing LLM initialization:
# Before (direct to OpenAI)
from llama_index.llms.openai import OpenAI

llm = OpenAI(
    model="gpt-4o",
    api_key="OPENAI_API_KEY"
)

# After (via Portkey)
llm = OpenAI(
    model="@openai-prod/gpt-4o",
    api_base="https://api.portkey.ai",
    api_key="PORTKEY_API_KEY"
)
That’s the only change needed! All your existing LlamaIndex code (indexes, query engines, agents) works exactly the same.
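
If you'd rather not pass the LLM into every index or engine explicitly, you can also set it once as the global default via LlamaIndex's Settings object (a short sketch, using the provider slug created above):
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI

# Make the Portkey-routed LLM the default for all indexes, query engines, and agents
Settings.llm = OpenAI(
    model="@openai-prod/gpt-4o",
    api_base="https://api.portkey.ai",
    api_key="PORTKEY_API_KEY"
)
Anything you build afterwards without an explicit llm argument will then route through Portkey by default.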

Switching Between Providers

Just change the model string—everything else stays the same:
from llama_index.llms.openai import OpenAI

# OpenAI
llm = OpenAI(
    model="@openai-prod/gpt-4o",
    api_base="https://api.portkey.ai",
    api_key="PORTKEY_API_KEY"
)

# Anthropic
llm = OpenAI(
    model="@anthropic-prod/claude-sonnet-4",
    api_base="https://api.portkey.ai",
    api_key="PORTKEY_API_KEY"
)

# Google Gemini
llm = OpenAI(
    model="@google-prod/gemini-2.0-flash",
    api_base="https://api.portkey.ai",
    api_key="PORTKEY_API_KEY"
)
Portkey implements OpenAI-compatible APIs for all providers, so you always use llama_index.llms.openai.OpenAI regardless of which model you’re calling.

Using with LlamaIndex Chat

LlamaIndex’s chat interface works seamlessly:
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage

llm = OpenAI(
    model="@openai-prod/gpt-4o",
    api_base="https://api.portkey.ai",
    api_key="PORTKEY_API_KEY"
)

messages = [
    ChatMessage(role="system", content="You are a helpful assistant"),
    ChatMessage(role="user", content="What is the capital of France?")
]

response = llm.chat(messages)
print(response.message.content)

Works With All LlamaIndex Features

Query Engines - All query types supported
Chat Engines - Conversational interfaces (see the sketch below)
Agents - Full agent compatibility
Streaming - Token-by-token streaming
RAG Pipelines - Retrieval-augmented generation
Workflows - Complex LLM workflows
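
As one example from this list, a chat engine built over an index picks up the Portkey-routed LLM like any other. A minimal sketch, assuming a local data/ directory and the default embedding configuration; condense_plus_context is one of LlamaIndex's built-in chat modes:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI

llm = OpenAI(
    model="@openai-prod/gpt-4o",
    api_base="https://api.portkey.ai",
    api_key="PORTKEY_API_KEY"
)

# Build an index over local documents, then chat over it with conversation memory
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

chat_engine = index.as_chat_engine(llm=llm, chat_mode="condense_plus_context")
print(chat_engine.chat("What are these documents about?"))
print(chat_engine.chat("Summarize that in one sentence."))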

Streaming

from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage

llm = OpenAI(
    model="@openai-prod/gpt-4o",
    api_base="https://api.portkey.ai",
    api_key="PORTKEY_API_KEY"
)

# Stream completions
for chunk in llm.stream_complete("Write a short story"):
    print(chunk.delta, end="", flush=True)

# Stream chat
messages = [ChatMessage(role="user", content="Tell me a joke")]
for chunk in llm.stream_chat(messages):
    print(chunk.delta, end="", flush=True)

Async Support

import asyncio
from llama_index.llms.openai import OpenAI

async def main():
    llm = OpenAI(
        model="@openai-prod/gpt-4o",
        api_base="https://api.portkey.ai",
        api_key="PORTKEY_API_KEY"
    )
    
    # Async completion
    response = await llm.acomplete("What is 2+2?")
    print(response.text)

    # Async streaming
    async for chunk in await llm.astream_complete("Write a haiku"):
        print(chunk.delta, end="", flush=True)

asyncio.run(main())
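
Because the async methods return coroutines, you can also fan out several requests concurrently with asyncio.gather; a small sketch (the prompts are illustrative):
import asyncio
from llama_index.llms.openai import OpenAI

async def main():
    llm = OpenAI(
        model="@openai-prod/gpt-4o",
        api_base="https://api.portkey.ai",
        api_key="PORTKEY_API_KEY"
    )

    # Run several completions concurrently; each one is logged separately in Portkey
    prompts = ["Define RAG in one line", "Define embeddings in one line"]
    responses = await asyncio.gather(*(llm.acomplete(p) for p in prompts))
    for r in responses:
        print(r.text)

asyncio.run(main())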

RAG with Query Engine

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI

# Set up LLM with Portkey
llm = OpenAI(
    model="@openai-prod/gpt-4o",
    api_base="https://api.portkey.ai",
    api_key="PORTKEY_API_KEY"
)

# Load and index documents
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Query with Portkey-enabled LLM
query_engine = index.as_query_engine(llm=llm)
response = query_engine.query("What is the main topic?")
print(response)

Advanced Features via Configs

For production features like fallbacks, caching, and load balancing, use Portkey Configs:
from llama_index.llms.openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

llm = OpenAI(
    model="gpt-4o",  # Default model
    api_base=PORTKEY_GATEWAY_URL,
    api_key="placeholder",              # dummy value; auth is handled by the Portkey headers below
    default_headers=createHeaders(
        api_key="PORTKEY_API_KEY",
        config="pc_your_config_id"  # Created in Portkey dashboard
    )
)

Example: Fallbacks

from llama_index.llms.openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

config = {
    "strategy": {"mode": "fallback"},
    "targets": [
        {"override_params": {"model": "@openai-prod/gpt-4o"}},
        {"override_params": {"model": "@anthropic-prod/claude-sonnet-4"}}
    ]
}

llm = OpenAI(
    model="gpt-4o",
    api_base=PORTKEY_GATEWAY_URL,
    api_key="placeholder",
    default_headers=createHeaders(
        api_key="PORTKEY_API_KEY",
        config=config
    )
)

# Automatically falls back to Anthropic if OpenAI fails
response = llm.complete("Hello!")

Example: Load Balancing

config = {
    "strategy": {"mode": "loadbalance"},
    "targets": [
        {"override_params": {"model": "@openai-prod/gpt-4o"}, "weight": 0.5},
        {"override_params": {"model": "@anthropic-prod/claude-sonnet-4"}, "weight": 0.5}
    ]
}

llm = OpenAI(
    model="gpt-4o",
    api_base=PORTKEY_GATEWAY_URL,
    api_key="placeholder",
    default_headers=createHeaders(
        api_key="PORTKEY_API_KEY",
        config=config
    )
)

# Requests distributed 50/50 between OpenAI and Anthropic
response = llm.complete("Hello!")

Example: Caching

config = {
    "cache": {
        "mode": "semantic",  # or "simple" for exact matches
        "max_age": 3600      # Cache for 1 hour
    },
    "override_params": {"model": "@openai-prod/gpt-4o"}
}

llm = OpenAI(
    model="gpt-4o",
    api_base=PORTKEY_GATEWAY_URL,
    api_key="placeholder",
    default_headers=createHeaders(
        api_key="PORTKEY_API_KEY",
        config=config
    )
)

# Responses cached for similar queries
response = llm.complete("What is machine learning?")

Learn About Configs →

Set up fallbacks, retries, caching, load balancing, and more

Observability

Portkey automatically logs all requests. Add custom metadata for better analytics:
from llama_index.llms.openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

llm = OpenAI(
    model="@openai-prod/gpt-4o",
    api_base=PORTKEY_GATEWAY_URL,
    api_key="placeholder",
    default_headers=createHeaders(
        api_key="PORTKEY_API_KEY",
        metadata={
            "_user": "user_123",
            "environment": "production",
            "feature": "rag_query"
        },
        trace_id="unique_trace_id"
    )
)
Filter and analyze logs by metadata in the Portkey dashboard.

Observability Guide →

Track costs, performance, and debug issues

Prompt Management

Use prompts from Portkey’s Prompt Library:
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders, Portkey

# Render prompt from Portkey
client = Portkey(api_key="PORTKEY_API_KEY")
prompt_template = client.prompts.render(
    prompt_id="pp-your-prompt-id",
    variables={"topic": "AI"}
).data.dict()

# Use with LlamaIndex
llm = OpenAI(
    model="@openai-prod/gpt-4o",
    api_base=PORTKEY_GATEWAY_URL,
    api_key="placeholder",
    default_headers=createHeaders(api_key="PORTKEY_API_KEY")
)

messages = [
    ChatMessage(content=msg["content"], role=msg["role"]) 
    for msg in prompt_template["messages"]
]

response = llm.chat(messages)
print(response.message.content)

Prompt Library →

Manage, version, and test prompts in Portkey

Migration from Direct OpenAI

Already using LlamaIndex with OpenAI? Just update 3 parameters:
# Before
from llama_index.llms.openai import OpenAI
import os

llm = OpenAI(
    model="gpt-4o",
    api_key=os.getenv("OPENAI_API_KEY"),
    temperature=0.7
)

# After (add 2 parameters, change 1)
llm = OpenAI(
    model="@openai-prod/gpt-4o",          # Add provider slug
    api_base="https://api.portkey.ai",     # Add this
    api_key="PORTKEY_API_KEY",             # Change to Portkey key
    temperature=0.7                         # Keep existing params
)
Benefits:
  • Zero code changes to your existing LlamaIndex logic
  • Instant observability for all requests
  • Production-grade reliability features
  • Cost controls and budgets

Next Steps

For complete SDK documentation:

SDK Reference

Complete Portkey SDK documentation