LiteLLM is a library that simplifies calling Anthropic, Azure, Hugging Face, Replicate, and other LLM providers.
This page covers how to get started using LangChain with the LiteLLM I/O library.
This integration provides two chat model classes:
- ChatLiteLLM: The main LangChain chat wrapper for LiteLLM.
- ChatLiteLLMRouter: A ChatLiteLLM wrapper that leverages LiteLLM’s Router for load balancing and fallbacks.
The package also ships LiteLLMEmbeddings, LiteLLMEmbeddingsRouter, and LiteLLMOCRLoader. See the providers page for details.
Overview
Integration details
Model features
Setup
To access ChatLiteLLM and ChatLiteLLMRouter models, install the langchain-litellm package and create an account with at least one supported provider, such as OpenAI, Anthropic, Azure, Replicate, OpenRouter, Hugging Face, Together AI, or Cohere. Then get an API key from that provider and export it as an environment variable.
Credentials
You have to choose the LLM provider you want and sign up with them to get their API key.
Example - Anthropic
Head to the Claude console to sign up and generate a Claude API key. Once you’ve done this, set the ANTHROPIC_API_KEY environment variable.
Example - OpenAI
Head to platform.openai.com/api-keys to sign up for OpenAI and generate an API key. Once you’ve done this, set the OPENAI_API_KEY environment variable.
## Set ENV variables
import os
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"
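Hardcoding keys as in the snippet above is fine for a quick test, but it is easy to commit them by accident. A minimal sketch of an alternative, assuming you are happy to be prompted interactively when a variable is missing; `ensure_api_key` is a hypothetical helper, not part of langchain-litellm or LiteLLM:

```python
import getpass
import os


def ensure_api_key(name, prompt=getpass.getpass):
    """Return the value of env var `name`, prompting for it only if it is unset.

    `ensure_api_key` is a hypothetical helper written for this page.
    """
    value = os.environ.get(name)
    if not value:
        # getpass hides the typed key so it does not end up in the terminal scrollback.
        value = prompt(f"Enter {name}: ")
        os.environ[name] = value
    return value


# Usage, for the provider you chose:
# ensure_api_key("OPENAI_API_KEY")
# ensure_api_key("ANTHROPIC_API_KEY")
```

The `prompt` parameter exists only so the helper can be exercised without a terminal; in normal use the default is enough.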
Installation
The LangChain LiteLLM integration is available in the langchain-litellm package:
pip install -qU langchain-litellm
Instantiation
ChatLiteLLM
You can instantiate a ChatLiteLLM model by providing a model name supported by LiteLLM.
from langchain_litellm import ChatLiteLLM
llm = ChatLiteLLM(model="gpt-5.4-nano", temperature=0.1)
ChatLiteLLMRouter
You can also leverage LiteLLM’s routing capabilities by defining your model list as specified in the LiteLLM routing documentation.
from langchain_litellm import ChatLiteLLMRouter
from litellm import Router
model_list = [
    {
        "model_name": "gpt-5.4",
        "litellm_params": {
            "model": "azure/gpt-5.4",
            "api_key": "<your-api-key>",
            "api_version": "2024-10-21",
            "api_base": "https://<your-endpoint>.openai.azure.com/",
        },
    },
    {
        "model_name": "gpt-5.4",
        "litellm_params": {
            "model": "azure/gpt-5.4",
            "api_key": "<your-api-key>",
            "api_version": "2024-10-21",
            "api_base": "https://<your-endpoint>.openai.azure.com/",
        },
    },
]
litellm_router = Router(model_list=model_list)
llm = ChatLiteLLMRouter(router=litellm_router, model_name="gpt-5.4", temperature=0.1)
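Entries that share a `model_name` form one group that the Router can balance across, so in practice each entry usually points at a different deployment or endpoint. The sketch below builds entries in the same shape as the `model_list` above while reading credentials from the environment. The helper name and the environment variable names are hypothetical and chosen only for illustration:

```python
import os


def azure_deployment(model_name, deployment, key_var, base_var, api_version="2024-10-21"):
    """Build one Router entry in the same shape as the model_list shown above.

    `key_var` and `base_var` are names of environment variables (hypothetical
    here) holding the API key and the endpoint URL for this deployment.
    """
    return {
        "model_name": model_name,
        "litellm_params": {
            "model": f"azure/{deployment}",
            "api_key": os.environ[key_var],
            "api_version": api_version,
            "api_base": os.environ[base_var],
        },
    }


# Usage, assuming two Azure endpoints configured through made-up variable names:
# model_list = [
#     azure_deployment("gpt-5.4", "gpt-5.4", "AZURE_KEY_EAST", "AZURE_BASE_EAST"),
#     azure_deployment("gpt-5.4", "gpt-5.4", "AZURE_KEY_WEST", "AZURE_BASE_WEST"),
# ]
```

A missing variable raises `KeyError` at construction time, which surfaces configuration mistakes before the first request is sent.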
Invocation
Whether you’ve instantiated a ChatLiteLLM or a ChatLiteLLMRouter, you can now use the ChatModel through LangChain’s API.
response = await llm.ainvoke(
"Classify the text into neutral, negative or positive. Text: I think the food was okay. Sentiment:"
)
print(response)
content='Neutral' additional_kwargs={} response_metadata={'token_usage': Usage(completion_tokens=2, prompt_tokens=30, total_tokens=32, completion_tokens_details=CompletionTokensDetailsWrapper(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0, text_tokens=None), prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=0, cached_tokens=0, text_tokens=None, image_tokens=None)), 'model': 'gpt-3.5-turbo', 'finish_reason': 'stop', 'model_name': 'gpt-3.5-turbo'} id='run-ab6a3b21-eae8-4c27-acb2-add65a38221a-0' usage_metadata={'input_tokens': 30, 'output_tokens': 2, 'total_tokens': 32}
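The snippet above uses a bare `await`, which works in a notebook but not at the top level of a regular script. A minimal sketch of the same call wrapped for script use; `classify_sentiment` is a hypothetical helper, and `llm` is assumed to be any object with an `ainvoke` method, such as the ChatLiteLLM or ChatLiteLLMRouter instance created earlier:

```python
import asyncio


async def classify_sentiment(llm, text):
    """Send the classification prompt and return only the message text."""
    prompt = (
        "Classify the text into neutral, negative or positive. "
        f"Text: {text} Sentiment:"
    )
    response = await llm.ainvoke(prompt)
    # `content` holds the reply text; token counts are in response.usage_metadata.
    return response.content


# In a script, drive the coroutine with asyncio.run:
# label = asyncio.run(classify_sentiment(llm, "I think the food was okay."))
```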
Async and streaming functionality
ChatLiteLLM and ChatLiteLLMRouter also support async and streaming functionality:
async for token in llm.astream("Hello, please explain how antibiotics work"):
    print(token.text(), end="")
Antibiotics are medications that fight bacterial infections in the body. They work by targeting specific bacteria and either killing them or preventing their growth and reproduction.
There are several different mechanisms by which antibiotics work. Some antibiotics work by disrupting the cell walls of bacteria, causing them to burst and die. Others interfere with the protein synthesis of bacteria, preventing them from growing and reproducing. Some antibiotics target the DNA or RNA of bacteria, disrupting their ability to replicate.
It is important to note that antibiotics only work against bacterial infections and not viral infections. It is also crucial to take antibiotics as prescribed by a healthcare professional and to complete the full course of treatment, even if symptoms improve before the medication is finished. This helps to prevent antibiotic resistance, where bacteria become resistant to the effects of antibiotics.
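When the full text is needed after streaming finishes, the chunks can be accumulated instead of printed. This is a sketch under the assumption that each chunk's `content` is a plain string, which may not hold for every provider; `collect_stream` is a hypothetical helper:

```python
import asyncio


async def collect_stream(llm, prompt):
    """Consume llm.astream(prompt) and return the concatenated text."""
    parts = []
    async for chunk in llm.astream(prompt):
        # Assumption: chunk.content is a str for the model in use.
        parts.append(chunk.content)
    return "".join(parts)


# In a script:
# text = asyncio.run(collect_stream(llm, "Hello, please explain how antibiotics work"))
```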
Advanced features
Vertex AI grounding (Google Search)
Use Google Search grounding with Vertex AI models (e.g., gemini-2.5-flash). Citations and metadata are returned in response_metadata (batch) or additional_kwargs (streaming).
Setup
import os
from langchain_litellm import ChatLiteLLM
os.environ["VERTEX_PROJECT"] = "your-project-id"
os.environ["VERTEX_LOCATION"] = "us-central1"
llm = ChatLiteLLM(model="vertex_ai/gemini-2.5-flash", temperature=0)
Batch usage
# Invoke with Google Search tool enabled
response = llm.invoke(
"What is the current stock price of Google?",
tools=[{"googleSearch": {}}]
)
# Access citations & metadata
provider_fields = response.response_metadata.get("provider_specific_fields")
if provider_fields:
    # Vertex returns a list; the first item contains the grounding info
    print(provider_fields[0])
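The lookup in the batch snippet can be wrapped so that callers get `None` when no grounding data came back. This is only a sketch: the function name is hypothetical, and the internal structure of the returned item is not documented on this page, so the helper returns it untouched rather than guessing at its keys.

```python
def first_provider_fields(response_metadata):
    """Return the first provider-specific entry, or None if there is none.

    Accepts a dict shaped like response.response_metadata.
    """
    fields = response_metadata.get("provider_specific_fields")
    if not fields:
        return None
    # The page states that Vertex returns a list; tolerate a bare object too.
    if isinstance(fields, (list, tuple)):
        return fields[0]
    return fields


# Usage:
# grounding = first_provider_fields(response.response_metadata)
# if grounding is not None:
#     print(grounding)
```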
Streaming usage
stream = llm.stream(
"What is the current stock price of Google?",
tools=[{"googleSearch": {}}]
)
for chunk in stream:
    print(chunk.content, end="", flush=True)
    # Metadata is injected into the chunk where it arrives
    if "provider_specific_fields" in chunk.additional_kwargs:
        print("\n[Metadata Found]:", chunk.additional_kwargs["provider_specific_fields"])
API reference
For detailed documentation of all ChatLiteLLM and ChatLiteLLMRouter features and configurations, see the langchain-litellm API reference.