LiteLLM integrations

LiteLLM is a library that simplifies calling Anthropic, Azure, Hugging Face, Replicate, and many other LLM providers through a single unified interface.

Installation and setup

pip install langchain-litellm

or, with uv:

uv add langchain-litellm

Chat models

from langchain_litellm import ChatLiteLLM

from langchain_litellm import ChatLiteLLMRouter

See the LiteLLM chat guide for full usage details, including streaming, tool calling, structured output, and Vertex AI Grounding.

Embeddings

from langchain_litellm import LiteLLMEmbeddings

from langchain_litellm import LiteLLMEmbeddingsRouter

LiteLLMEmbeddings embeds text across 100+ providers with a single consistent interface. All configuration is explicit, with no environment variables required.

from langchain_litellm import LiteLLMEmbeddings

embeddings = LiteLLMEmbeddings(
    model="openai/text-embedding-3-small",
    api_key="sk-...",
)

vectors = embeddings.embed_documents(["hello", "world"])
query_vector = embeddings.embed_query("hello")
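embed_documents returns one vector per input text and embed_query returns a single vector, so a common next step is ranking the documents by similarity to the query. The following is a minimal pure-Python sketch using cosine similarity. The helper names (cosine_similarity, rank_documents) and the toy 3-dimensional vectors are illustrative only and are not part of langchain-litellm.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def rank_documents(query_vec: list[float], doc_vecs: list[list[float]]) -> list[tuple[int, float]]:
    """Return (document index, score) pairs, most similar document first."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (made up for illustration).
docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query = [1.0, 0.1, 0.0]
ranking = rank_documents(query, docs)
print([i for i, _ in ranking])  # [0, 2, 1]
```

With real embeddings, call rank_documents(query_vector, vectors) using the outputs of embed_query and embed_documents from the example above.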

Switch providers by changing the model argument; the rest of the interface stays the same:

# Cohere
embeddings = LiteLLMEmbeddings(
    model="cohere/embed-english-v3.0",
    api_key="...",
    document_input_type="search_document",
    query_input_type="search_query",
)

# Azure OpenAI
embeddings = LiteLLMEmbeddings(
    model="azure/my-embedding-deployment",
    api_key="...",
    api_base="https://my-resource.openai.azure.com",
    api_version="2024-02-01",
)

# Bedrock
embeddings = LiteLLMEmbeddings(
    model="bedrock/amazon.titan-embed-text-v1",
)
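In each example above, the part of model before the first slash names the provider and the rest names the model (for Azure, the deployment). The hypothetical split_model helper below only illustrates this naming convention. LiteLLM performs its own provider resolution internally and may also infer the provider for some unprefixed model names, so this is not a substitute for its logic.

```python
def split_model(model: str) -> tuple[str, str]:
    """Split 'provider/model-name' at the first slash (hypothetical helper)."""
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected 'provider/model-name', got {model!r}")
    return provider, name


# Model strings taken from the provider examples above.
for model in [
    "openai/text-embedding-3-small",
    "cohere/embed-english-v3.0",
    "azure/my-embedding-deployment",
    "bedrock/amazon.titan-embed-text-v1",
]:
    provider, name = split_model(model)
    print(f"{provider:<8} -> {name}")
```

Splitting only at the first slash keeps any further slashes inside the model name intact.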

For load-balancing across multiple deployments of the same model, use LiteLLMEmbeddingsRouter:

from litellm import Router
from langchain_litellm import LiteLLMEmbeddingsRouter

router = Router(model_list=[
    {
        "model_name": "text-embedding-3-small",
        "litellm_params": {
            "model": "openai/text-embedding-3-small",
            "api_key": "sk-key1",
        },
    },
    {
        "model_name": "text-embedding-3-small",
        "litellm_params": {
            "model": "openai/text-embedding-3-small",
            "api_key": "sk-key2",
        },
    },
])

embeddings = LiteLLMEmbeddingsRouter(router=router)
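Both entries share the model_name text-embedding-3-small, so the router can treat them as interchangeable deployments of one model group. The sketch below illustrates that idea with a grouping step plus a uniform random pick, loosely analogous to a shuffle-style strategy. It is not LiteLLM's implementation: the real Router supports several configurable routing strategies along with retry and fallback behavior, and the function names here are hypothetical.

```python
import random
from collections import defaultdict

# Same shape as the model_list passed to litellm.Router above.
model_list = [
    {"model_name": "text-embedding-3-small",
     "litellm_params": {"model": "openai/text-embedding-3-small", "api_key": "sk-key1"}},
    {"model_name": "text-embedding-3-small",
     "litellm_params": {"model": "openai/text-embedding-3-small", "api_key": "sk-key2"}},
]


def group_deployments(entries):
    """Map each public model_name to the deployments that can serve it."""
    groups = defaultdict(list)
    for entry in entries:
        groups[entry["model_name"]].append(entry["litellm_params"])
    return dict(groups)


def pick_deployment(groups, model_name, rng=random):
    """Choose one deployment uniformly at random (hypothetical shuffle-style policy)."""
    candidates = groups.get(model_name)
    if not candidates:
        raise KeyError(f"no deployments registered for {model_name!r}")
    return rng.choice(candidates)


groups = group_deployments(model_list)
rng = random.Random(0)  # seeded so the picks are reproducible
picks = [pick_deployment(groups, "text-embedding-3-small", rng)["api_key"] for _ in range(1000)]
print(picks.count("sk-key1"), picks.count("sk-key2"))
```

Over many requests the load spreads roughly evenly across the two API keys, which is the effect the router example is after.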

Document loaders

from langchain_litellm import LiteLLMOCRLoader

LiteLLMOCRLoader loads documents through the OCR endpoint of a LiteLLM proxy, which forwards requests to an OCR provider such as Azure Document Intelligence. The proxy handles all provider-specific authentication and configuration.

from langchain_litellm import LiteLLMOCRLoader

loader = LiteLLMOCRLoader(
    proxy_base_url="http://localhost:4000",
    api_key="my-bearer-token",
    url_path="https://example.com/document.pdf",
    model="azure-document",
    mode="page",  # "page" = one Document per page; "single" = concatenate all pages
)
documents = loader.load()
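The mode argument controls how OCR output is turned into Documents. The sketch below illustrates the difference on made-up page text. Doc is a minimal stand-in for LangChain's Document class, and the "\n\n" page separator and the zero-based "page" metadata key are assumptions; the real loader's join and metadata may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Minimal stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def build_documents(pages: list[str], mode: str = "page", separator: str = "\n\n") -> list[Doc]:
    """Illustrate the two modes: one Doc per page, or a single concatenated Doc."""
    if mode == "page":
        # Assumed metadata layout: zero-based page index.
        return [Doc(text, {"page": i}) for i, text in enumerate(pages)]
    if mode == "single":
        # Assumed separator between pages.
        return [Doc(separator.join(pages))]
    raise ValueError(f"unknown mode {mode!r}; expected 'page' or 'single'")


pages = ["Page one text.", "Page two text.", "Page three text."]  # made-up OCR output
per_page = build_documents(pages, mode="page")
single = build_documents(pages, mode="single")
print(len(per_page), len(single))  # 3 1
```

Per-page Documents keep page boundaries for citation or chunking, while a single Document suits pipelines that split text themselves.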

Async loading is also supported:

documents = await loader.aload()
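The await form only works inside a coroutine, or in an environment such as Jupyter that allows top-level await. From a plain script, one option is to wrap the call in asyncio.run, and asyncio.gather can await several loaders concurrently. In this sketch FakeLoader is a hypothetical stand-in that returns strings so the example runs offline; with the real package you would pass LiteLLMOCRLoader instances (for example, one per url_path) instead.

```python
import asyncio


class FakeLoader:
    """Hypothetical stand-in exposing an aload() coroutine like a LangChain loader."""

    def __init__(self, name: str, delay: float):
        self.name = name
        self.delay = delay

    async def aload(self) -> list[str]:
        await asyncio.sleep(self.delay)  # simulate waiting on the OCR endpoint
        return [f"{self.name}: page 1", f"{self.name}: page 2"]


async def load_all(loaders) -> list[str]:
    """Await every loader's aload() concurrently; results keep the input order."""
    results = await asyncio.gather(*(loader.aload() for loader in loaders))
    return [doc for docs in results for doc in docs]


loaders = [FakeLoader("a.pdf", 0.2), FakeLoader("b.pdf", 0.1)]
documents = asyncio.run(load_all(loaders))
print(len(documents))  # 4
```

asyncio.gather returns results in the order the awaitables were passed, so a.pdf's pages come first even though b.pdf finishes sooner.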

API reference

For detailed documentation of all classes and configurations, see the langchain-litellm API reference.

Class	Description
ChatLiteLLM	LangChain chat model wrapper for LiteLLM
ChatLiteLLMRouter	Router-backed chat model for load balancing and fallbacks
LiteLLMEmbeddings	Embed text across 100+ providers with a single consistent interface
LiteLLMEmbeddingsRouter	Router-backed embeddings for load balancing across deployments
LiteLLMOCRLoader	Document loader via LiteLLM proxy’s OCR endpoint
