Document loaders provide a standard interface for reading data from different sources (such as Slack, Notion, or Google Drive) into LangChain’s Document format. This ensures that data can be handled consistently regardless of the source. All document loaders implement the BaseLoader interface.

Interface

Each document loader may define its own parameters, but they share a common API:
  • .load(): Loads all documents at once.
  • .loadAndSplit(): Loads all documents at once and splits them into smaller documents.
import { CSVLoader } from "@langchain/community/document_loaders/fs/csv";

const loader = new CSVLoader(
  ...  // <-- Integration specific parameters here
);
const data = await loader.load();
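
To make the shared API concrete, here is a minimal, self-contained sketch that does not import LangChain. The names `Doc`, `SimpleLoader`, `InMemoryLoader`, and `splitIntoChunks` are hypothetical stand-ins, and the fixed-size `chunkSize` argument is an assumption made for brevity; the library's own `.loadAndSplit()` may accept different arguments.

```typescript
// Stand-in for LangChain's Document shape (assumption: only these two fields matter here).
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical interface mirroring the two methods listed above.
interface SimpleLoader {
  load(): Promise<Doc[]>;
  loadAndSplit(chunkSize: number): Promise<Doc[]>;
}

// Pure helper: cut text into consecutive chunks of at most chunkSize characters.
function splitIntoChunks(text: string, chunkSize: number): string[] {
  if (chunkSize <= 0) {
    throw new Error("chunkSize must be positive");
  }
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize) {
    chunks.push(text.slice(start, start + chunkSize));
  }
  return chunks;
}

// Hypothetical loader that reads from in-memory strings instead of a real source.
class InMemoryLoader implements SimpleLoader {
  private readonly texts: string[];
  private readonly source: string;

  constructor(texts: string[], source: string) {
    this.texts = texts;
    this.source = source;
  }

  // Loads every text at once, one document per text.
  async load(): Promise<Doc[]> {
    return this.texts.map((text, index) => ({
      pageContent: text,
      metadata: { source: this.source, index },
    }));
  }

  // Loads every text, then splits each document into smaller documents.
  async loadAndSplit(chunkSize: number): Promise<Doc[]> {
    const result: Doc[] = [];
    for (const doc of await this.load()) {
      let chunkIndex = 0;
      for (const chunk of splitIntoChunks(doc.pageContent, chunkSize)) {
        result.push({
          pageContent: chunk,
          metadata: { ...doc.metadata, chunkIndex },
        });
        chunkIndex += 1;
      }
    }
    return result;
  }
}
```

Because both methods return the same document shape, downstream code does not need to know which source the data came from.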

By category

LangChain.js groups document loaders into two categories:
  • File loaders, which load data into LangChain formats from your local filesystem.
  • Web loaders, which load data from remote sources.

File loaders

If you’d like to contribute an integration, see Contributing integrations.

PDFs

| Document Loader | Description | Package/API |
| --- | --- | --- |
| PDFLoader | Load and parse PDF files using pdf-parse | Package |

Common File Types

| Document Loader | Description | Package/API |
| --- | --- | --- |
| CSV | Load data from CSV files with configurable column extraction | Package |
| JSON | Load JSON files using JSON pointer to target specific keys | Package |
| JSONLines | Load data from JSONLines/JSONL files | Package |
| Text | Load plain text files | Package |
| DOCX | Load Microsoft Word documents (.docx and .doc formats) | Package |
| EPUB | Load EPUB files with optional chapter splitting | Package |
| PPTX | Load PowerPoint presentations | Package |
| Subtitles | Load subtitle files (.srt format) | Package |
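
The JSON loader entry mentions JSON pointers. As a rough illustration of what a pointer such as `/texts/1` refers to, here is a generic RFC 6901 resolver. The function name `resolvePointer` is hypothetical, and this is not the loader's own extraction logic, which may behave differently.

```typescript
// Resolve an RFC 6901 JSON Pointer against an already-parsed JSON value.
// Returns undefined when the path does not exist.
function resolvePointer(root: unknown, pointer: string): unknown {
  // The empty pointer refers to the whole document.
  if (pointer === "") {
    return root;
  }
  if (pointer.charAt(0) !== "/") {
    throw new Error("A non-empty JSON Pointer must start with '/'");
  }

  let current: unknown = root;
  for (const rawToken of pointer.slice(1).split("/")) {
    // Unescape in the order the RFC requires: "~1" becomes "/", then "~0" becomes "~".
    const token = rawToken.replace(/~1/g, "/").replace(/~0/g, "~");

    if (Array.isArray(current)) {
      // Array tokens must be non-negative integers without leading zeros.
      current = /^(0|[1-9]\d*)$/.test(token) ? current[Number(token)] : undefined;
    } else if (typeof current === "object" && current !== null) {
      current = (current as Record<string, unknown>)[token];
    } else {
      // Tried to descend into a primitive or a missing value.
      return undefined;
    }
  }
  return current;
}
```

For example, with the parsed value `{ texts: ["a", "b"] }`, the pointer `/texts/1` selects the second array element.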

Specialized File Loaders

| Document Loader | Description | Package/API |
| --- | --- | --- |
| DirectoryLoader | Load all files from a directory with custom loader mappings | Package |
| UnstructuredLoader | Load multiple file types using Unstructured API | API |
| MultiFileLoader | Load data from multiple individual file paths | Package |
| ChatGPT | Load ChatGPT conversation exports | Package |
| Notion Markdown | Load Notion pages exported as Markdown | Package |
| OpenAI Whisper Audio | Transcribe audio files using OpenAI Whisper API | API |
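
The "custom loader mappings" idea behind the directory loader can be sketched as a lookup from file extension to a loader factory. The code below is a self-contained illustration, not the library's implementation: `PathLoader`, `StubLoader`, `LoaderMappings`, `extensionOf`, `loadPaths`, and `exampleMappings` are all hypothetical names, and the stub loader only echoes the path instead of reading a file.

```typescript
// Stand-in for LangChain's Document shape (assumption: only these two fields matter here).
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

interface PathLoader {
  load(): Promise<Doc[]>;
}

// Maps a lowercase extension (including the dot) to a factory that builds a loader for a path.
type LoaderMappings = Record<string, (path: string) => PathLoader>;

// Returns the lowercase extension of a path, or "" for dotfiles and extensionless names.
function extensionOf(path: string): string {
  const fileName = path.slice(path.lastIndexOf("/") + 1);
  const dot = fileName.lastIndexOf(".");
  return dot <= 0 ? "" : fileName.slice(dot).toLowerCase();
}

// Stub loader: produces one document describing the path rather than reading the file.
class StubLoader implements PathLoader {
  private readonly path: string;
  private readonly label: string;

  constructor(path: string, label: string) {
    this.path = path;
    this.label = label;
  }

  async load(): Promise<Doc[]> {
    return [
      { pageContent: `${this.label}:${this.path}`, metadata: { source: this.path } },
    ];
  }
}

// Loads every path that has a mapping; paths with unmapped extensions are skipped.
async function loadPaths(paths: string[], mappings: LoaderMappings): Promise<Doc[]> {
  const docs: Doc[] = [];
  for (const path of paths) {
    const factory = mappings[extensionOf(path)];
    if (!factory) {
      continue;
    }
    docs.push(...(await factory(path).load()));
  }
  return docs;
}

// Example mappings: one factory per supported extension.
const exampleMappings: LoaderMappings = {
  ".txt": (path) => new StubLoader(path, "text"),
  ".csv": (path) => new StubLoader(path, "csv"),
};
```

Skipping unmapped extensions is a design choice made for this sketch; a stricter variant could throw or log a warning instead.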

Web loaders

Webpages

| Document Loader | Description | Web Support | Package/API |
| --- | --- | --- | --- |
| Cheerio | Load webpages using Cheerio (lightweight, no JavaScript execution) | ✅ | Package |
| Playwright | Load dynamic webpages using Playwright (supports JavaScript rendering) | ❌ | Package |
| Puppeteer | Load dynamic webpages using Puppeteer (headless Chrome) | ❌ | Package |
| FireCrawl | Crawl and convert websites into LLM-ready markdown | ✅ | API |
| Spider | Fast crawler that converts websites into HTML, markdown, or text | ✅ | API |
| RecursiveUrlLoader | Recursively load webpages following links | ❌ | Package |
| Sitemap | Load all pages from a sitemap.xml | ✅ | Package |
| Browserbase | Load webpages using managed headless browsers with stealth mode | ✅ | API |
| WebPDFLoader | Load PDF files in web environments | ✅ | Package |

Cloud Providers

| Document Loader | Description | Web Support | Package/API |
| --- | --- | --- | --- |
| S3 | Load files from AWS S3 buckets | ❌ | Package |
| Azure Blob Storage Container | Load all files from Azure Blob Storage container | ❌ | Package |
| Azure Blob Storage File | Load individual files from Azure Blob Storage | ❌ | Package |
| Google Cloud Storage | Load files from Google Cloud Storage buckets | ❌ | Package |
| Google Cloud SQL for PostgreSQL | Load documents from Cloud SQL PostgreSQL databases | ✅ | Package |

Productivity Tools

| Document Loader | Description | Web Support | Package/API |
| --- | --- | --- | --- |
| Notion API | Load Notion pages and databases via API | ✅ | API |
| Figma | Load Figma file data | ✅ | API |
| Confluence | Load pages from Confluence spaces | ❌ | API |
| GitHub | Load files from GitHub repositories | ✅ | API |
| GitBook | Load GitBook documentation pages | ✅ | Package |
| Jira | Load issues from Jira projects | ❌ | API |
| Airtable | Load records from Airtable bases | ✅ | API |
| Taskade | Load Taskade project data | ✅ | API |

Search & Data APIs

| Document Loader | Description | Web Support | Package/API |
| --- | --- | --- | --- |
| SearchAPI | Load web search results from SearchAPI (Google, YouTube, etc.) | ✅ | API |
| SerpAPI | Load web search results from SerpAPI | ✅ | API |
| Apify Dataset | Load scraped data from Apify platform | ✅ | API |

Audio & Video

| Document Loader | Description | Web Support | Package/API |
| --- | --- | --- | --- |
| YouTube | Load YouTube video transcripts | ✅ | Package |
| AssemblyAI | Transcribe audio and video files using AssemblyAI API | ✅ | API |
| Sonix | Transcribe audio files using Sonix API | ❌ | API |

Other

| Document Loader | Description | Web Support | Package/API |
| --- | --- | --- | --- |
| Couchbase | Load documents from Couchbase database using SQL++ queries | ✅ | Package |
| LangSmith | Load datasets and traces from LangSmith | ✅ | API |
| Hacker News | Load Hacker News threads and comments | ✅ | Package |
| IMSDB | Load movie scripts from Internet Movie Script Database | ✅ | Package |
| College Confidential | Load college information from College Confidential | ✅ | Package |
| Blockchain Data | Load blockchain data (NFTs, transactions) via Sort.xyz API | ✅ | API |
