Interface
Each document loader may define its own parameters, but they share a common API:
- .load(): Loads all documents at once.
- .loadAndSplit(): Loads all documents at once and splits them into smaller documents.
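To make the shared interface concrete, here is a minimal TypeScript sketch of what a loader with a load() method looks like. It does not import LangChain.js: the `LoadedDocument` and `Loader` types, the `InMemoryLoader` class, and the standalone `loadAndSplit` helper are all hypothetical stand-ins written for illustration, and the fixed-size chunking is an assumption rather than the splitting strategy the library uses.

```typescript
// Hypothetical minimal shapes for illustration; the real LangChain.js types
// carry more fields and options than shown here.
interface LoadedDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

interface Loader {
  load(): Promise<LoadedDocument[]>;
}

// In-memory stand-in for a real loader: one document per input string.
class InMemoryLoader implements Loader {
  private readonly texts: string[];
  private readonly source: string;

  constructor(texts: string[], source: string) {
    this.texts = texts;
    this.source = source;
  }

  async load(): Promise<LoadedDocument[]> {
    return this.texts.map((text, index) => ({
      pageContent: text,
      metadata: { source: this.source, index },
    }));
  }
}

// Assumed splitting strategy: naive fixed-size character chunks.
// Each chunk keeps a copy of its parent document's metadata.
async function loadAndSplit(
  loader: Loader,
  chunkSize: number
): Promise<LoadedDocument[]> {
  if (chunkSize <= 0) {
    throw new Error("chunkSize must be a positive number");
  }
  const docs = await loader.load();
  const chunks: LoadedDocument[] = [];
  for (const doc of docs) {
    for (let start = 0; start < doc.pageContent.length; start += chunkSize) {
      chunks.push({
        pageContent: doc.pageContent.slice(start, start + chunkSize),
        metadata: { ...doc.metadata },
      });
    }
  }
  return chunks;
}
```

Calling `await new InMemoryLoader(["some text"], "memory").load()` returns one document per input string, while `loadAndSplit` returns the same content broken into smaller documents. Real loaders differ mainly in their constructor parameters and in where the content comes from.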
By category
LangChain.js groups document loaders into two categories:
- File loaders, which load data into LangChain formats from your local filesystem.
- Web loaders, which load data from remote sources.
File loaders
If you'd like to contribute an integration, see Contributing integrations.
PDFs
| Document Loader | Description | Package/API |
|---|---|---|
| PDFLoader | Load and parse PDF files using pdf-parse | Package |
Common File Types
| Document Loader | Description | Package/API |
|---|---|---|
| CSV | Load data from CSV files with configurable column extraction | Package |
| JSON | Load JSON files using JSON pointer to target specific keys | Package |
| JSONLines | Load data from JSONLines/JSONL files | Package |
| Text | Load plain text files | Package |
| DOCX | Load Microsoft Word documents (.docx and .doc formats) | Package |
| EPUB | Load EPUB files with optional chapter splitting | Package |
| PPTX | Load PowerPoint presentations | Package |
| Subtitles | Load subtitle files (.srt format) | Package |
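The JSON entry in the table above mentions targeting keys with a JSON pointer. As a rough illustration of the idea, the sketch below resolves an RFC 6901 style pointer against a parsed JSON value. The `resolvePointer` function and the sample data are hypothetical, and the JSON loader's own pointer handling may differ from this plain lookup, so treat it as background on the notation rather than a description of the loader.

```typescript
// Resolve an RFC 6901 JSON pointer such as "/posts/0/title" against a parsed
// JSON value. Returns undefined when a segment along the path is missing.
function resolvePointer(root: unknown, pointer: string): unknown {
  if (pointer === "") {
    return root; // the empty pointer refers to the whole document
  }
  if (!pointer.startsWith("/")) {
    throw new Error(`Invalid JSON pointer: ${pointer}`);
  }
  const segments = pointer
    .slice(1)
    .split("/")
    // Unescape in the order RFC 6901 requires: "~1" first, then "~0".
    .map((segment) => segment.replace(/~1/g, "/").replace(/~0/g, "~"));

  let current: unknown = root;
  for (const segment of segments) {
    if (current === null || typeof current !== "object") {
      return undefined;
    }
    // Arrays are indexed with the same lookup, since "0" works as a key.
    current = (current as Record<string, unknown>)[segment];
  }
  return current;
}

// Hypothetical sample data.
const parsed: unknown = JSON.parse(
  '{"posts":[{"title":"Hello","body":"First post"}],"a/b":1}'
);
const firstTitle = resolvePointer(parsed, "/posts/0/title");
const escapedKey = resolvePointer(parsed, "/a~1b");
```

Here `firstTitle` picks out the title of the first post, and `escapedKey` shows how a key containing a slash is addressed with the `~1` escape.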
Specialized File Loaders
| Document Loader | Description | Package/API |
|---|---|---|
| DirectoryLoader | Load all files from a directory with custom loader mappings | Package |
| UnstructuredLoader | Load multiple file types using Unstructured API | API |
| MultiFileLoader | Load data from multiple individual file paths | Package |
| ChatGPT | Load ChatGPT conversation exports | Package |
| Notion Markdown | Load Notion pages exported as Markdown | Package |
| OpenAI Whisper Audio | Transcribe audio files using OpenAI Whisper API | API |
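The DirectoryLoader entry above refers to custom loader mappings. The sketch below shows one way such a mapping could work: a table from file extension to a factory that builds a loader for a given path. Everything here (`LoaderFactory`, `extensionOf`, `loadersFor`, `stubLoader`) is a hypothetical illustration that does not import LangChain.js, and the behavior for unmapped files (silently skipped) is an assumption.

```typescript
// Hypothetical minimal shapes, mirroring the shared load() interface.
interface LoadedDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

interface Loader {
  load(): Promise<LoadedDocument[]>;
}

// A factory builds a loader for one file path.
type LoaderFactory = (path: string) => Loader;

// Lowercased extension including the dot, or "" when the file name has none.
// Dotfiles such as ".env" are treated as having no extension.
function extensionOf(path: string): string {
  const fileName = path.split("/").pop() ?? "";
  const dot = fileName.lastIndexOf(".");
  return dot > 0 ? fileName.slice(dot).toLowerCase() : "";
}

// Build one loader per path whose extension has a mapping.
// Assumption: files without a mapping are skipped rather than rejected.
function loadersFor(
  paths: string[],
  mappings: Record<string, LoaderFactory>
): Loader[] {
  const loaders: Loader[] = [];
  for (const path of paths) {
    const factory = mappings[extensionOf(path)];
    if (factory) {
      loaders.push(factory(path));
    }
  }
  return loaders;
}

// Stub factory standing in for real file loaders.
const stubLoader = (kind: string): LoaderFactory => (path) => ({
  load: async () => [
    { pageContent: `${kind} contents of ${path}`, metadata: { source: path } },
  ],
});

const selected = loadersFor(
  ["docs/notes.txt", "docs/data.CSV", "docs/image.png"],
  { ".txt": stubLoader("text"), ".csv": stubLoader("csv") }
);
```

In this example `selected` holds two loaders, one for the text file and one for the CSV file. The PNG has no mapping and is left out.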
Web loaders
Webpages
| Document Loader | Description | Web Support | Package/API |
|---|---|---|---|
| Cheerio | Load webpages using Cheerio (lightweight, no JavaScript execution) | β | Package |
| Playwright | Load dynamic webpages using Playwright (supports JavaScript rendering) | β | Package |
| Puppeteer | Load dynamic webpages using Puppeteer (headless Chrome) | β | Package |
| FireCrawl | Crawl and convert websites into LLM-ready markdown | β | API |
| Spider | Fast crawler that converts websites into HTML, markdown, or text | β | API |
| RecursiveUrlLoader | Recursively load webpages following links | β | Package |
| Sitemap | Load all pages from a sitemap.xml | β | Package |
| Browserbase | Load webpages using managed headless browsers with stealth mode | β | API |
| WebPDFLoader | Load PDF files in web environments | β | Package |
Cloud Providers
| Document Loader | Description | Web Support | Package/API |
|---|---|---|---|
| S3 | Load files from AWS S3 buckets | β | Package |
| Azure Blob Storage Container | Load all files from Azure Blob Storage container | β | Package |
| Azure Blob Storage File | Load individual files from Azure Blob Storage | β | Package |
| Google Cloud Storage | Load files from Google Cloud Storage buckets | β | Package |
| Google Cloud SQL for PostgreSQL | Load documents from Cloud SQL PostgreSQL databases | β | Package |
Productivity Tools
| Document Loader | Description | Web Support | Package/API |
|---|---|---|---|
| Notion API | Load Notion pages and databases via API | β | API |
| Figma | Load Figma file data | β | API |
| Confluence | Load pages from Confluence spaces | β | API |
| GitHub | Load files from GitHub repositories | β | API |
| GitBook | Load GitBook documentation pages | β | Package |
| Jira | Load issues from Jira projects | β | API |
| Airtable | Load records from Airtable bases | β | API |
| Taskade | Load Taskade project data | β | API |
Search & Data APIs
| Document Loader | Description | Web Support | Package/API |
|---|---|---|---|
| SearchAPI | Load web search results from SearchAPI (Google, YouTube, etc.) | β | API |
| SerpAPI | Load web search results from SerpAPI | β | API |
| Apify Dataset | Load scraped data from Apify platform | β | API |
Audio & Video
| Document Loader | Description | Web Support | Package/API |
|---|---|---|---|
| YouTube | Load YouTube video transcripts | β | Package |
| AssemblyAI | Transcribe audio and video files using AssemblyAI API | β | API |
| Sonix | Transcribe audio files using Sonix API | β | API |
Other
| Document Loader | Description | Web Support | Package/API |
|---|---|---|---|
| Couchbase | Load documents from Couchbase database using SQL++ queries | β | Package |
| LangSmith | Load datasets and traces from LangSmith | β | API |
| Hacker News | Load Hacker News threads and comments | β | Package |
| IMSDB | Load movie scripts from Internet Movie Script Database | β | Package |
| College Confidential | Load college information from College Confidential | β | Package |
| Blockchain Data | Load blockchain data (NFTs, transactions) via Sort.xyz API | β | API |