Doctran is a Python package that uses LLMs and open-source NLP libraries to transform raw text into clean, structured, information-dense documents optimized for vector space retrieval. You can think of Doctran as a black box: messy strings go in, and clean, labelled strings come out.

Installation and setup

pip install doctran

Document transformers

Document interrogator

See a usage example for DoctranQATransformer.
from langchain_community.document_transformers import DoctranQATransformer
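
The following is a minimal sketch of how the interrogator might be used, not a definitive recipe. It assumes that `OPENAI_API_KEY` is set in the environment, that the installed version exposes a synchronous `transform_documents` (older releases may only offer the async `atransform_documents`), and that the results land in `metadata["questions_and_answers"]` as a list of dicts with `question` and `answer` keys. The helper names `interrogate` and `qa_to_lines` and the `"gpt-4"` model name are hypothetical placeholders.

```python
from typing import Any, Dict, List, Optional


def interrogate(
    texts: List[str], model: str = "gpt-4"
) -> List[Optional[List[Dict[str, Any]]]]:
    """Return the generated question/answer pairs for each input text."""
    if not texts:
        return []  # nothing to send to the LLM

    # Imported lazily so the optional dependencies are only needed when used.
    from langchain_community.document_transformers import DoctranQATransformer
    from langchain_core.documents import Document

    docs = [Document(page_content=t) for t in texts]
    # Assumption: the API key is picked up from OPENAI_API_KEY when not passed.
    transformer = DoctranQATransformer(openai_api_model=model)
    transformed = transformer.transform_documents(docs)
    # Assumption: Q&A pairs are stored under this metadata key.
    return [d.metadata.get("questions_and_answers") for d in transformed]


def qa_to_lines(pairs: Optional[List[Dict[str, Any]]]) -> List[str]:
    """Flatten Q&A dicts into 'Q: ... A: ...' strings, e.g. for embedding."""
    return [f"Q: {p['question']} A: {p['answer']}" for p in (pairs or [])]
```

Flattening the pairs into short strings is one possible design choice: questions tend to resemble user queries more closely than narrative text does, which can help retrieval.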

Property extractor

See a usage example for DoctranPropertyExtractor.
from langchain_community.document_transformers import DoctranPropertyExtractor
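
As a rough illustration, the extractor can be given a list of property descriptions and asked to fill them in for each document. This sketch assumes that `OPENAI_API_KEY` is set, that a synchronous `transform_documents` is available in the installed version, and that results are written to `metadata["extracted_properties"]`. The schema in `PROPERTIES`, the helper name `extract_properties`, and the `"gpt-4"` model name are made up for this example.

```python
from typing import Any, Dict, List, Optional

# Hypothetical schema for illustration: each entry describes one field to extract.
PROPERTIES: List[Dict[str, Any]] = [
    {
        "name": "category",
        "description": "What type of document this is.",
        "type": "string",
        "enum": ["update", "action_item", "announcement", "other"],
        "required": True,
    },
    {
        "name": "mentions",
        "description": "Full names of all people mentioned in the text.",
        "type": "array",
        "items": {
            "name": "full_name",
            "description": "The full name of a person.",
            "type": "string",
        },
        "required": True,
    },
]


def extract_properties(
    texts: List[str],
    properties: List[Dict[str, Any]] = PROPERTIES,
    model: str = "gpt-4",
) -> List[Optional[Dict[str, Any]]]:
    """Return the extracted properties for each input text."""
    if not texts:
        return []  # nothing to send to the LLM

    # Imported lazily so the optional dependencies are only needed when used.
    from langchain_community.document_transformers import DoctranPropertyExtractor
    from langchain_core.documents import Document

    docs = [Document(page_content=t) for t in texts]
    transformer = DoctranPropertyExtractor(
        properties=properties, openai_api_model=model
    )
    transformed = transformer.transform_documents(docs)
    # Assumption: extracted values are stored under this metadata key.
    return [d.metadata.get("extracted_properties") for d in transformed]
```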

Document translator

See a usage example for DoctranTextTranslator.
from langchain_community.document_transformers import DoctranTextTranslator
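
A translation call might look like the sketch below. It assumes that `OPENAI_API_KEY` is set, that the target language is passed to the constructor as `language`, and that the translated text replaces `page_content` in the returned documents. The helper name `translate` and the `"gpt-4"` model name are placeholders, not part of the library.

```python
from typing import List


def translate(
    texts: List[str], language: str = "spanish", model: str = "gpt-4"
) -> List[str]:
    """Return each input text translated into the target language."""
    if not texts:
        return []  # nothing to send to the LLM

    # Imported lazily so the optional dependencies are only needed when used.
    from langchain_community.document_transformers import DoctranTextTranslator
    from langchain_core.documents import Document

    docs = [Document(page_content=t) for t in texts]
    transformer = DoctranTextTranslator(language=language, openai_api_model=model)
    transformed = transformer.transform_documents(docs)
    # Assumption: the translated text is returned as the new page content.
    return [d.page_content for d in transformed]
```

Translating documents into a single language before embedding is one way to keep queries and documents in the same vector space when the corpus is multilingual.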
