OpenDataLab
Results: 2 repositories owned by OpenDataLab
MinerU
50.7k stars, 4.2k forks, 50.7k watchers
Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.
MinerU-HTML
154 stars, 18 forks, 154 watchers
MinerU-HTML: An SLM-powered HTML main content extractor that outputs clean HTML bodies. Perfect for Deep Research Agents, RAG applications, and training data generation.