OpenDataLab

2 repositories owned by OpenDataLab

MinerU

Stars: 50.7k, Forks: 4.2k, Watchers: 50.7k

Transforms complex documents such as PDFs into LLM-ready Markdown/JSON for agentic workflows.

MinerU-HTML

Stars: 154, Forks: 18, Watchers: 154

MinerU-HTML: An SLM-powered HTML main content extractor that outputs clean HTML bodies. Perfect for Deep Research Agents, RAG applications, and training data generation.