[Model Request] SmolDocling-256M-preview
⚙️ Request New Models
- Link to an existing implementation (e.g. Hugging Face/Github): SmolDocling-256M-preview
- Is this model architecture supported by MLC-LLM? (the list of supported models): No; the architecture is based on Idefics3 (a quick way to check is sketched below).
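
A minimal sketch for confirming the architecture family before porting, assuming the Hugging Face repo id is `ds4sd/SmolDocling-256M-preview` (the id is not stated above):

```python
# Inspect the model config to see which architecture class SmolDocling uses.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("ds4sd/SmolDocling-256M-preview")  # assumed repo id
print(config.model_type)      # expected to report the Idefics3 family
print(config.architectures)   # the *ForConditionalGeneration class transformers loads
```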
Additional context
This model was presented in the paper SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion.
🚀 Features:
- 🏷️ DocTags for Efficient Tokenization – Introduces DocTags, an efficient and minimal representation for documents that is fully compatible with DoclingDocuments.
- 🔍 OCR (Optical Character Recognition) – Extracts text accurately from images.
- 📐 Layout and Localization – Preserves document structure and document element bounding boxes.
- 💻 Code Recognition – Detects and formats code blocks, including indentation.
- 🔢 Formula Recognition – Identifies and processes mathematical expressions.
- 📊 Chart Recognition – Extracts and interprets chart data.
- 📑 Table Recognition – Supports column and row headers for structured table extraction.
- 🖼️ Figure Classification – Differentiates figures and graphical elements.
- 📝 Caption Correspondence – Links captions to relevant images and figures.
- 📜 List Grouping – Organizes and structures list elements correctly.
- 📄 Full-Page Conversion – Processes entire pages for comprehensive document conversion, including all page elements (code, equations, tables, charts, etc.).
- 🔲 OCR with Bounding Boxes – Performs OCR on regions specified by a bounding box.
- 📂 General Document Processing – Trained for both scientific and non-scientific documents.
- 🔄 Seamless Docling Integration – Import into Docling and export in multiple formats.
- 💨 Fast Inference Using vLLM – Averages 0.35 seconds per page on an A100 GPU.
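
For context, here is a hedged sketch of how the model is typically run through transformers today (outside MLC-LLM) to produce DocTags output. The repo id `ds4sd/SmolDocling-256M-preview` and the "Convert this page to docling." prompt are assumptions based on common usage, not taken from the text above:

```python
# Sketch: generate DocTags markup for a single document page with transformers.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "ds4sd/SmolDocling-256M-preview"  # assumed Hugging Face repo id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.bfloat16)

image = Image.open("page.png")  # any document page image
messages = [
    {"role": "user",
     "content": [{"type": "image"},
                 {"type": "text", "text": "Convert this page to docling."}]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")

with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=1024)

# Decode only the newly generated tokens; the result is DocTags markup
# carrying layout and bounding-box information.
doctags = processor.batch_decode(generated[:, inputs.input_ids.shape[1]:],
                                 skip_special_tokens=False)[0]
print(doctags)
```

Supporting this flow in MLC-LLM would mainly require the Idefics3-style vision encoder plus image token handling on top of the existing SmolLM-class text backbone.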