[Model Request] SmolDocling-256M-preview
⚙️ Request New Models
- Link to an existing implementation (e.g. Hugging Face/Github): SmolDocling-256M-preview
- Is this model architecture supported by MLC-LLM? (the list of supported models): No; the architecture is based on Idefics3 (a quick way to check is sketched below).
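
A minimal sketch for confirming the architecture family before porting, assuming the Hugging Face repo id is `ds4sd/SmolDocling-256M-preview` (the id is not stated above):

```python
# Inspect the model config to see which architecture class SmolDocling uses.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("ds4sd/SmolDocling-256M-preview")  # assumed repo id
print(config.model_type)      # expected to report the Idefics3 family
print(config.architectures)   # the *ForConditionalGeneration class transformers loads
```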
Additional context
This model was presented in the paper SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion.
🚀 Features:
- 🏷️ DocTags for Efficient Tokenization – Introduces DocTags, an efficient and minimal representation for documents that is fully compatible with DoclingDocuments.
- 🔍 OCR (Optical Character Recognition) – Extracts text accurately from images.
- 📐 Layout and Localization – Preserves document structure and document element bounding boxes.
- 💻 Code Recognition – Detects and formats code blocks, including indentation.
- 🔢 Formula Recognition – Identifies and processes mathematical expressions.
- 📊 Chart Recognition – Extracts and interprets chart data.
- 📑 Table Recognition – Supports column and row headers for structured table extraction.
- 🖼️ Figure Classification – Differentiates figures and graphical elements.
- 📝 Caption Correspondence – Links captions to relevant images and figures.
- 📜 List Grouping – Organizes and structures list elements correctly.
- 📄 Full-Page Conversion – Processes entire pages for comprehensive document conversion, including all page elements (code, equations, tables, charts, etc.).
- 🔲 OCR with Bounding Boxes – Performs OCR on regions specified by a bounding box.
- 📂 General Document Processing – Trained for both scientific and non-scientific documents.
- 🔄 Seamless Docling Integration – Import into Docling and export in multiple formats.
- 💨 Fast Inference Using vLLM – Averages 0.35 seconds per page on an A100 GPU.
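
For context, here is a hedged sketch of how the model is typically run through transformers today (outside MLC-LLM) to produce DocTags output. The repo id `ds4sd/SmolDocling-256M-preview` and the "Convert this page to docling." prompt are assumptions based on common usage, not taken from the text above:

```python
# Sketch: generate DocTags markup for a single document page with transformers.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "ds4sd/SmolDocling-256M-preview"  # assumed Hugging Face repo id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.bfloat16)

image = Image.open("page.png")  # any document page image
messages = [
    {"role": "user",
     "content": [{"type": "image"},
                 {"type": "text", "text": "Convert this page to docling."}]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")

with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=1024)

# Decode only the newly generated tokens; the result is DocTags markup
# carrying layout and bounding-box information.
doctags = processor.batch_decode(generated[:, inputs.input_ids.shape[1]:],
                                 skip_special_tokens=False)[0]
print(doctags)
```

Supporting this flow in MLC-LLM would mainly require the Idefics3-style vision encoder plus image token handling on top of the existing SmolLM-class text backbone.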