mlc-llm

[Model Request] SmolDocling-256M-preview

Open · temsa opened this issue 7 months ago · 0 comments

⚙️ Request New Models

Additional context

This model was presented in the paper SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion.

🚀 Features:

  • 🏷️ DocTags for Efficient Tokenization – Introduces DocTags, an efficient and minimal representation for documents that is fully compatible with DoclingDocuments.
  • 🔍 OCR (Optical Character Recognition) – Extracts text accurately from images.
  • 📐 Layout and Localization – Preserves document structure and document element bounding boxes.
  • 💻 Code Recognition – Detects and formats code blocks, including indentation.
  • 🔢 Formula Recognition – Identifies and processes mathematical expressions.
  • 📊 Chart Recognition – Extracts and interprets chart data.
  • 📑 Table Recognition – Supports column and row headers for structured table extraction.
  • 🖼️ Figure Classification – Differentiates figures and graphical elements.
  • 📝 Caption Correspondence – Links captions to relevant images and figures.
  • 📜 List Grouping – Organizes and structures list elements correctly.
  • 📄 Full-Page Conversion – Processes entire pages for comprehensive document conversion, including all page elements (code, equations, tables, charts, etc.).
  • 🔲 OCR with Bounding Boxes – Performs OCR on regions specified by a bounding box.
  • 📂 General Document Processing – Trained for both scientific and non-scientific documents.
  • 🔄 Seamless Docling Integration – Import into Docling and export in multiple formats.
  • 💨 Fast inference using vLLM – Averages 0.35 seconds per page on an A100 GPU.
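For scale, the reported per-page average can be turned into an hourly throughput figure. This is only back-of-the-envelope arithmetic, assuming the 0.35 s/page average holds over a sustained run on a single A100; `pages_per_hour` is a hypothetical helper for illustration, not part of mlc-llm, vLLM, or Docling.

```python
def pages_per_hour(seconds_per_page: float) -> float:
    """Convert an average per-page latency (seconds) into pages per hour.

    Hypothetical helper: assumes the average latency stays constant
    over the whole run.
    """
    if seconds_per_page <= 0:
        raise ValueError("seconds_per_page must be positive")
    return 3600.0 / seconds_per_page


# Reported average for SmolDocling with vLLM on an A100: 0.35 s/page.
print(round(pages_per_hour(0.35)))  # 10286
```

In other words, roughly 10,000 pages per hour per GPU, if the reported average is representative of real workloads.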

temsa · Apr 01 '25 09:04