Convert model weights to safetensors format

Open cau-git opened this issue 1 year ago • 2 comments

We want to move the weights of docling-ibm-models from pickled objects saved by torch or torch.jit to the safetensors format. This has several advantages, such as better security, and it is also a prerequisite for proper accelerator support across all models.
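To illustrate the security point, here is a minimal, standard-library-only sketch of the safetensors file layout: an 8-byte little-endian header length, a JSON header describing each tensor, then the raw bytes. Loading such a file only parses JSON and copies bytes, whereas unpickling can execute arbitrary code. This is not the safetensors library and not the conversion code used for docling-ibm-models. The helper names `write_safetensors` and `read_safetensors` are hypothetical, and the sketch assumes float32 (`F32`) tensors only.

```python
import json
import struct


def write_safetensors(path, tensors):
    """Write tensors to a safetensors-style file (hypothetical helper).

    tensors maps a name to (shape, flat list of floats); everything is
    stored as little-endian float32, which is an assumption of this sketch.
    """
    header = {}
    payload = bytearray()
    for name, (shape, values) in tensors.items():
        raw = struct.pack(f"<{len(values)}f", *values)
        header[name] = {
            "dtype": "F32",
            "shape": list(shape),
            # Offsets are relative to the start of the data section.
            "data_offsets": [len(payload), len(payload) + len(raw)],
        }
        payload += raw
    header_bytes = json.dumps(header, separators=(",", ":")).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))  # 8-byte header length
        f.write(header_bytes)                          # JSON header
        f.write(payload)                               # raw tensor bytes


def read_safetensors(path):
    """Read the file back; only JSON parsing and byte slicing are involved."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
        payload = f.read()
    tensors = {}
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form metadata entry
            continue
        start, end = info["data_offsets"]
        count = (end - start) // 4  # 4 bytes per float32 value
        values = list(struct.unpack(f"<{count}f", payload[start:end]))
        tensors[name] = (info["shape"], values)
    return tensors
```

In practice the conversion would go through the safetensors library itself; the sketch only shows why the format is safe to load from untrusted sources.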

cau-git avatar Nov 11 '24 13:11 cau-git

Work is in progress on this PR: https://github.com/DS4SD/docling-ibm-models/pull/50

cau-git avatar Nov 25 '24 09:11 cau-git

Looking forward to this one! I'm working on a txtai integration for docling and the biggest downside is speed. For some PDFs that are a couple of pages long, extraction takes 14s vs 200ms with existing methods (Apache Tika). Obviously, the upside is that all the formatting is preserved. But even getting 14s down to a couple of seconds would be a big win.

davidmezzetti avatar Dec 03 '24 11:12 davidmezzetti

Docling v2.12.0 has its models in safetensors format: https://github.com/DS4SD/docling/releases/tag/v2.12.0

nikos-livathinos avatar Dec 15 '24 16:12 nikos-livathinos