Convert model weights to safetensors format
We want to move the weights of docling-ibm-models from pickled objects saved by torch or torch.jit to the safetensors format. This has several advantages. The first is better security: loading a pickle can execute arbitrary code, while a safetensors file holds only a JSON header and raw tensor bytes. It is also a prerequisite for proper accelerator support across all models.
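The security argument comes down to what loading a file does. A pickle stream can name any importable callable and have it invoked during unpickling. A safetensors file is an 8-byte little-endian header length, then a JSON header giving each tensor's dtype, shape and byte offsets, then the raw tensor bytes, so reading it is pure parsing. The stdlib-only sketch below illustrates both points. The names record, Payload, save_f32 and load_f32 are hypothetical, only F32 tensors are handled, and real code should use the safetensors package (for example safetensors.torch.save_file and load_file) rather than a hand-rolled parser.

```python
from __future__ import annotations

import json
import pickle
import struct

# --- Why pickle is risky: unpickling can call any importable function. ---
CALLS: list[str] = []


def record(message: str) -> str:
    """Harmless stand-in for an arbitrary callable (it could just as well be os.system)."""
    CALLS.append(message)
    return message


class Payload:
    def __reduce__(self):
        # pickle stores "call record('...')" and executes it inside pickle.loads().
        return (record, ("ran during unpickling",))


# --- Safetensors-style layout: header length, JSON header, raw bytes. ---
# Hypothetical minimal F32-only helpers; use the safetensors package in practice.
def save_f32(tensors: dict[str, tuple[list[int], list[float]]]) -> bytes:
    header: dict[str, dict] = {}
    data = bytearray()
    for name, (shape, values) in tensors.items():
        count = 1
        for dim in shape:
            count *= dim
        if count != len(values):
            raise ValueError(f"{name}: shape {shape} needs {count} values, got {len(values)}")
        begin = len(data)
        data += struct.pack(f"<{len(values)}f", *values)  # little-endian float32
        header[name] = {"dtype": "F32", "shape": shape, "data_offsets": [begin, len(data)]}
    header_bytes = json.dumps(header, separators=(",", ":")).encode("utf-8")
    header_bytes += b" " * (-len(header_bytes) % 8)  # pad so the data section is 8-byte aligned
    return struct.pack("<Q", len(header_bytes)) + header_bytes + bytes(data)


def load_f32(blob: bytes) -> dict[str, tuple[list[int], list[float]]]:
    # Loading is pure parsing: nothing stored in the file is ever executed.
    (header_len,) = struct.unpack_from("<Q", blob, 0)
    header = json.loads(blob[8 : 8 + header_len].decode("utf-8"))
    data = blob[8 + header_len :]
    tensors = {}
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form string metadata
            continue
        if info["dtype"] != "F32":
            raise ValueError(f"{name}: unsupported dtype {info['dtype']}")
        begin, end = info["data_offsets"]
        raw = data[begin:end]
        tensors[name] = (info["shape"], list(struct.unpack(f"<{len(raw) // 4}f", raw)))
    return tensors


if __name__ == "__main__":
    pickle.loads(pickle.dumps(Payload()))
    print("pickle side effects:", CALLS)
    blob = save_f32({"weight": ([2, 2], [0.5, -1.0, 2.25, 3.0]), "bias": ([2], [1.0, 0.0])})
    print("safetensors-style round trip:", load_f32(blob))
```

Running it shows that merely loading the pickle ran a function named in the stream, while the safetensors-style blob round-trips both tensors without executing anything from the file.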
Work on this is in progress in this PR: https://github.com/DS4SD/docling-ibm-models/pull/50
Looking forward to this one! I'm working on a txtai integration for docling, and the biggest downside is speed. For some PDFs of only a couple of pages, extraction takes 14 s versus 200 ms with existing methods (Apache Tika). The upside, of course, is that all the formatting is preserved. But even bringing 14 s down to a couple of seconds would be a big win.
Docling v2.12.0 has its models in safetensors format: https://github.com/DS4SD/docling/releases/tag/v2.12.0