Michele Dolfi
Thanks for the insights. At the moment we support only RapidOCR with ONNXRuntime, so, as suggested, it could be best to mention it in the docs. It looks like RapidOCR...
The error reported above was fixed in https://github.com/DS4SD/docling/pull/1024 (released in version `2.23.1`). I'm closing this issue; please file any new problems as a new issue.
We would kindly ask you to decouple the two issues reported here: 1. Issue with a filename containing Chinese characters (on Windows) 2. Some part of the document not detected...
We found that this issue is related to filenames with Unicode characters when using Windows. The fix actually has to be done in the docling-parse library and validated with the...
We have now also fixed it in the `docling-parse` backend, so you don't have to change anything. If you want to try different backends, here are a few examples: https://docling-project.github.io/docling/examples/custom_convert/
This feature is not planned any time soon, but it could be added as an enrichment pipeline step.
At the moment the `hash` column contains the hash of the actual `contents` column. This is the JSON representation of the output, which has the property `file-info.filename`, so different filenames...
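To illustrate why a filename embedded in the hashed JSON matters, here is a minimal sketch. It is not the project's actual code: the SHA-256 choice, the `main-text` field, and the helper names `make_contents`, `content_hash`, and `content_hash_ignoring_filename` are assumptions made for illustration. Only the `file-info.filename` property comes from the comment above.

```python
import hashlib
import json


def make_contents(filename: str, text: str) -> str:
    """Build a hypothetical JSON `contents` value that embeds the filename."""
    doc = {
        "file-info": {"filename": filename},
        "main-text": [{"text": text}],  # assumed field, for illustration only
    }
    return json.dumps(doc, sort_keys=True)


def content_hash(contents: str) -> str:
    """Hash the full serialized contents (SHA-256 is an assumption)."""
    return hashlib.sha256(contents.encode("utf-8")).hexdigest()


def content_hash_ignoring_filename(contents: str) -> str:
    """Hash the contents after dropping `file-info.filename`."""
    doc = json.loads(contents)
    doc.get("file-info", {}).pop("filename", None)
    normalized = json.dumps(doc, sort_keys=True)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Two documents with identical text but different filenames.
a = make_contents("report_v1.pdf", "Same body text")
b = make_contents("report_copy.pdf", "Same body text")

# Hashing the full contents distinguishes them only because of the filename.
full_hashes_match = content_hash(a) == content_hash(b)

# Hashing without the filename treats them as the same document.
normalized_hashes_match = (
    content_hash_ignoring_filename(a) == content_hash_ignoring_filename(b)
)
```

With a scheme like this, hashing the full contents cannot detect that two files with different names carry the same text, while a hash computed without the filename can.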
Rereading the discussion above, there were some open questions about which fields to expose and under which names. Exposing both is certainly a good idea, since they...
Should be fixed in https://github.com/IBM/data-prep-kit/pull/756.