Michele Dolfi
@CSFelix regarding the issue you see on Windows 11, do you mind trying some of the following? 1. Running the same example/tests on Linux? A local container could be enough...
@cypriendubois could you please share some of the files which are failing?
Maybe something like this could be enough:

```sh
$ zipinfo tests/data/docx/lorem_ipsum.docx
Archive: tests/data/docx/lorem_ipsum.docx
Zip file size: 14817 bytes, number of entries: 11
-rw---- 4.5 fat 1312 b- defS 80-Jan-01 00:00...
```
Thanks, this helps indeed. I think the issue is related to #802. From your output, it seems the file has `[Content_Types].xml` at the end of the archive, which is not...
Are you using the CLI? I'm wondering if what you see could be solved by this new PR: https://github.com/DS4SD/docling/pull/214
The CLI options for OCR have been released in https://github.com/DS4SD/docling/releases/tag/v2.6.0
Thanks all for collaborating on this issue. I think we can consider it settled and close it.
Can you please clarify if the feature request is for input or output formats?
Should be fixed since https://github.com/docling-project/docling/pull/1791
It turns out the `filetype` library also reads only the first 8K bytes ([ref](https://github.com/h2non/filetype.py/blob/master/filetype/utils.py#L45)), so this also happens with file inputs.
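If you want a quick way to see whether one of your files might be affected, a rough heuristic is to check whether the entry name `[Content_Types].xml` shows up anywhere in the first 8192 bytes, i.e. the window `filetype` reads. This is only a sketch under that assumption, not how `filetype` actually matches Office files, and the script name `check_ct.sh` is hypothetical. Note that the entry name is also stored in the ZIP central directory at the end of the archive, so files smaller than 8 KB will always pass.

```sh
#!/bin/sh
# Heuristic sketch (assumption): report whether the ZIP entry name
# "[Content_Types].xml" appears within the first 8192 bytes of each file,
# i.e. the window that `filetype` reads. This is NOT filetype's matcher.
WINDOW=8192

ct_in_window() {
    # -a: treat binary data as text; -F: fixed string (brackets are literal);
    # -q: only set the exit status
    head -c "$WINDOW" "$1" | grep -a -F -q '[Content_Types].xml'
}

for f in "$@"; do
    if ct_in_window "$f"; then
        echo "$f: [Content_Types].xml found in first $WINDOW bytes"
    else
        echo "$f: [Content_Types].xml NOT found in first $WINDOW bytes"
    fi
done
```

For example, `sh check_ct.sh tests/data/docx/lorem_ipsum.docx`. A file like the one reported, where `[Content_Types].xml` sits at the end of a larger archive, should print `NOT found`.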