Michele Dolfi

Results 172 comments of Michele Dolfi

@CSFelix regarding the issue you see on Windows 11, do you mind trying some of the following? 1. Running the same example/tests on Linux? A local container could be enough...

@cypriendubois could you please share some of the files which are failing?

Maybe something like this could be enough.

```sh
$ zipinfo tests/data/docx/lorem_ipsum.docx
Archive: tests/data/docx/lorem_ipsum.docx
Zip file size: 14817 bytes, number of entries: 11
-rw---- 4.5 fat 1312 b- defS 80-Jan-01 00:00...
```

Thanks, this helps indeed. I think the issue is related to #802. From your output, it seems the file has `[Content_Types].xml` at the end of the archive, which is not...
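For anyone who wants to check the entry order without `zipinfo`, here is a minimal sketch using only Python's standard `zipfile` module. The helper names `content_types_position` and `build_sample` are hypothetical and are not part of docling; the sample archives are built in memory purely for illustration.

```python
import io
import zipfile

CONTENT_TYPES = "[Content_Types].xml"


def content_types_position(source):
    """Return (index, total): where [Content_Types].xml sits in the archive listing.

    `source` may be a path or a binary file-like object.
    The index is None if the entry is missing.
    """
    with zipfile.ZipFile(source) as archive:
        names = archive.namelist()  # same order as the archive's directory listing
    index = names.index(CONTENT_TYPES) if CONTENT_TYPES in names else None
    return index, len(names)


def build_sample(order):
    """Build a tiny in-memory archive with entries written in the given order."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in order:
            archive.writestr(name, b"<xml/>")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    first = build_sample([CONTENT_TYPES, "word/document.xml"])
    last = build_sample(["word/document.xml", CONTENT_TYPES])
    print(content_types_position(first))  # entry listed first
    print(content_types_position(last))   # entry listed last
```

Running `content_types_position` on a real file path should show whether `[Content_Types].xml` is listed last, as in the output discussed above.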

Are you using the CLI? I'm wondering if what you see could be solved by this fresh new PR https://github.com/DS4SD/docling/pull/214

The CLI options for OCR have been released in https://github.com/DS4SD/docling/releases/tag/v2.6.0

Thanks all for collaborating on this issue. I think we can consider it settled and close it.

Can you please clarify if the feature request is for input or output formats?

Should be fixed since https://github.com/docling-project/docling/pull/1791

It turns out the `filetype` library also loads only 8K bytes [ref](https://github.com/h2non/filetype.py/blob/master/filetype/utils.py#L45), so this also happens with file inputs.
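To illustrate why an 8K read can matter, here is a rough sketch using only the standard library. It does not call `filetype` and does not reproduce its matching logic. The 8192-byte limit and the "look for the `[Content_Types].xml` name in the header" check are assumptions made for the example, and `marker_in_header` and `build_docx_like` are hypothetical helpers.

```python
import io
import zipfile

HEADER_BYTES = 8192  # assumption: stands in for the 8K read mentioned above
MARKER = b"[Content_Types].xml"


def marker_in_header(data, marker=MARKER, limit=HEADER_BYTES):
    """Return True if `marker` appears within the first `limit` bytes of `data`."""
    return marker in data[:limit]


def build_docx_like(padding_first):
    """Build an uncompressed zip where a 10 KB entry comes before or after the marker entry."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
        if padding_first:
            archive.writestr("word/media/padding.bin", b"\0" * 10_000)
        archive.writestr("[Content_Types].xml", b"<Types/>")
        if not padding_first:
            archive.writestr("word/media/padding.bin", b"\0" * 10_000)
    return buffer.getvalue()


if __name__ == "__main__":
    early = build_docx_like(padding_first=False)
    late = build_docx_like(padding_first=True)
    print(marker_in_header(early))  # marker entry is near the start of the file
    print(marker_in_header(late))   # marker entry starts after the 10 KB padding
    print(MARKER in late)           # the marker is still present in the full file
```

In the second archive the marker entry only starts after the padding, so a check limited to the first 8K bytes cannot see it, whether the bytes come from a stream or from a file on disk.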