Tim Allison
> Another difference is that files sent to Tika with compression will have a different Content-Type returned (i.e., from 'application/pdf' to ['application/gzip', 'application/pdf']).

If I understand correctly, if I curl with...
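
To make the quoted behaviour concrete, here is a minimal sketch of how unwrapping a compression layer turns one reported type into a list. It is not Tika's detector: `sniff` and `content_type_chain` are hypothetical helpers that only recognise the gzip and PDF magic bytes, and the sample PDF is a stub.

```python
import gzip

def sniff(data: bytes) -> str:
    """Guess a MIME type from leading magic bytes (gzip and PDF only)."""
    if data[:2] == b"\x1f\x8b":
        return "application/gzip"
    if data[:5] == b"%PDF-":
        return "application/pdf"
    return "application/octet-stream"

def content_type_chain(data: bytes, max_depth: int = 5) -> list:
    """Return the type of each layer, outermost first, unwrapping gzip."""
    chain = []
    for _ in range(max_depth):  # cap the depth to avoid nested-gzip bombs
        mime = sniff(data)
        chain.append(mime)
        if mime != "application/gzip":
            break
        data = gzip.decompress(data)
    return chain

pdf = b"%PDF-1.7\n%%EOF\n"  # stub, not a valid PDF; only the header matters here
plain_chain = content_type_chain(pdf)
gzipped_chain = content_type_chain(gzip.compress(pdf))
print(plain_chain)    # ['application/pdf']
print(gzipped_chain)  # ['application/gzip', 'application/pdf']
```

A client that compares Content-Type values across runs would need to treat the last element of the list as the type of the underlying document.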
[jxl.zip](https://github.com/drewnoakes/metadata-extractor/files/7260684/jxl.zip)
The larger two files were contributed by Tyler. The smallest file, I made with: https://github.com/surma/jxl-art/blob/main/LICENSE
It looks like this data is in the `udta` box: `meta hdlr mdirappl ilst ©nam data Test Title cpil data pgap data tmpo data ©too...`
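
For readers unfamiliar with the layout, here is a rough sketch of walking a `udta` payload of this shape (`meta` containing `hdlr` and `ilst`, with items such as `©nam` wrapping a `data` box). It is not the metadata-extractor or Tika code. The helper names are hypothetical, the bytes are synthetic rather than taken from the attached file, and it assumes `meta` is a full box with 4 bytes of version/flags, which is not true of every file.

```python
import struct

def box(box_type: bytes, payload: bytes) -> bytes:
    """Serialize one box: 4-byte big-endian size, 4-byte type, payload."""
    return struct.pack(">I", 8 + len(payload)) + box_type + payload

def iter_boxes(buf: bytes):
    """Yield (type, payload) for each box laid end to end in buf."""
    pos = 0
    while pos + 8 <= len(buf):
        size, box_type = struct.unpack_from(">I4s", buf, pos)
        if size < 8 or pos + size > len(buf):
            break  # sizes 0 and 1 (to-end, 64-bit) and truncation are not handled
        yield box_type, buf[pos + 8 : pos + size]
        pos += size

def read_itunes_tags(udta_payload: bytes) -> dict:
    """Collect ilst items as {key: value}, decoding UTF-8 text values."""
    tags = {}
    for box_type, meta in iter_boxes(udta_payload):
        if box_type != b"meta":
            continue
        # Assumption: skip 4 bytes of version/flags at the start of meta.
        for child_type, ilst in iter_boxes(meta[4:]):
            if child_type != b"ilst":
                continue
            for key, item in iter_boxes(ilst):
                for item_type, data in iter_boxes(item):
                    if item_type == b"data" and len(data) >= 8:
                        # data payload: 4-byte type indicator, 4-byte locale, value
                        type_code = struct.unpack_from(">I", data, 0)[0]
                        value = data[8:]
                        is_text = type_code == 1  # 1 = UTF-8 text
                        tags[key.decode("latin-1")] = (
                            value.decode("utf-8") if is_text else value
                        )
    return tags

# Synthetic payload that mimics the structure quoted above.
hdlr = box(b"hdlr", b"\x00" * 8 + b"mdirappl" + b"\x00" * 10)
name_item = box(b"\xa9nam", box(b"data", struct.pack(">II", 1, 0) + b"Test Title"))
udta_payload = box(b"meta", b"\x00" * 4 + hdlr + box(b"ilst", name_item))

tags = read_itunes_tags(udta_payload)
print(tags)  # {'©nam': 'Test Title'}
```

Non-text items (for example `cpil` or `tmpo`) come back as raw bytes in this sketch; a real handler would decode them according to their type indicator.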
Raw bytes for the userdata box. [apple-payload.bin.zip](https://github.com/drewnoakes/metadata-extractor/files/6640806/apple-payload.bin.zip) Our unit test file for the metadata above: https://github.com/apache/tika/blob/main/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-audiovideo-module/src/test/resources/test-documents/testMP4.m4a
I did some ugly hackery: https://github.com/apache/tika/blob/main/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-audiovideo-module/src/main/java/org/apache/tika/parser/mp4/TikaMp4BoxHandler.java and https://github.com/apache/tika/blob/main/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-audiovideo-module/src/main/java/org/apache/tika/parser/mp4/boxes/TikaUserDataBox.java

This now works for Tika. If there's a clean way to add these changes back to metadata-extractor, please let me know. Many...
related? issue #542
Will take a look at this after the 1.15 release...next week or so...sorry. This looks very cool. Thank you.
Thank you for opening this. Would you be able to break this into 2 separate pull requests: one for the PDFParser modifications, and one for the mods to tika-app's GUI....
> On the PDFParser mods, is there any way to make the syntax similar to what we get from Tesseract's hocr setting

@epugh would this be of use to you?...