Tim Allison comments

Results: 93 comments by Tim Allison

Problem reading Float

Hi Hernan, any chance you could share a test file? We'd want to fix this over on Apache Tika as well. Thank you!

MetaData Can not recognize Chinese

@jiangzif , would you mind if we added your file as a test case to Apache Tika for [TIKA-1513](https://issues.apache.org/jira/browse/TIKA-1513)?

UnboundLocalError: local variable 'is_dct_decode' referenced before assignment

[6fb9cceb33a2bf98749e895e43840e90f4bcbf4a631b512c010675e3763f5433.pdf](https://github.com/trailofbits/polyfile/files/3871096/6fb9cceb33a2bf98749e895e43840e90f4bcbf4a631b512c010675e3763f5433.pdf)

Interface ABBYY FineReader OCR with fscrawler

We made integration with other OCR engines much easier in 2.x. The new feature is entirely undocumented. Ping me if you want help with this.

update to jammy

> Hey @tballison - Hope you are good my friend. I've released 1.28.5 using the standard Jammy. Was there a reason why you were thinking of pinning to a particular version?...

update to jammy

Good enough for now. Thank you @dameikle !

Dump now contains a `bytes` attribute which breaks parsing.

I _think_ there was a `bytes` before...at least according to the source code. The problem is that there's a regex that expects the `xml:space...` before the `bytes` element. This just...
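A regex that expects `xml:space` before `bytes` fails as soon as the attribute order changes, because XML treats attribute order as insignificant. The sketch below illustrates this with hypothetical `<text>` elements (the element name, attribute values, and helper names are assumptions for illustration, not taken from the dump or the parser in question): the order-dependent pattern misses the reordered element, while a real XML parser reads both the same way.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical elements: same attributes, two different orders.
OLD = '<text xml:space="preserve" bytes="11">hello world</text>'
NEW = '<text bytes="11" xml:space="preserve">hello world</text>'

# Brittle: hard-codes xml:space before bytes.
BRITTLE = re.compile(r'<text xml:space="preserve" bytes="(\d+)">(.*?)</text>', re.S)

# ElementTree expands the predefined "xml:" prefix to this namespace URI.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"


def brittle_parse(s):
    """Return the element text, or None when the attribute order differs."""
    m = BRITTLE.search(s)
    return m.group(2) if m else None


def robust_parse(s):
    """Parse as XML so attribute order does not matter."""
    elem = ET.fromstring(s)
    return elem.text, elem.get("bytes"), elem.get(XML_NS + "space")
```

Here `brittle_parse(NEW)` returns `None`, while `robust_parse` gives the same tuple for both inputs. When a full parse is too slow for a large dump, a streaming parser such as `ET.iterparse` keeps the same order-independence.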

TIKA-1735 - add AC1027 and AC1032 and add ability to use dwgread if it is installed.

Closing because of merge of #558
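"Use dwgread if it is installed" amounts to checking for an optional external tool at runtime and falling back otherwise. The following is a minimal Python sketch of that check only, not Tika's actual (Java) implementation. The backend labels and function names are hypothetical, and no `dwgread` command-line flags are assumed.

```python
import shutil
from typing import Optional


def find_external_tool(name: str = "dwgread") -> Optional[str]:
    """Return the full path of `name` if it is on PATH, else None."""
    return shutil.which(name)


def choose_dwg_backend(tool: str = "dwgread") -> str:
    """Pick a (hypothetical) backend label: external tool if present, else built-in."""
    path = find_external_tool(tool)
    if path is not None:
        return "external:" + path
    return "built-in"
```

Probing once at startup and caching the result avoids repeating the PATH lookup for every file.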

common crawl: stderrs on 8 million files

I'm also happy to rerun with a more recent version of pdfcpu if you'd like.

common crawl: stderrs on 8 million files

This was the sql I ran to rm object #s from stderrs so that we could get slightly more meaningful counts. `select regexp_replace(stderr, '\(obj#:?\d+\)', '') as stderr_cleaned, count(1) as cnt...
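The same normalization can be reproduced outside the database. The sketch below applies the pattern from the SQL above and counts the cleaned messages. Note that Python's `re.sub` strips every match in a line. The sample stderr lines are invented for illustration, not taken from the Common Crawl run.

```python
import re
from collections import Counter

# Same pattern as the SQL: "(obj#", an optional ":", digits, ")".
OBJ_NUM = re.compile(r"\(obj#:?\d+\)")


def clean(stderr: str) -> str:
    """Remove object-number annotations so similar errors group together."""
    return OBJ_NUM.sub("", stderr)


def count_cleaned(stderrs):
    """Return (cleaned_message, count) pairs, most frequent first."""
    return Counter(clean(s) for s in stderrs).most_common()


# Hypothetical stderr lines.
sample = ["dict key missing (obj#12)", "dict key missing (obj#:345)", "bad xref"]
```

On `sample`, the two "dict key missing" lines collapse into one entry with a count of 2. The trailing space left behind is kept, to mirror the SQL.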