Tim Allison

93 comments by Tim Allison

Hi Hernan, any chance you could share a test file? We'd want to fix this over on Apache Tika as well. Thank you!

@jiangzif, would you mind if we added your file as a test case to Apache Tika for [TIKA-1513](https://issues.apache.org/jira/browse/TIKA-1513)?

[6fb9cceb33a2bf98749e895e43840e90f4bcbf4a631b512c010675e3763f5433.pdf](https://github.com/trailofbits/polyfile/files/3871096/6fb9cceb33a2bf98749e895e43840e90f4bcbf4a631b512c010675e3763f5433.pdf)

We made integration with other OCR engines much easier in 2.x. The new feature is entirely undocumented. Ping me if you want help with this.

> Hey @tballison - Hope you are good my friend. I've released 1.28.5 using the standard Jammy. Was there a reason why you were thinking of pinning to a particular version?...

Good enough for now. Thank you @dameikle !

I _think_ there was a `bytes` before...at least according to the source code. The problem is that there's a regex that expects the `xml:space...` before the `bytes` element. This just...

I'm also happy to rerun with a more recent version of pdfcpu if you'd like.

This was the SQL I ran to remove object numbers from the stderr output so that we could get slightly more meaningful counts. `select regexp_replace(stderr, '\(obj#:?\d+\)', '') as stderr_cleaned, count(1) as cnt...
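
For anyone who wants to try the same cleanup outside the database, here is a rough Python sketch of the idea. It reuses the regular expression from the query above. The sample messages and the helper names (`clean_stderr`, `count_cleaned`) are made up for illustration and are not from the original run. Note that `re.sub` replaces every match in a message, whereas whether `regexp_replace` does depends on the SQL dialect.

```python
import re
from collections import Counter

# Same pattern as in the query: matches "(obj#12)" and "(obj#:12)".
OBJ_NUM = re.compile(r"\(obj#:?\d+\)")


def clean_stderr(stderr: str) -> str:
    """Strip object-number markers so similar messages collapse together."""
    return OBJ_NUM.sub("", stderr)


def count_cleaned(stderrs):
    """Count messages after cleaning, like a GROUP BY on the cleaned text."""
    return Counter(clean_stderr(s) for s in stderrs)


# Hypothetical stderr lines, for illustration only.
samples = [
    "dereference failed (obj#12)",
    "dereference failed (obj#:345)",
    "unexpected token",
]

counts = count_cleaned(samples)
for message, cnt in counts.most_common():
    print(cnt, repr(message))
```

The cleaned text keeps whatever whitespace surrounded the marker, so a trailing space can remain. That does not affect the grouping as long as the messages are otherwise formatted the same way.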