Tim Allison
Hi Hernan, Any chance you could share a test file? We'd want to fix this over on Apache Tika as well. Thank you!
@jiangzif , would you mind if we added your file as a test case to Apache Tika for [TIKA-1513](https://issues.apache.org/jira/browse/TIKA-1513)?
[6fb9cceb33a2bf98749e895e43840e90f4bcbf4a631b512c010675e3763f5433.pdf](https://github.com/trailofbits/polyfile/files/3871096/6fb9cceb33a2bf98749e895e43840e90f4bcbf4a631b512c010675e3763f5433.pdf)
We made integration with other OCR engines much easier in 2.x. The new feature is entirely undocumented. Ping me if you want help with this.
> Hey @tballison - Hope you are good my friend. I've released 1.28.5 using the standard Jammy. Was there a reason why you were thinking of pinning to a particular version?...
Good enough for now. Thank you @dameikle !
I _think_ there was a `bytes` before...at least according to the source code. The problem is that there's a regex that expects the `xml:space...` before the `bytes` element. This just...
Closing because of merge of #558
I'm also happy to rerun with a more recent version of pdfcpu if you'd like.
This was the SQL I ran to strip object numbers from the stderr messages so that we could get slightly more meaningful counts. `select regexp_replace(stderr, '\(obj#:?\d+\)', '') as stderr_cleaned, count(1) as cnt...
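For anyone who wants to try the same cleanup outside the database, here is a minimal Python sketch. It reuses only the regex from the query above. The helper names `clean_stderr` and `count_cleaned` and the sample messages are hypothetical, made up for illustration, and it does not reproduce the rest of the original query.

```python
import re
from collections import Counter

# Same pattern as in the SQL above: matches "(obj#123)" or "(obj#:123)".
OBJ_REF = re.compile(r"\(obj#:?\d+\)")


def clean_stderr(stderr: str) -> str:
    """Drop object-number markers so otherwise identical messages group together."""
    return OBJ_REF.sub("", stderr)


def count_cleaned(stderrs):
    """Count messages after cleaning, most frequent first."""
    return Counter(clean_stderr(s) for s in stderrs).most_common()


# Hypothetical stderr messages, invented for illustration only.
sample = [
    "dereference failed (obj#12)",
    "dereference failed (obj#:987)",
    "unexpected token (obj#3)",
]

for message, cnt in count_cleaned(sample):
    print(cnt, message)
```

Without the substitution, the first two messages would be counted separately because they differ only in the object number.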