James Healy

Results 139 comments of James Healy

Given the change starts in 2.1.0, I'd guess it might be a result of commit a8ca5dc.

In my experience pdf-reader does a reasonable (but not perfect) text extraction from the majority of PDFs, but it does depend on the source files. For the 50% where it...

Hi @scottybigo. I downloaded all three files and tested text extraction with pdf-reader like this:

```
$ ruby -Ilib bin/pdf_text ~/downloads/4500067854.pdf
$ ruby -Ilib bin/pdf_text ~/downloads/23781.pdf
$ ruby -Ilib bin/pdf_text...
```

I believe pdf-reader will provide access to the tagged data, but it's pretty low level. For example, the high-ish level `Page#text` method ignores tags, but the low-level `Page#walk_contents` method should...
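
As a rough illustration of the low-level route, here is a minimal sketch of a receiver object that records which marked-content tag is open when text is shown. `TaggedTextReceiver` is a hypothetical name, not part of pdf-reader. The callback names are assumed to match the ones pdf-reader derives from the `BMC`, `BDC`, `EMC`, `Tj` and `TJ` content stream operators, so check them against the version you have installed.

```ruby
# Hypothetical receiver: collects text shown inside marked-content sections,
# paired with the innermost open tag. Callback names are an assumption about
# how pdf-reader maps content stream operators to receiver methods.
class TaggedTextReceiver
  attr_reader :runs

  def initialize
    @stack = [] # currently open marked-content tags, innermost last
    @runs = []  # [tag, string] pairs in content-stream order
  end

  # BMC: begin marked content with a tag only
  def begin_marked_content(tag)
    @stack.push(tag)
  end

  # BDC: begin marked content with a tag and a property list
  def begin_marked_content_with_pl(tag, _properties)
    @stack.push(tag)
  end

  # EMC: close the innermost marked-content section
  def end_marked_content
    @stack.pop
  end

  # Tj: show a single string (raw bytes in the font's encoding, not UTF-8)
  def show_text(string)
    @runs << [@stack.last, string]
  end

  # TJ: array mixing strings with numeric kerning adjustments; keep the strings
  def show_text_with_positioning(params)
    show_text(params.grep(String).join)
  end
end
```

An instance would be handed to the page's low-level walk method, after which `runs` holds the tag and raw string pairs. The strings are still undecoded glyph codes, so turning them into readable text needs the page's font information as well.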

> Is this a bug in ImageMagick or in pdf-reader? It seems like an encoding issue. Unfortunately it's hard to say without looking at the PDF. Are you able to...

On the trailing Null character: I think I'm inclined to leave it in. I can see in the PDFs that the null character is included and unlike in C there's...
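
For callers who would rather not see the trailing null, stripping it after extraction is a one-liner. This is a minimal sketch, and `strip_trailing_nulls` is a hypothetical helper rather than anything pdf-reader provides.

```ruby
# Hypothetical helper, not part of pdf-reader: remove NUL characters from the
# end of a string while leaving any NULs elsewhere in the string untouched.
def strip_trailing_nulls(text)
  text.sub(/\x00+\z/, "")
end
```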

It'd be interesting to know if this is still an issue in v2.9.0 - there have been a number of fixes to glyph positioning calculations in the last few versions. If...

pdf-reader is capable of detecting all fonts in a PDF, but that example isn't as robust as it could be and will need expanding for most real-world systems. When you run...

Thanks for the report. To understand the cause I'd really have to see the problem PDF. Are you able to share it with me via email ([email protected]'d.au)? On 18/01/2013 6:18...

Damn you autocorrect. My address is [email protected] On 20/01/2013 12:57 AM, "Sands Fish" [email protected] wrote: > James, does your email address have a single-quote character in it? > Doesn't like...