James Healy comments

Results: 139 comments by James Healy

extracting arabic characters

Here is the output I get when running `pdf_text` from pdf-reader 2.0.0 with the PDF you linked: [100.txt](https://github.com/yob/pdf-reader/files/819806/100.txt) Do you get something similar? Can you help me understand what the...

PDF::Reader::MalformedPDFError - after update to v2.10.0

Thanks for the clear bug report. That particular `raise` was added between v2.9.2 and v2.10.0, so this sounds like a bug and I suspect your fix is what we need....

Page.text fails when font size changes on a single line

Thanks for a great sample file that demonstrates the issue. > I am wondering is this something that pdf-reader is intended to do accurately? I would classify it as a...

crop text in 'Tj' PagesStrategy::OPERATORS

This is likely to be the fault of the primitive algorithm in PageLayout. I'd love to find time to improve it! The algorithm sometimes results in characters that will overlap,...
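To illustrate how a grid-based layout can make characters overlap, here is a minimal sketch. It is not pdf-reader's actual PageLayout code; `Run`, `naive_layout`, and the `col_width` value are hypothetical names and numbers chosen for the example. It assumes every character occupies one fixed-width cell, so a run set in a narrow font can end in points before its characters end in cells.

```ruby
# Hypothetical text run: a string positioned at (x, y) in PDF points.
Run = Struct.new(:x, :y, :text)

# Place each run on a fixed-width character grid by scaling its x position.
# col_width is an assumed average glyph width in points (illustrative only).
def naive_layout(runs, col_width: 6.0, cols: 40)
  rows = Hash.new { |h, y| h[y] = " " * cols }
  runs.each do |run|
    col = (run.x / col_width).round
    run.text.each_char.with_index do |ch, i|
      pos = col + i
      next if pos >= cols
      rows[run.y][pos] = ch # a later run silently overwrites an earlier one
    end
  end
  # PDF y coordinates grow upwards, so emit the highest y first.
  rows.sort_by { |y, _| -y }.map { |_, line| line.rstrip }
end

runs = [
  Run.new(0.0, 700, "Hello"),  # occupies columns 0..4
  Run.new(24.0, 700, "World"), # maps to column 4 and collides with the "o"
]
puts naive_layout(runs)
```

In this sketch the second run starts where the first one ends on the page (x = 24), but the first run still needs five grid cells, so one character is lost. A more careful approach would have to track real glyph widths rather than a single assumed width.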

This gem is not able to extract the line near pdf page break

Issue one seems to have been resolved - I can't reproduce it on the latest release (`v2.2.1`). Issue two will be harder to address in a consistent way. In this...

Get Bullet Style in PDF

Hi, thanks for the suggestion. In your sample PDF, are the bullets text characters that you can manually copy and paste? Philosophically, pdf-reader aims to expose the data in the file...

convenience methods - eg. extract named destinations

I'd be more than happy to see a convenience method for named destinations added. I probably don't have time to add it myself, but I'm happy to review a PR.

convenience methods - eg. extract named destinations

Thanks for offering to contribute! The implementation in pypdf shows some helpful clues: https://github.com/mstamy2/PyPDF2/blob/18a2627adac13124d4122c8b92aaa863ccfb8c29/PyPDF2/pdf.py#L1350-L1389 By coincidence, this spec file in the pdf-reader repo has some named destinations: `spec/data/pdflatex.pdf`. This code...

convenience methods - eg. extract named destinations

> I started to implement this
great!
> the pypdf method retrieves all named destinations. So shouldn't named_destinations be a method of Reader?
Yes. I'm not fully across named destinations,...
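For readers unfamiliar with how named destinations are stored, here is a rough sketch of the lookup involved. It is not the code from this thread and not pdf-reader's API: `flatten_name_tree` and the sample `catalog` are hypothetical, and the tree is modelled with plain Ruby Hashes. It assumes the destinations live in a PDF name tree under the catalog's `/Names` `/Dests` entry, where leaf nodes hold a flat `/Names` array of alternating keys and values and intermediate nodes point to children via `/Kids`.

```ruby
# Flatten a PDF name tree into a single Hash of name => destination.
# Nodes are plain Ruby Hashes here; a real reader would also need to
# resolve indirect references before recursing (not shown).
def flatten_name_tree(node, out = {})
  # Leaf data: a flat array of alternating names and values.
  (node[:Names] || []).each_slice(2) { |name, dest| out[name] = dest }
  # Intermediate nodes list their children under :Kids.
  (node[:Kids] || []).each { |kid| flatten_name_tree(kid, out) }
  out
end

# Hypothetical document catalog with two named destinations.
catalog = {
  Names: {
    Dests: {
      Kids: [
        { Names: ["chapter.1", [:page1, :XYZ, 72, 720, nil]] },
        { Names: ["chapter.2", [:page5, :Fit]] },
      ],
    },
  },
}

dests = flatten_name_tree(catalog.dig(:Names, :Dests) || {})
puts dests.keys.inspect
```

Because the tree hangs off the document catalog rather than any single page, a document-level method is the natural home for this kind of lookup. Older files may instead store destinations in a plain `/Dests` dictionary on the catalog, which this sketch does not handle.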

Superscript words not being returned.

We're not intentionally skipping superscript, but depending on how they're encoded there are a few reasons why they might be missing from the output. The most likely is that pdf-reader's naive...