James Healy

Results 139 comments of James Healy

Here is the output I get when running `pdf_text` from pdf-reader 2.0.0 with the PDF you linked: [100.txt](https://github.com/yob/pdf-reader/files/819806/100.txt) Do you get something similar? Can you help me understand what the...

Thanks for the clear bug report. That particular `raise` was added between v2.9.2 and v2.10.0, so this sounds like a bug and I suspect your fix is what we need....

Thanks for a great sample file that demonstrates the issue. > I am wondering is this something that pdf-reader is intended to do accurately? I would classify it as a...

This is likely to be the fault of the primitive algorithm in PageLayout. I'd love to find time to improve it! The algorithm sometimes results in characters that will overlap,...

Issue one seems to have been resolved - I can't reproduce it on the latest release (`v2.2.1`). Issue two will be harder to address in a consistent way. In this...

Hi, Thanks for the suggestion. In your sample PDF are the bullets text characters that you can manually copy paste? Philosophically, pdf-reader aims to expose the data in the file...

I'd be more than happy to see a convenience method for named destinations added. I probably don't have time to add it myself, but I'm happy to review a PR.

Thanks for offering to contribute! The implementation in pypdf shows some helpful clues: https://github.com/mstamy2/PyPDF2/blob/18a2627adac13124d4122c8b92aaa863ccfb8c29/PyPDF2/pdf.py#L1350-L1389 By coincidence, this spec file in the pdf-reader repo has some named destinations: `spec/data/pdflatex.pdf`. This code...

> I started to implement this great! > the pypdf method retrieves all named destinations. So shouldn't named_destinations be a method of Reader? Yes. I'm not fully across named destinations,...

We're not intentionally skipping superscript, but depending on how they're encoded there are a few reasons why they might be missing from the output. The most likely is that pdf-reader's naive...