Andrew Baumann comments

Results: 27 comments of Andrew Baumann

Feature Request: Differentiate Extracted Highlights by color

It's possible. I haven't had a reason to do it myself yet, but a PR to add such functionality would be welcome, especially the extraction part, i.e. getting the colour...
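A minimal sketch of the extraction side. It assumes the highlight's colour is read from the annotation's `/C` entry. The PDF specification defines that entry as an array of 0 (transparent), 1 (DeviceGray), 3 (DeviceRGB) or 4 (DeviceCMYK) numbers, each in the range 0.0 to 1.0. The helper name `colour_to_hex` is hypothetical, and the CMYK branch uses a naive, non-colour-managed conversion.

```python
from typing import Optional, Sequence


def colour_to_hex(c: Optional[Sequence[float]]) -> Optional[str]:
    """Convert a PDF annotation /C array to an '#rrggbb' string.

    Returns None for a missing or empty (transparent) colour.
    """
    if not c:
        return None
    comps = [min(max(float(v), 0.0), 1.0) for v in c]  # clamp to [0, 1]
    if len(comps) == 1:  # DeviceGray: one value for all channels
        r = g = b = comps[0]
    elif len(comps) == 3:  # DeviceRGB
        r, g, b = comps
    elif len(comps) == 4:  # DeviceCMYK, naive conversion to RGB
        cy, m, y, k = comps
        r, g, b = (1 - cy) * (1 - k), (1 - m) * (1 - k), (1 - y) * (1 - k)
    else:
        raise ValueError(f"unexpected /C array length: {len(comps)}")
    return "#" + "".join(f"{round(v * 255):02x}" for v in (r, g, b))
```

With pdfminer, the raw `/C` value may be an indirect reference. It would typically need resolving first, for example with `pdfminer.pdftypes.resolve1`. Grouping extracted highlights by colour could then be as simple as keying on the returned string.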

Scan with OCR: words not split

This is an issue in the pdfminer library. I confirmed that:
* pdfminer's pdf2txt.py tool fails in a similar way: no spaces, and far too many chars extracted
* ...
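Some background on where the spaces come from. Apart from any space characters already present in the PDF's content stream, pdfminer's layout analysis inserts a space between two characters on a line when the gap between them is large enough. Specifically, the horizontal gap must exceed `word_margin` (an `LAParams` parameter, default 0.1) times the larger of the next character's width and height. The sketch below is a simplified re-implementation of that heuristic, not pdfminer's actual code. `Char` and `join_line` are hypothetical stand-ins for `LTChar` and the text-line logic.

```python
from typing import List, NamedTuple, Optional


class Char(NamedTuple):
    """Hypothetical stand-in for pdfminer's LTChar: text and horizontal extent."""
    text: str
    x0: float
    x1: float
    height: float

    @property
    def width(self) -> float:
        return self.x1 - self.x0


def join_line(chars: List[Char], word_margin: float = 0.1) -> str:
    """Join characters on one line, inserting a space at large horizontal gaps."""
    out: List[str] = []
    prev_x1: Optional[float] = None
    for ch in chars:
        # The gap threshold scales with the size of the incoming character.
        margin = word_margin * max(ch.width, ch.height)
        if prev_x1 is not None and prev_x1 < ch.x0 - margin:
            out.append(" ")
        out.append(ch.text)
        prev_x1 = ch.x1
    return "".join(out)
```

Suppose an OCR text layer positions its characters so that the gaps between words stay under this threshold. In that case no spaces would be inserted, which would be consistent with the missing-space half of the symptom.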

PDF example of truncated highlight

Thanks for the report and sample PDF. I've futzed with the hit detection algorithm quite a few times before, but haven't had any reports of issues with it for a...
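For context on what "hit detection" involves: a highlight's `/QuadPoints` array describes one or more quadrilaterals, usually one roughly axis-aligned rectangle per highlighted line. The tool must then decide which extracted characters fall inside them. Below is a minimal sketch of one plausible rule. It assumes axis-aligned boxes and a hypothetical 50% area-overlap threshold. It is not necessarily the algorithm pdfannots uses, and all function names are illustrative.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) with x0 <= x1, y0 <= y1


def quads_to_boxes(quadpoints: Sequence[float]) -> List[Box]:
    """Turn a flat /QuadPoints array (8 numbers per quad) into bounding boxes.

    Taking min/max over each quad's corners avoids depending on corner order,
    which varies between PDF producers.
    """
    if len(quadpoints) % 8:
        raise ValueError("QuadPoints length must be a multiple of 8")
    boxes = []
    for i in range(0, len(quadpoints), 8):
        xs = quadpoints[i:i + 8:2]      # x coordinates of the 4 corners
        ys = quadpoints[i + 1:i + 8:2]  # y coordinates of the 4 corners
        boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two boxes (0.0 if they do not overlap)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def hit(char_box: Box, boxes: List[Box], threshold: float = 0.5) -> bool:
    """True if at least `threshold` of the character's area lies inside one box."""
    area = (char_box[2] - char_box[0]) * (char_box[3] - char_box[1])
    if area <= 0:
        return False
    return any(overlap_area(char_box, b) / area >= threshold for b in boxes)
```

The threshold is the main design trade-off. Raising it drops characters only partly covered by the highlight, so the capture can look truncated at its ends. Lowering it starts pulling in neighbouring characters that were not highlighted.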

[Bug]: Very slow and non usable using Windows 11

Same issue here. It doesn't appear to be a memory leak, just something burning a core (busy-waiting?) for lengthy multi-second delays before servicing the UI. I don't recall seeing anything...

[Bug]: Very slow and non usable using Windows 11

I tried some prior versions:
* 4.4.8: repro
* 4.4.7: repro
* 4.4.0: repro
* 4.3.2: no repro

Various new printers (json, jsonl, csv, and todo)

Closing stale PR.

Allow for custom profiles when running as admin vs user

IMO this is more important now that Win11's `Win+X A` launches Windows Terminal in admin mode and not PowerShell/CMD. At a minimum, maybe we could just have a different `defaultProfile`...

Support "Caret" annotation

Thanks for the suggestion! Do you have an example of a PDF with such annotations?

Feature: Outputting an annotation and the entire sentence where the annotation is located

Capturing text before/after an annotation is implemented in the code as "context", but is currently used only for strikeout annotations. My expectation was that anyone adding a comment on a...
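A minimal sketch of the requested behaviour. It assumes the page text is already available as one string, together with the character offsets `[start, end)` of the annotated span. The function name `enclosing_sentence` and the naive sentence-boundary rule are assumptions for illustration. The rule treats `.`, `!` or `?` followed by whitespace as the end of a sentence. This is not how the existing "context" capture works.

```python
import re
from typing import Tuple

# Naive sentence boundary: terminal punctuation followed by whitespace.
_BOUNDARY = re.compile(r"[.!?]\s+")


def enclosing_sentence(text: str, start: int, end: int) -> Tuple[int, int]:
    """Return (s, e) offsets of the sentence(s) covering text[start:end]."""
    s, e = 0, len(text)
    for m in _BOUNDARY.finditer(text):
        if m.end() <= start:
            # Boundary lies wholly before the span: the sentence starts after it.
            s = m.end()
        elif m.start() >= end - 1:
            # First boundary at or after the span's last character:
            # end the sentence there, keeping the punctuation mark.
            e = m.start() + 1
            break
        # Boundaries inside the span are skipped, so multi-sentence
        # annotations return every sentence they touch.
    return s, e
```

Usage: `text[slice(*enclosing_sentence(text, start, end))]` yields the sentence text. Abbreviations such as "e.g." would fool this rule, so a real implementation would likely want a proper sentence splitter.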

Feature: Outputting an annotation and the entire sentence where the annotation is located

> I also have checked the `pdfminer` module. It said if we want to extract all of the text. We could do:
>
> ```python
> from pdfminer.high_level import extract_pages...
> ```