Andrew Baumann
It's possible. I haven't had a reason to do it myself yet, but a PR to add such functionality would be welcome, especially the extraction part, i.e. getting the colour...
This is an issue in the pdfminer library. I confirmed that: * pdfminer's pdf2txt.py tool fails in a similar way -- no spaces and far too many chars extracted *...
Thanks for the report and sample PDF. I've futzed with the hit detection algorithm quite a few times before, but haven't had any reports of issues with it for a...
Same issue here. It doesn't appear to be a memory leak, just something burning a core (busy-waiting?) for lengthy multi-second delays before servicing the UI. I don't recall seeing anything...
I tried some prior versions: 4.4.8 (repro), 4.4.7 (repro), 4.4.0 (repro), 4.3.2 (no repro).
Closing stale PR.
IMO this is more important now that Win11's `Win+X A` launches Windows Terminal in admin mode and not PowerShell/CMD. At a minimum, maybe we could just have a different `defaultProfile`...
Thanks for the suggestion! Do you have an example of a PDF with such annotations?
Capturing text before/after an annotation is implemented in the code as "context", but is currently used only for strikeout annotations. My expectation was that anyone adding a comment on a...
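To illustrate what capturing "context" around an annotation could look like, here is a minimal sketch. It is not the project's actual code: the function name `context_around`, the character-offset span, and the `width` parameter are all hypothetical, and it assumes the page text is already available as a plain string.

```python
from typing import Tuple


def context_around(text: str, start: int, end: int, width: int = 20) -> Tuple[str, str]:
    """Return the text just before and just after the span text[start:end].

    Hypothetical helper: `start`/`end` are character offsets of the annotated
    span, and `width` caps how many characters of context are kept per side.
    """
    if not (0 <= start <= end <= len(text)):
        raise ValueError("span out of range")
    before = text[max(0, start - width):start]  # clipped at the start of the text
    after = text[end:end + width]               # slicing clips at the end of the text
    return before, after


text = "The quick brown fox jumps over the lazy dog."
start = text.index("brown")
end = start + len("brown")
before, after = context_around(text, start, end, width=10)
```

The same helper would work for any annotation type whose span can be mapped to character offsets; whether that is useful for annotations other than strikeouts is a separate design question.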
> I also have checked the `pdfminer` module. It said if we want to extract all of the text. We could do: > > ```python > from pdfminer.high_level import extract_pages...