Jeremy Singer-Vine
Ended up implementing a similar, but slightly different version of this, based on some research about the common bytes-typed fontnames we saw cropping up in issues/discussions: https://github.com/jsvine/pdfplumber/pull/862/commits/9441ff7628fff9f69d81c6afd8ef439bf101b254 Thank you for...
Closing, as this issue seems to have been resolved.
Hi @jeffkile, and thanks. For reference's sake, here's what I think you're pointing to:

```python
import pdfplumber

pdf = pdfplumber.open("CK12_Earth_Science_rev.pdf")
page = pdf.pages[8]
print(page.extract_text())
```

... produces:

```
Chapter 1...
```
With v0.9.0, `pdfplumber`'s text-extraction methods now expand the most common Latin-alphabet ligatures into their constituent characters. (It does not do so for ligatures that are considered to be their own...
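For anyone curious what expanding ligatures means in practice, here is a minimal standalone sketch. It is not `pdfplumber`'s actual implementation; the `LIGATURES` table and the `expand_ligatures` helper below are hypothetical names chosen for illustration, and the set of ligatures covered is an assumption.

```python
# Illustrative sketch only: this table and helper are assumptions,
# not pdfplumber's internal code.
LIGATURES = {
    "\ufb00": "ff",   # LATIN SMALL LIGATURE FF
    "\ufb01": "fi",   # LATIN SMALL LIGATURE FI
    "\ufb02": "fl",   # LATIN SMALL LIGATURE FL
    "\ufb03": "ffi",  # LATIN SMALL LIGATURE FFI
    "\ufb04": "ffl",  # LATIN SMALL LIGATURE FFL
    "\ufb06": "st",   # LATIN SMALL LIGATURE ST
}

# str.maketrans accepts a dict mapping single characters to replacement strings.
_TABLE = str.maketrans(LIGATURES)


def expand_ligatures(text: str) -> str:
    """Replace each mapped ligature with its constituent letters.

    Characters that are not in LIGATURES pass through unchanged.
    """
    return text.translate(_TABLE)


if __name__ == "__main__":
    sample = "e\ufb03cient \ufb01le"
    print(expand_ligatures(sample))
```

Using a translation table keeps the replacement to a single pass over the string, and anything outside the table is left as it was extracted.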
@Tom-Hudson Thank you for flagging. Can you share the PDF? (Hard to diagnose the issue without it. If it's a document you don't want to share publicly, you can email...
Thanks for flagging @AnuraagKhare. Can you share a PDF that, when processed repeatedly, reproduces the issue? (Or all 60 files, but I figured 1 will be simpler.)
@xsank `.flush_cache()` refers, specifically, to objects that `pdfplumber` *itself* has explicitly cached. Unfortunately, I haven't found a way to free up the memory that `pdfminer.six` is allocating.
Hello. Try using the `--follow-redirects` command-line option. Does that resolve your issue? (For all options, see this project’s README.md and/or run `waybackpack -h`.)
Thanks for flagging, @jwilk — very interesting. It seems that *from the perspective of the Wayback Machine*, these are different resources. A bit frustrating that they don't do any internal...
Added a commit addressing your (very reasonable) requests 👍