Jeremy Singer-Vine

105 comments by Jeremy Singer-Vine

Ended up implementing a similar, but slightly different version of this, based on some research about the common bytes-typed fontnames we saw cropping up in issues/discussions: https://github.com/jsvine/pdfplumber/pull/862/commits/9441ff7628fff9f69d81c6afd8ef439bf101b254 Thank you for...

Closing, as this issue seems to have been resolved.

Hi @jeffkile, and thanks. For reference's sake, here's what I think you're pointing to:

```python
import pdfplumber

pdf = pdfplumber.open("CK12_Earth_Science_rev.pdf")
page = pdf.pages[8]
print(page.extract_text())
```

... produces: ``` Chapter 1...

With v0.9.0, `pdfplumber`'s text-extraction methods now expand the most common Latin-alphabet ligatures into their constituent characters. (It does not do so for ligatures that are considered to be their own...
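To illustrate what "expanding a ligature into its constituent characters" means, here is a minimal standalone sketch. It is not `pdfplumber`'s implementation: the `LIGATURES` table and the `expand_ligatures` helper are hypothetical names, and the five ligatures listed are just an assumed sample of the common Latin-alphabet ones.

```python
# Hypothetical mapping for illustration only; not pdfplumber's actual table.
# Each key is a single Unicode ligature code point, each value its expansion.
LIGATURES = {
    "\ufb00": "ff",   # LATIN SMALL LIGATURE FF
    "\ufb01": "fi",   # LATIN SMALL LIGATURE FI
    "\ufb02": "fl",   # LATIN SMALL LIGATURE FL
    "\ufb03": "ffi",  # LATIN SMALL LIGATURE FFI
    "\ufb04": "ffl",  # LATIN SMALL LIGATURE FFL
}

# Build the translation table once; str.maketrans accepts a dict whose
# keys are single characters and whose values are replacement strings.
_TABLE = str.maketrans(LIGATURES)


def expand_ligatures(text: str) -> str:
    """Replace each known ligature with its constituent characters."""
    return text.translate(_TABLE)


sample = "The o\ufb03ce \ufb01le was \ufb02agged."
expanded = expand_ligatures(sample)
print(expanded)
```

Characters missing from the table pass through unchanged, so a character such as `æ` would be left alone here simply because this sketch does not list it.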

@Tom-Hudson Thank you for flagging. Can you share the PDF? (Hard to diagnose the issue without it. If it's a document you don't want to share publicly, you can email...

Thanks for flagging @AnuraagKhare. Can you share a PDF that, when processed repeatedly, reproduces the issue? (Or all 60 files, but I figured 1 will be simpler.)

@xsank `.flush_cache()` refers, specifically, to objects that `pdfplumber` *itself* has explicitly cached. Unfortunately, I haven't found a way to free up the memory that `pdfminer.six` is allocating.

Hello. Try using the `--follow-redirects` command-line option. Does that resolve your issue? (For all options, see this project’s README.md and/or run `waybackpack -h`.)

Thanks for flagging, @jwilk — very interesting. It seems that *from the perspective of the Wayback Machine*, these are different resources. A bit frustrating that they don't do any internal...

Added a commit addressing your (very reasonable) requests 👍