Jeremy Singer-Vine comments

Results: 105 comments by Jeremy Singer-Vine

Explicitly typecast `fontname` and `text` fields to str for char objects

Ended up implementing a similar, but slightly different version of this, based on some research about the common bytes-typed fontnames we saw cropping up in issues/discussions: https://github.com/jsvine/pdfplumber/pull/862/commits/9441ff7628fff9f69d81c6afd8ef439bf101b254 Thank you for...
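For readers unfamiliar with the problem, the general idea of coercing a possibly bytes-typed field to `str` can be sketched as below. This is a minimal illustration, not the code from the linked commit: the helper name `coerce_to_str` and the UTF-8-then-Latin-1 fallback order are assumptions made for the example.

```python
from typing import Union


def coerce_to_str(value: Union[str, bytes]) -> str:
    """Return `value` as a str, decoding it first if it is bytes.

    Hypothetical helper for illustration only; the decoding strategy
    (UTF-8 first, then Latin-1) is an assumption, not pdfplumber's.
    """
    if isinstance(value, bytes):
        try:
            return value.decode("utf-8")
        except UnicodeDecodeError:
            # Latin-1 maps every byte to a code point, so this never fails.
            return value.decode("latin-1")
    return str(value)


# Example: a fontname that arrived as bytes instead of str.
fontname = coerce_to_str(b"ABCDEF+Arial")
```

Falling back to Latin-1 trades accuracy for robustness: the result is always a string, though it may not match the encoding the PDF producer intended.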

I have a new problem

Closing, as this issue seems to have been resolved.

extract_text() returns a unicode character \ufb03 LATIN SMALL LIGATURE FFI instead of the letters ffi when it comes across the word Office,

Hi @jeffkile, and thanks. For reference's sake, here's what I think you're pointing to:

```python
import pdfplumber

pdf = pdfplumber.open("CK12_Earth_Science_rev.pdf")
page = pdf.pages[8]
print(page.extract_text())
```

... produces:

```
Chapter 1...
```

extract_text() returns a unicode character \ufb03 LATIN SMALL LIGATURE FFI instead of the letters ffi when it comes across the word Office,

With v0.9.0, `pdfplumber`'s text-extraction methods now expand the most common Latin-alphabet ligatures into their constituent characters. (It does not do so for ligatures that are considered to be their own...
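As a rough illustration of what expanding ligatures means, the standalone sketch below replaces a handful of Latin presentation-form ligatures with their constituent letters. It is not `pdfplumber`'s implementation: the mapping `LIGATURES` and the function `expand_ligatures` are names chosen for this example, and the set of ligatures covered is an assumption.

```python
# Illustrative mapping from single ligature code points to plain letters.
# The selection here is an assumption, not pdfplumber's actual table.
LIGATURES = {
    "\ufb00": "ff",   # LATIN SMALL LIGATURE FF
    "\ufb01": "fi",   # LATIN SMALL LIGATURE FI
    "\ufb02": "fl",   # LATIN SMALL LIGATURE FL
    "\ufb03": "ffi",  # LATIN SMALL LIGATURE FFI
    "\ufb04": "ffl",  # LATIN SMALL LIGATURE FFL
    "\ufb06": "st",   # LATIN SMALL LIGATURE ST
}

# str.translate accepts a table mapping one character to a replacement string.
_LIGATURE_TABLE = str.maketrans(LIGATURES)


def expand_ligatures(text: str) -> str:
    """Replace each mapped ligature in `text` with its constituent letters."""
    return text.translate(_LIGATURE_TABLE)


# The word from the issue title, written with the FFI ligature.
expanded = expand_ligatures("O\ufb03ce")
```

Characters that are not keys in the mapping pass through unchanged, so text without ligatures is returned as-is.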

extract_text() returns a unicode character \ufb03 LATIN SMALL LIGATURE FFI instead of the letters ffi when it comes across the word Office,

@Tom-Hudson Thank you for flagging. Can you share the PDF? (Hard to diagnose the issue without it. If it's a document you don't want to share publicly, you can email...

Memory issues on very large PDFs

INFO:waybackpack.session: HTTP status code: 302

Parse data from NY's historical PDFs