
Issue in text extraction (spaces)

Status: open; opened by pubpub-zz.


Environment

Which environment were you using when you encountered the problem? Windows 10

$ python -c "import PyPDF2;print(PyPDF2.__version__)"
2.7.0

Code + PDF

import PyPDF2
PyPDF2.PdfFileReader(open('c:/file-0.pdf', 'rb')).pages[3].extract_text()

file-0.pdf

Result of text extraction (beginning only); the spaces between words are missing:

APPROVEDShortlyaftertheGenevaBOFsession,thewww-vrmlmailinglistwascreatedtodiscuss\nthedevelopmentofaspecificationforthefirstversionofVRML.Theresponsetothelist
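To check whether a fix restores the spacing, a rough heuristic can flag tokens that contain implausibly long runs of letters. This is a minimal sketch, not part of PyPDF2; the function name `run_together_tokens` and the 20-letter threshold are hypothetical choices for illustration.

```python
import re

# Hypothetical threshold: ordinary English words rarely exceed ~20 letters,
# so a much longer run of letters usually means word spaces were dropped.
MAX_PLAUSIBLE_WORD = 20


def run_together_tokens(text: str, max_len: int = MAX_PLAUSIBLE_WORD) -> list:
    """Return whitespace-delimited tokens whose longest letter run exceeds max_len."""
    suspicious = []
    for token in text.split():
        # Punctuation and digits split the token into separate letter runs.
        runs = re.findall(r"[A-Za-z]+", token)
        if runs and max(len(run) for run in runs) > max_len:
            suspicious.append(token)
    return suspicious


sample = (
    "APPROVEDShortlyaftertheGenevaBOFsession,thewww-vrmlmailinglistwascreatedtodiscuss\n"
    "thedevelopmentofaspecificationforthefirstversionofVRML.Theresponsetothelist"
)
print(len(run_together_tokens(sample)))
print(run_together_tokens("Shortly after the Geneva BOF session,"))
```

Both lines of the extracted sample are flagged, while the correctly spaced sentence yields an empty list.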

Another case (spaces disappearing?):
import PyPDF2
PyPDF2.PdfFileReader(open('c:/2017.pdf', 'rb')).pages[0].extract_text()

2017年年度报告.pdf (2017 annual report)

Observed in the footer: 2018 年04 月 (April 2018)
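PDF content streams often contain no space characters at all; one common way a word gap is encoded is as a numeric displacement inside a `TJ` array, where numbers are in thousandths of text-space units and a negative value shifts the next glyph to the right. The sketch below shows the kind of gap-based heuristic an extractor can use to infer spaces. It is an assumption-laden illustration, not PyPDF2's actual logic: the function name and the threshold of 200 are hypothetical, and real extractors also consider font size, character spacing, and `Td`/`Tm` moves between separate text operations.

```python
from typing import List, Union

TJItem = Union[str, float]

# Hypothetical threshold in thousandths of text-space units
# (1000 units correspond to the current font size).
SPACE_THRESHOLD = 200


def join_tj_array(items: List[TJItem], threshold: float = SPACE_THRESHOLD) -> str:
    """Concatenate TJ strings, inserting a space where the displacement looks like a word gap."""
    out: List[str] = []
    for item in items:
        if isinstance(item, str):
            out.append(item)
        elif -item >= threshold:
            # Negative numbers move the next glyph right; a large enough
            # shift is treated as an implied word space.
            out.append(" ")
        # Small adjustments (kerning) and positive values (tightening) are ignored.
    return "".join(out)


print(join_tj_array(["Shortly", -250, "after", -12, "the"]))
```

If the threshold is too high, or the gap is expressed some other way, the words run together exactly as in the output reported above.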

pubpub-zz, Jul 23 '22 14:07