pdfminer
HTML or XML converter fails with TypeError: write() argument must be str, not bytes
Running python pdf2txt.py -t xml -o output.xml -d %pdffilepath% fails with the following error:
Traceback (most recent call last):
  File "pdf2text.py", line 113, in <module>
    if __name__ == '__main__': sys.exit(main(sys.argv))
  File "pdf2text.py", line 92, in main
    device = XMLConverter(rsrcmgr, outfp, laparams=laparams, imagewriter=imagewriter, stripcontrol=stripcontrol)
  File "C:\apps\Python37\lib\site-packages\pdfminer\converter.py", line 442, in __init__
    self.write_header()
  File "C:\apps\Python37\lib\site-packages\pdfminer\converter.py", line 453, in write_header
    self.write('<?xml version="1.0" encoding="%s" ?>\n' % self.codec)
  File "C:\apps\Python37\lib\site-packages\pdfminer\converter.py", line 448, in write
    self.outfp.write(text)
TypeError: write() argument must be str, not bytes
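For context, Python 3 raises this message when bytes are written to a stream opened in text mode. The traceback suggests the converter passes encoded bytes to an output file opened as text, though that is an inference. A minimal sketch of the mismatch, independent of pdfminer (the stream names and the utf-8 codec are illustrative assumptions, not pdfminer's actual code):

```python
import io

codec = "utf-8"  # assumed codec, for illustration only
header = '<?xml version="1.0" encoding="%s" ?>\n' % codec

# A text-mode stream (like open(path, "w")) accepts only str.
text_out = io.TextIOWrapper(io.BytesIO(), encoding=codec)
try:
    text_out.write(header.encode(codec))  # bytes into a text stream
    text_mode_failed = False
except TypeError:
    text_mode_failed = True

# A binary stream (like open(path, "wb")) accepts the encoded bytes.
binary_out = io.BytesIO()
binary_out.write(header.encode(codec))
written = binary_out.getvalue()
```

If this is the cause, opening the output file in binary mode, or skipping the encode step for text-mode files, should avoid the error.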
Likely related: dumppdf.py also fails similarly:
Traceback (most recent call last):
  File "/home/wynand/.virtualenvs/pdf/bin/dumppdf.py", line 272, in <module>
    if __name__ == '__main__': sys.exit(main(sys.argv))
  File "/home/wynand/.virtualenvs/pdf/bin/dumppdf.py", line 269, in main
    dumpall=dumpall, mode=mode, extractdir=extractdir)
  File "/home/wynand/.virtualenvs/pdf/bin/dumppdf.py", line 222, in dumppdf
    dumptrailers(outfp, doc)
  File "/home/wynand/.virtualenvs/pdf/bin/dumppdf.py", line 95, in dumptrailers
    out.write('<trailer>\n')
TypeError: a bytes-like object is required, not 'str'
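This one is the reverse case: a str is being written to a stream that expects bytes. Below is a rough sketch of a possible workaround, assuming the output stream may be either binary or text. The helper write_text is hypothetical and not part of pdfminer, and utf-8 is an assumed default:

```python
import io

def write_text(out, text, codec="utf-8"):
    """Hypothetical helper: write str to a text or a binary stream."""
    if isinstance(out, io.TextIOBase):
        out.write(text)                # text stream takes str directly
    else:
        out.write(text.encode(codec))  # binary stream needs bytes

binary_out = io.BytesIO()
try:
    binary_out.write("<trailer>\n")    # str into a binary stream
    binary_failed = False
except TypeError:
    binary_failed = True

write_text(binary_out, "<trailer>\n")  # encoded before writing
text_out = io.StringIO()
write_text(text_out, "<trailer>\n")    # written unchanged
```

A helper like this keeps the call sites unchanged whichever mode the output file was opened in.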
Yes, I observed the same failure with dumppdf.py:
Traceback (most recent call last):
  File "dumppdf.py", line 272, in <module>
    if __name__ == '__main__': sys.exit(main(sys.argv))
  File "dumppdf.py", line 269, in main
    dumpall=dumpall, mode=mode, extractdir=extractdir)
  File "dumppdf.py", line 222, in dumppdf
    dumptrailers(outfp, doc)
  File "dumppdf.py", line 95, in dumptrailers
    out.write('<trailer>\n')
TypeError: a bytes-like object is required, not 'str'
It also fails with the sample PDFs provided in the repo, so it's not an issue with our files.
pdfminer.six works, going to use that for now.
yes pdfminer.six is working for me as well but it gives me
I expected it to give me a text node for each word or each line.
Also, pdfminer.six does not preserve the order of the text from the PDF file in either the text or the XML output.
Any update here?