scraperwiki-python
pdftoxml in utils.py is not portable to Windows.
- The `/dev/null` needs to be `NUL` on Windows.
- `NamedTemporaryFile` behaves differently in Windows to Unix.
Is this still true?
I am trying to convert a PDF to XML on a Windows machine with Python 3, and I am getting an error on the line `return xmldata.decode('utf-8')`.
Please let me know.
This is still the case as no-one's changed the code there.
You could:
- just run `pdftohtml.exe` separately and dump the results to a file (either via a script, or via Python subprocess or however you like)
- or you can try using this as a starting point for replacing the code in this package.
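To illustrate the two portability problems named at the top of this issue, here is a minimal sketch of a cross-platform conversion helper. It is not the package's code: the names `build_command` and `pdftoxml_portable`, and the `run` parameter (there only so the helper can be exercised without the tool installed), are hypothetical. It assumes `pdftohtml` is on `PATH` and that, with `-xml`, it writes its output to the given base name plus `.xml`. `subprocess.DEVNULL` stands in for the hard-coded `/dev/null`, and a temporary directory avoids reopening a `NamedTemporaryFile` by name, which may fail on Windows while the file is still open.

```python
import os
import subprocess
import tempfile


def build_command(pdf_path, out_base, executable="pdftohtml"):
    """Build the pdftohtml argument list (a list, so no shell quoting is needed)."""
    return [executable, "-xml", "-enc", "UTF-8", "-noframes", pdf_path, out_base]


def pdftoxml_portable(pdfdata, executable="pdftohtml", run=subprocess.run):
    """Convert PDF bytes to an XML string without relying on /dev/null."""
    # A temporary directory sidesteps NamedTemporaryFile's platform differences:
    # we only ever open files we have already closed.
    with tempfile.TemporaryDirectory() as tmpdir:
        pdf_path = os.path.join(tmpdir, "input.pdf")
        out_base = os.path.join(tmpdir, "output")
        with open(pdf_path, "wb") as f:
            f.write(pdfdata)
        # subprocess.DEVNULL maps to the right null device on each platform.
        run(
            build_command(pdf_path, out_base, executable),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=True,
        )
        # Assumption: the tool writes <out_base>.xml when -xml is given.
        with open(out_base + ".xml", "rb") as f:
            return f.read().decode("utf-8")
```

On Windows you would probably pass `executable="pdftohtml.exe"` or a full path to it. If the tool is missing, `subprocess.run` raises `FileNotFoundError`; if it exits with a non-zero status, `check=True` raises `CalledProcessError`, so a failure does not silently produce an empty result.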
Thank you. I shall try both.