scraperwiki-python

pdftoxml in utils.py is not portable to Windows.

Open StevenMaude opened this issue 11 years ago • 3 comments

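  1. The /dev/null redirection needs to be NUL on Windows.
  2. NamedTemporaryFile behaves differently on Windows than on Unix: on Windows, a named temporary file that is still open cannot be opened a second time, so an external program such as pdftohtml may be unable to read it.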
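A minimal sketch of how both problems could be avoided, assuming Python 3.5+ and Poppler's pdftohtml on PATH. `pdftoxml_portable` and `pdftohtml_xml_cmd` are hypothetical names, not part of this package, and the pdftohtml flags are an assumption rather than the exact ones utils.py passes. It swaps NamedTemporaryFile for a temporary directory, and the shell redirect to /dev/null for `subprocess.DEVNULL`.

```python
import os
import subprocess
import tempfile


def pdftohtml_xml_cmd(pdf_path, out_base, options=()):
    """Argument list for pdftohtml in XML mode; pdftohtml writes <out_base>.xml."""
    return ["pdftohtml", "-xml", "-enc", "UTF-8", *options, pdf_path, out_base]


def pdftoxml_portable(pdfdata, build_cmd=pdftohtml_xml_cmd):
    """Hypothetical portable stand-in for utils.pdftoxml: PDF bytes in, XML text out."""
    # A temporary directory sidesteps NamedTemporaryFile entirely: the files
    # inside it are ordinary files that we close before pdftohtml opens them,
    # which works on Windows as well as Unix.
    with tempfile.TemporaryDirectory() as tmpdir:
        pdf_path = os.path.join(tmpdir, "input.pdf")
        out_base = os.path.join(tmpdir, "output")
        with open(pdf_path, "wb") as f:
            f.write(pdfdata)
        # subprocess.DEVNULL opens os.devnull (/dev/null on Unix, NUL on
        # Windows), replacing a hard-coded /dev/null shell redirect.
        subprocess.run(build_cmd(pdf_path, out_base),
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
                       check=True)
        # Read bytes and decode explicitly, rather than relying on the
        # platform's default text encoding (often cp1252 on Windows).
        with open(out_base + ".xml", "rb") as f:
            return f.read().decode("utf-8")
```

Typical use would be `xml = pdftoxml_portable(pdf_bytes)`. The `build_cmd` hook is only there so the temp-file and output handling can be exercised without pdftohtml installed.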

StevenMaude, May 16 '14 09:05

Is this still true?

I am trying to convert a PDF to XML on a Windows machine with Python 3, and I am getting an error on the line "return xmldata.decode('utf-8')".

Please let me know.

aparna06, Mar 16 '17 15:03

This is still the case, as no one has changed the code there.

You could:

  • just run pdftohtml.exe separately and dump the results to a file (via a script, Python's subprocess module, or however you like)
  • or you can try using this as a starting point for replacing the code in this package.
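For the first option, a rough sketch of running pdftohtml as a separate process that leaves its XML on disk, then reading that file back. It assumes pdftohtml (pdftohtml.exe on Windows) is on PATH and Python 3.7+ for `capture_output`. `run_pdftohtml` and `text_by_page` are hypothetical names, and the `page`/`text` element names are an assumption about pdftohtml's pdf2xml output, so check them against your own files.

```python
import subprocess
import xml.etree.ElementTree as ET


def run_pdftohtml(pdf_path, out_base):
    """Dump <out_base>.xml to disk; assumes pdftohtml is on PATH."""
    # An argument list (no shell) avoids quoting differences between cmd.exe
    # and sh, and capture_output swallows pdftohtml's console output without
    # any /dev/null or NUL redirection. On Windows, "pdftohtml" resolves to
    # pdftohtml.exe because CreateProcess appends .exe to a bare name.
    subprocess.run(["pdftohtml", "-xml", "-enc", "UTF-8", pdf_path, out_base],
                   check=True, capture_output=True)
    return out_base + ".xml"


def text_by_page(xml_path):
    """Return {page number: [text line, ...]} from a pdftohtml XML file."""
    # ET.parse reads the file as bytes and honours the encoding in the XML
    # declaration, so no manual .decode('utf-8') is needed.
    root = ET.parse(xml_path).getroot()
    pages = {}
    for page in root.iter("page"):
        # itertext() flattens inline markup such as <b> and <i>.
        lines = ["".join(t.itertext()) for t in page.iter("text")]
        pages[int(page.get("number"))] = lines
    return pages
```

Usage would be something like `text_by_page(run_pdftohtml("input.pdf", "output"))`, or call only `text_by_page` on an XML file produced by a separate script.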

StevenMaude, Mar 16 '17 16:03

Thank you. I shall try both.

aparna06, Mar 16 '17 16:03