
Upside-down text test failing incorrectly on Linux

Open andrewferrier opened this issue 10 years ago • 8 comments

FAIL: test_plaincontent_upsidedown (tests.test_Subprocess_Basic.TestBasic)

Traceback (most recent call last):
  File "/home/ferriera/gitco/github/email2pdf/tests/test_Subprocess_Basic.py", line 74, in test_plaincontent_upsidedown
    self.assertRegex(self.getPDFText(self.getTimedFilename()), "ɯɐɹƃoɹd ɟpdᄅlᴉɐɯǝ ǝɥʇ ɟo ʇsǝʇ ɐ sᴉ sᴉɥʇ ollǝH")
AssertionError: Regex didn't match: 'ɯɐɹƃoɹd ɟpdᄅlᴉɐɯǝ ǝɥʇ ɟo ʇsǝʇ ɐ sᴉ sᴉɥʇ ollǝH' not found in 'ɯɐɹƃoɹd ɟpd lᴉɐɯǝ ǝɥʇ ɟo ʇsǝʇ ɐ sᴉ sᴉɥʇ ollǝH\n\n\x0c'

andrewferrier • Jan 12 '15 21:01

Doesn't seem to work with /usr/local/bin/pdf2txt.py either, so pdfminer3k may simply be broken in this respect. Worth trying pypdf2? http://stackoverflow.com/questions/15737806/extract-text-using-pdfminer-and-pypdf2-merges-columns (although that doesn't seem to extract any text at all).

andrewferrier • Jan 13 '15 14:01

Fails on both portland and in Docker.

andrewferrier • Jan 18 '15 16:01

Consider calling out to http://en.wikipedia.org/wiki/Pdftotext ? Is that supported on OS X?
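
A rough sketch of what calling out to pdftotext might look like from the test helpers, assuming the binary is on the PATH. The function names build_pdftotext_command and extract_text are hypothetical and are not part of email2pdf; only the "-enc UTF-8" option and the "-" output argument (write to stdout) come from the pdftotext command line itself.

```python
import shutil
import subprocess


def build_pdftotext_command(pdf_path, binary="pdftotext"):
    # Hypothetical helper: "-enc UTF-8" requests UTF-8 output and the
    # trailing "-" tells pdftotext to write the text to stdout.
    return [binary, "-enc", "UTF-8", pdf_path, "-"]


def extract_text(pdf_path):
    # Hypothetical helper: fail early with a clear message if the
    # binary is missing, rather than with an obscure OSError.
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    result = subprocess.run(
        build_pdftotext_command(pdf_path),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("utf-8")
```

The PATH check only says whether the tool is installed on the machine running the tests; it says nothing about whether extraction of the problem character would actually improve.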

andrewferrier • Jan 21 '15 08:01

Experimenting with this issue in branch issue-58:

https://github.com/andrewferrier/email2pdf/tree/issue-58

andrewferrier • Feb 10 '15 16:02

Another option is ebook-convert from Calibre: http://askubuntu.com/a/56400/728

andrewferrier • Dec 26 '15 19:12

The basic issue is that the upside-down '2', namely ᄅ, is not being extracted correctly and is being replaced with a space (there is also some extra whitespace at the end of the extracted string). Here's the char in question: http://unicodelookup.com/#ᄅ/1
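
For anyone wanting to reproduce the diagnosis, here is a small sketch that lists which characters from the expected string never show up in the extracted one, along with their code points and Unicode names. The helper names missing_characters and describe_non_ascii are made up for illustration; the two strings are copied from the assertion failure in this issue.

```python
import unicodedata


def missing_characters(expected, extracted):
    """Return the distinct characters of expected that never occur in extracted."""
    return [ch for ch in dict.fromkeys(expected) if ch not in extracted]


def describe_non_ascii(text):
    """Return (char, code point, Unicode name) for each distinct non-ASCII character."""
    distinct = [ch for ch in dict.fromkeys(text) if ord(ch) > 127]
    return [
        (ch, "U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in distinct
    ]


# Strings taken from the failing assertion in this issue.
expected = "ɯɐɹƃoɹd ɟpdᄅlᴉɐɯǝ ǝɥʇ ɟo ʇsǝʇ ɐ sᴉ sᴉɥʇ ollǝH"
extracted = "ɯɐɹƃoɹd ɟpd lᴉɐɯǝ ǝɥʇ ɟo ʇsǝʇ ɐ sᴉ sᴉɥʇ ollǝH\n\n\x0c"

for ch in missing_characters(expected, extracted):
    print(describe_non_ascii(ch))
```

This only compares the strings; it cannot tell whether the character was lost when the PDF was written or when it was read back.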

andrewferrier • Aug 22 '16 18:08

The problem could lie in generating the PDF rather than in reading it: http://stackoverflow.com/a/28694708/27641

andrewferrier • Aug 22 '16 19:08

An option to try: use html-pdf to generate the PDF? https://www.npmjs.com/package/html-pdf

andrewferrier • Aug 22 '16 19:08