Yusuke Shinyama

23 comments by Yusuke Shinyama

Your sample code worked fine for me. You just need the CMap data to extract non-ASCII text (e.g. Japanese). Try running "$ make cmap" in the pdfminer directory. If you already have...

What's in your $PYTHONPATH variable? There could be an older version of pdfminer on your system that is being picked up instead and behaving incorrectly.
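One quick way to see which copy is actually being picked up is to ask Python where it would import pdfminer from. This is a minimal sketch using only the standard library; `describe_pdfminer_install` is a hypothetical helper name, and it locates the package without importing it.

```python
import importlib.util
import os


def describe_pdfminer_install():
    """Return (location of the pdfminer package Python would import, PYTHONPATH entries)."""
    # find_spec locates a top-level package without executing its code.
    spec = importlib.util.find_spec("pdfminer")
    location = spec.origin if spec is not None else None
    # Split PYTHONPATH the same way the interpreter does, dropping empty entries.
    pythonpath = [p for p in os.environ.get("PYTHONPATH", "").split(os.pathsep) if p]
    return location, pythonpath


if __name__ == "__main__":
    location, pythonpath = describe_pdfminer_install()
    print("pdfminer would be imported from:", location or "(not found)")
    print("PYTHONPATH entries:", pythonpath or "(empty)")
```

If the reported location is not the copy you just installed, that older copy is the one answering your imports.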

I don't think there's any reliable workaround. The problem is that not everything that looks like text in a PDF can easily be converted to actual text. The prime example would...
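Although this can't be fixed in general, it can at least be detected. pdfminer generally writes a character it cannot map to Unicode as a "(cid:N)" placeholder (an assumption worth checking against the version you use). Under that assumption, the rough diagnostic below estimates how much of an extracted string failed to map; `unmapped_glyph_ratio` is a hypothetical helper, not part of pdfminer.

```python
import re

# Assumed pdfminer convention: an unmappable glyph is emitted as "(cid:N)".
CID_PLACEHOLDER = re.compile(r"\(cid:\d+\)")


def unmapped_glyph_ratio(text: str) -> float:
    """Fraction of glyphs in `text` that came out as (cid:N) placeholders."""
    placeholders = CID_PLACEHOLDER.findall(text)
    # Each placeholder counts as one glyph; every other character (whitespace included) counts as one.
    remaining = CID_PLACEHOLDER.sub("", text)
    total = len(placeholders) + len(remaining)
    return len(placeholders) / total if total else 0.0


print(unmapped_glyph_ratio("ab(cid:12)(cid:7)"))  # 0.5
```

A high ratio suggests that the page's "text" is not recoverable as text, and no amount of post-processing will restore it.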

Sadly there's no standard way to address this issue, because how ligatures are handled is specific to each PDF. Unicode does have characters for ligatures, but they exist only for backward compatibility and...
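When a PDF does map its ligatures to those Unicode compatibility characters (e.g. U+FB01 "ﬁ"), they can be expanded after extraction. This is a minimal sketch assuming that case only; it does nothing for ligature glyphs that have no Unicode mapping at all. It applies NFKC normalization just to the Latin ligature range U+FB00 to U+FB06, because normalizing the whole string would also rewrite other compatibility characters, such as full-width letters common in Japanese text. `expand_ligatures` is a hypothetical helper name.

```python
import unicodedata


def expand_ligatures(text: str) -> str:
    """Replace Latin compatibility ligatures (U+FB00..U+FB06) with plain letters."""
    return "".join(
        # NFKC decomposes e.g. U+FB01 into "fi"; other characters are left untouched.
        unicodedata.normalize("NFKC", ch) if 0xFB00 <= ord(ch) <= 0xFB06 else ch
        for ch in text
    )


print(expand_ligatures("e\ufb03cient \ufb01les"))  # efficient files
```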

Thanks for reporting! The patch by @naren8642 and the later fix should address this. Can you try the latest version and see how it works?

> Dear Yusuke, I am a big fan of pdfminer, thanks for creating it! Currently, I ask myself the question: With Python3.8 in my development environment, which repo should I...

First, can you try it with the latest version? Installation is more automated now, so you're less likely to mess up the cmap step. Second, even if you did...

Do you get this error on every PDF? Could you share the PDF that causes this problem?

I don't think you can. Please upload it somewhere else and post a link to it.

Sorry for the late reply. Commit b589da51b7bd0ea97597fc8f40cf1e68115e5b94 has fixed this, so the latest revision shouldn't have this problem.