Yusuke Shinyama comments

Results: 23 comments of Yusuke Shinyama

Still have issues with CID Characters

Your sample code worked fine for me. You just need the CMap files to extract non-ASCII text (e.g. Japanese). Try running: $ make cmap in the pdfminer directory. If you already have...

Still have issues with CID Characters

What's in your $PYTHONPATH variable? There could be a previous version of pdfminer on your system that is responding incorrectly.
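
One way to check for such a shadowing installation is to ask Python where it would load the module from. This is only a minimal sketch: the helper name locate_module is hypothetical and is not part of pdfminer, and the output depends entirely on the local environment.

```python
import importlib.util
import sys


def locate_module(name):
    """Return the origin (usually a file path) Python would import `name` from.

    Returns None when the module cannot be found on the current search path.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # Directories listed in $PYTHONPATH are placed near the front of sys.path,
    # so an old copy there wins over a newer installation further down.
    for entry in sys.path:
        print(entry)
    print("pdfminer would be loaded from:", locate_module("pdfminer"))
```

If the reported location is not the installation you expect, removing the stale directory from $PYTHONPATH is the likely next step.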

Still have issues with CID Characters

I don't think there's any reliable workaround. The thing is that not everything that looks like text in a PDF can be easily converted to actual text. The prime example would...

Still have issues with CID Characters

Sadly there's no standard way to address this issue, because the way ligatures are handled is PDF-specific. Unicode actually has characters for ligatures, but they're only for backward compatibility and...
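
For the narrow case where the extracted text does contain Unicode ligature code points (such as U+FB01), compatibility normalization can expand them into plain letters. The sketch below uses Python's standard unicodedata module; the function name expand_ligatures is hypothetical. It is not a general fix for the PDF-specific handling described above, and NFKC also rewrites other compatibility characters (for example full-width forms), which may not be wanted.

```python
import unicodedata


def expand_ligatures(text):
    """Expand Unicode compatibility ligatures (e.g. U+FB01) into plain letters.

    NFKC normalization replaces every compatibility character with its
    decomposition, so it affects more than just ligatures.
    """
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    # "\ufb01" is LATIN SMALL LIGATURE FI.
    print(expand_ligatures("\ufb01le"))
```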

ValueError raised on parsing some PDF (apply_png_predictor)

Thanks for reporting! The patch by @naren8642 and the later fix should address this. Can you try the latest version and see how it works?

ValueError raised on parsing some PDF (apply_png_predictor)

> Dear Yusuke, I am a big fan of pdfminer, thanks for creating it! Currently, I ask myself the question: With Python3.8 in my development environment, which repo should I...

Converting simple english CID's to ascii characters

First, can you try it with the latest version? Its installation is more automatic now so that you don't screw up the cmap step. Second, even if you did...

list index out of range error

Do you get this error on every pdf? Can I have the pdf that causes this problem?

list index out of range error

I don't think you can. Upload it somewhere else and post a link to it.

list index out of range error

Sorry for the late reply. Commit b589da51b7bd0ea97597fc8f40cf1e68115e5b94 fixed this, so the latest revision shouldn't have this problem.