
scanned pdf or native pdf

Open longbowking opened this issue 5 years ago • 2 comments

Given a PDF file, how can I tell whether it is a native PDF or a scanned PDF using pdfminer? Any suggestions?

longbowking avatar Dec 02 '19 11:12 longbowking

You could extract all the text and consider the PDF "scanned" if there is too little. Of course, this would fail for "native" PDFs that have no text.
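
A rough sketch of that heuristic, assuming the pdfminer.six fork and its `extract_pages` helper. The function names and the threshold of 20 characters per page are made up for illustration and would need tuning on real documents:

```python
def looks_scanned(page_texts, min_chars_per_page=20):
    """Guess "scanned" when the average amount of extractable text per page is tiny.

    page_texts: iterable of strings, one per page.
    min_chars_per_page: arbitrary cutoff (an assumption, not a pdfminer setting).
    """
    pages = list(page_texts)
    if not pages:
        # No pages or nothing extractable at all: treat as scanned.
        return True
    # Count non-whitespace characters only, so blank lines do not inflate the total.
    total = sum(len("".join(text.split())) for text in pages)
    return total / len(pages) < min_chars_per_page


def pdf_page_texts(path):
    """Yield the extractable text of each page of the PDF at `path`."""
    # Imported lazily so looks_scanned() can be used without pdfminer.six installed.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer

    for page in extract_pages(path):
        yield "".join(
            element.get_text()
            for element in page
            if isinstance(element, LTTextContainer)
        )
```

Usage would be something like `looks_scanned(pdf_page_texts("document.pdf"))`, where `document.pdf` is a placeholder path. As noted, a native PDF that contains only drawings or images would still be reported as scanned.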

himanshugarg avatar Dec 02 '19 11:12 himanshugarg

Yes, that's another case where this heuristic will fail.

himanshugarg avatar Dec 02 '19 11:12 himanshugarg