pdfminer
scanned pdf or native pdf
Given a PDF file, how can I judge whether it is a native PDF or a scanned PDF using pdfminer? Any suggestions?
You could extract all the text and consider the PDF "scanned" if there is too little of it. Of course, this would fail for "native" PDFs that have no text.
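A rough sketch of that heuristic, assuming pdfminer.six is installed. The function names and both thresholds (`min_chars`, `min_text_fraction`) are made up for illustration and would need tuning on real documents:

```python
def looks_scanned(chars_per_page, min_chars=20, min_text_fraction=0.5):
    """Guess "scanned" when too few pages carry extractable text.

    chars_per_page: non-whitespace character count for each page.
    min_chars, min_text_fraction: arbitrary illustrative thresholds.
    """
    if not chars_per_page:
        # No pages at all: nothing was extracted, so treat it as scanned.
        return True
    pages_with_text = sum(1 for n in chars_per_page if n >= min_chars)
    return pages_with_text / len(chars_per_page) < min_text_fraction


def count_chars_per_page(pdf_path):
    """Count the non-whitespace characters pdfminer extracts from each page."""
    # Imported here so looks_scanned() stays usable without pdfminer.six.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer

    counts = []
    for page_layout in extract_pages(pdf_path):
        text = "".join(
            element.get_text()
            for element in page_layout
            if isinstance(element, LTTextContainer)
        )
        # Drop whitespace so pages holding only blank lines count as empty.
        counts.append(len("".join(text.split())))
    return counts
```

Usage would be something like `looks_scanned(count_chars_per_page("some.pdf"))`, where the path is a placeholder. Counting per page instead of over the whole file keeps a single text-heavy cover page from hiding many scanned pages. Note that a scanned PDF with an OCR text layer would still be reported as native by this check.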
Yes, that's another case where this heuristic will fail.