tabula-java
Cannot read the text in a PDF printed from Chrome with "Microsoft Print to PDF"
In Chrome, I press Ctrl + P and select "Microsoft Print to PDF" to produce a PDF file. When I then read that file with the tabula-java CLI, nothing is output.
The command is as follows:
java -jar .\tabula-1.0.4-jar-with-dependencies.jar chrome_ms.pdf -t
I tried four different cases:
- Ctrl + P in Chrome, select "Microsoft Print to PDF"
- Ctrl + P in Chrome, select "Save as"
- Ctrl + P in Firefox, select "Microsoft Print to PDF"
- Ctrl + P in Firefox, select "Save as"
The CLI output is empty only in the first case. My application cannot tolerate silently empty output, so can anyone help me find the cause of the problem and a way to solve it?
I also tested "pdfbox-app-2.0.23.jar" and "pdfbox-app-3.0.0-RC1.jar". Extracting text with the PDFBox CLI has exactly the same problem.
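To make the comparison reproducible, here is a minimal sketch of a harness that runs the same command on each PDF and flags the ones with empty output. It uses only the Java standard library and calls the CLI exactly as shown above. The class name `EmptyOutputCheck` and every file name except `chrome_ms.pdf` are placeholders I made up for illustration, so adjust them to your own files.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class EmptyOutputCheck {

    // Builds the same command line as in the report: java -jar <jar> <pdf> -t
    static List<String> buildCommand(String jarPath, String pdfPath) {
        return List.of("java", "-jar", jarPath, pdfPath, "-t");
    }

    // Treats null or whitespace-only output as "nothing was extracted".
    static boolean isEmptyExtraction(String output) {
        return output == null || output.isBlank();
    }

    // Runs the command and returns its standard output as a string.
    // Standard error is passed through to the console so warnings stay visible.
    static String run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        process.waitFor();
        return output;
    }

    public static void main(String[] args) throws Exception {
        // Jar name taken from the command in the report.
        String jar = "tabula-1.0.4-jar-with-dependencies.jar";
        // Only chrome_ms.pdf is a real name from the report; the rest are hypothetical.
        List<String> pdfs = List.of(
                "chrome_ms.pdf",
                "chrome_save_as.pdf",
                "firefox_ms.pdf",
                "firefox_save_as.pdf");
        for (String pdf : pdfs) {
            String output = run(buildCommand(jar, pdf));
            System.out.println(pdf + ": " + (isEmptyExtraction(output) ? "EMPTY" : "has output"));
        }
    }
}
```

Run it from the folder that contains the jar and the four PDFs. The same check can be reused in an application to detect an empty extraction explicitly instead of silently passing it on.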