tabulapdf
Add function for printing out
As was pointed out recently, tables cannot be extracted correctly from PDF files when they contain values entered through editable fields. The solution that seems to work is to make all values part of the text layer through the "Print as PDF" functionality in a file viewer. While it's possible to automate the process with `cups-pdf` and a bash script that iterates over the files and runs `lpr -P PDF filename_i.pdf`, it would be good to have this functionality available from within `tabulizer`. Fortunately, Apache PDFBox includes a `PrintPDF` command-line tool and a corresponding class that should be possible to expose to users.
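For reference, the bash workaround mentioned above could look roughly like the sketch below. It assumes a working `cups-pdf` installation whose print queue is named `PDF` (as in the `lpr` command above). The function name `print_all` and the `LPR_CMD` override are hypothetical, added only so the loop can be dry-run without a printer.

```shell
#!/usr/bin/env bash
# Sketch: send every PDF in a directory to the cups-pdf virtual printer.
# Assumption: the cups-pdf queue is named "PDF"; pass another name as $2 if not.
# LPR_CMD is a hypothetical override (e.g. LPR_CMD=echo) for dry runs.

print_all() {
  local dir="$1"
  local printer="${2:-PDF}"
  local f
  for f in "$dir"/*.pdf; do
    # Skip the unexpanded glob when the directory holds no PDF files.
    [ -e "$f" ] || continue
    "${LPR_CMD:-lpr}" -P "$printer" "$f"
  done
}

# Run only when arguments are given: ./print_all.sh /path/to/pdfs [queue]
if [ "$#" -gt 0 ]; then
  print_all "$@"
fi
```

Running it with `LPR_CMD=echo` prints the commands' arguments instead of submitting jobs, which is a cheap way to check which files would be sent. Where the flattened copies end up depends on the local `cups-pdf` configuration.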