pdfparser
Poll: Do you want an executable to get text from a PDF in the terminal?
I am curious whether there is a need for a (standalone) executable that extracts the text from a given PDF.
It would still be a PHP script, but it could be called from the terminal for shell-related tasks. Maybe something like the following:
# show text of PDF file
$ ./pdfparser/bin/get_text /foo/Bar.pdf
This is example text ...
or
# write raw text of PDF file into a file
$ ./pdfparser/bin/get_text /foo/Bar.pdf > pdf_text.txt
When running this command, the extracted text of /foo/Bar.pdf will be written to pdf_text.txt. One could also pipe the output directly into grep or similar tools to search it.
If you need or want something like this, please react with :+1:; otherwise, react with :-1:. Comments and ideas are welcome.
Thank you for taking the time.
I switched from pdftotext to PdfParser specifically so that my search engine (which scans HTML and PDF files) could use an all-PHP solution instead of requiring a binary. But a binary might be useful in other situations.
I think the key argument for or against would be: can PdfParser do a better job than pdftotext? pdftotext is a pretty mature product. https://www.xpdfreader.com/pdftotext-man.html
It could really make things easier in some areas. I have run into situations where I needed something like this a few times.