pdfparser

Poll: Do you want an executable to get text from a PDF in the terminal?

Open · k00ni opened this issue 2 years ago · 3 comments

I am curious whether there is a need for a (standalone) executable that extracts the text from a given PDF.

It would still be a PHP script, but it could be called from the terminal for shell-related tasks. Maybe something like the following:

# show text of PDF file
$ ./pdfparser/bin/get_text /foo/Bar.pdf

This is example text ...

or

# write raw text of PDF file into a file
$ ./pdfparser/bin/get_text /foo/Bar.pdf > pdf_text.txt

Running this command writes the extracted text of /foo/Bar.pdf to pdf_text.txt. One could also pipe the output directly into grep or similar tools to search it.
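A minimal sketch of what such a `bin/get_text` script might look like, assuming smalot/pdfparser is installed via Composer and its `vendor/` directory sits one level above `bin/`. `Parser::parseFile()` and `Document::getText()` are PdfParser's text-extraction calls; the file name, helper functions (`getPdfText`, `main`), error messages, exit codes and the run guard are hypothetical choices for illustration, not an existing part of the library.

```php
#!/usr/bin/env php
<?php
// Hypothetical bin/get_text sketch (not shipped with PdfParser).
// Assumption: smalot/pdfparser is installed via Composer and vendor/ sits next to bin/.

use Smalot\PdfParser\Parser;

/**
 * Extract the plain text of a PDF file using PdfParser.
 */
function getPdfText(string $path): string
{
    $parser = new Parser();
    $document = $parser->parseFile($path);

    return $document->getText();
}

/**
 * CLI entry point; returns the process exit code.
 */
function main(array $argv): int
{
    if (count($argv) !== 2) {
        fwrite(STDERR, 'Usage: ' . ($argv[0] ?? 'get_text') . " <file.pdf>\n");
        return 1;
    }

    $path = $argv[1];
    if (!is_file($path) || !is_readable($path)) {
        fwrite(STDERR, "Cannot read file: {$path}\n");
        return 1;
    }

    // Load Composer's autoloader only once PdfParser is actually needed.
    require_once __DIR__ . '/../vendor/autoload.php';

    try {
        // Print to STDOUT so the text can be redirected (>) or piped (| grep ...).
        echo getPdfText($path), PHP_EOL;
    } catch (\Exception $e) {
        fwrite(STDERR, "Failed to parse {$path}: {$e->getMessage()}\n");
        return 1;
    }

    return 0;
}

// Run only when executed as the get_text command, so the functions above
// can also be loaded on their own (e.g. in tests).
if (PHP_SAPI === 'cli' && basename(__FILE__) === 'get_text') {
    exit(main($argv));
}
```

Saved as `bin/get_text` and marked executable (`chmod +x bin/get_text`), it could be combined with standard tools, for example `./pdfparser/bin/get_text /foo/Bar.pdf | grep -i invoice`. Errors go to STDERR with a non-zero exit code, so a redirect such as `> pdf_text.txt` only ever captures extracted text.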


If you need or want something like this, please react with :+1:; otherwise, :-1:. Comments and ideas are welcome.

Thank you for taking the time.

k00ni · Aug 21 '23 13:08

I switched from pdftotext to PdfParser specifically so that my search engine (which scans HTML and PDF files) could be an all-PHP solution instead of requiring an external binary. But a command-line executable might be useful in other situations.

I think the key argument for or against would be: can PdfParser do a better job than pdftotext? pdftotext is a pretty mature tool: https://www.xpdfreader.com/pdftotext-man.html

GreyWyvern · Aug 21 '23 15:08

It could really make things easier in some areas. I have needed something like this a few times.

Reqrefusion · Sep 18 '23 21:09