Import/Export from/to file and stdin/stdout
Hi,
Currently, when exporting PDF content, it is only possible to specify the name of the directory to which the exported text files are written (outDir):
$ pdfcpu extract
usage: pdfcpu extract -m(ode) i(mage)|f(ont)|c(ontent)|p(age)|m(eta) [-p(ages) selectedPages] inFile outDir
It would be very useful if it were possible to specify filenames instead:
Export all PDF pages to one file:
$ pdfcpu extract -m content -o all_pages.txt some.pdf
Export one page to file:
$ pdfcpu extract -m content -p 1 -o page1.txt some.pdf
Export selected pages to distinct files:
$ pdfcpu extract -m content -p 1 -o page1.txt -p 2 -o page2.txt some.pdf
Export selected pages to the same file:
$ pdfcpu extract -m content -p 1 -o pages1+3.txt -p 2 -o page2.txt -p 3 -o pages1+3.txt some.pdf
or
$ pdfcpu extract -m content -p 1,3 -o pages1+3.txt -p 2 -o page2.txt some.pdf
In particular, it would be useful if stdin could be used to read a PDF file and stdout to write the exported content. This would enable PDF processing in the shell using pipes:
Read PDF input from stdin:
$ curl https://internet/some.pdf | pdfcpu extract -m content -o some_pages.txt -
Export text to stdout:
$ pdfcpu extract -m content -o - some.pdf | fgrep "Chapter 3:"
Best, Matthias
Hello!
Support for shell piping is a useful addition.
As for your suggested addition to the extract command-line processing, I'd rather leave that up to the calling script.
I am also not in favour of using -o repeatedly within one command.
And if we start using -o, then that would have to change for all pdfcpu commands.
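Leaving this to the calling script could look roughly like the following minimal sketch. It only uses the existing form `pdfcpu extract -m content [-p selectedPages] inFile outDir` from the usage line above. The function name `pdfcpu_content_to_stdout` is hypothetical and not part of pdfcpu. Because the names of the files pdfcpu writes into outDir are not assumed here, it simply concatenates all of them in glob order, which is lexical and may not match page order.

```shell
# Minimal sketch of a calling-script wrapper (hypothetical name, not part of pdfcpu).
# Usage: pdfcpu_content_to_stdout INFILE|- [SELECTED_PAGES]
# Reads the PDF from stdin when INFILE is "-" and writes the extracted content to stdout.
# The body is a subshell ( ... ), so its variables do not leak into the caller.
pdfcpu_content_to_stdout() (
    in=$1
    pages=${2:-}          # e.g. "1,3"; empty means all pages

    # Scratch directory, removed again before the function returns.
    work=$(mktemp -d) || exit 1
    outdir="$work/out"
    mkdir "$outdir" || { rm -rf "$work"; exit 1; }

    # pdfcpu takes a file name, so buffer a PDF arriving on stdin into a temp file.
    if [ "$in" = "-" ]; then
        in="$work/stdin.pdf"
        cat > "$in"
    fi

    # Existing CLI form: pdfcpu extract -m content [-p selectedPages] inFile outDir
    # pdfcpu's stdout is discarded so any status messages it prints there
    # do not end up mixed with the extracted text.
    if [ -n "$pages" ]; then
        pdfcpu extract -m content -p "$pages" "$in" "$outdir" >/dev/null
    else
        pdfcpu extract -m content "$in" "$outdir" >/dev/null
    fi
    status=$?

    # Concatenate every file written to outDir. Glob order is lexical, so a
    # "page 10" file may come before a "page 2" file; names are not assumed.
    if [ "$status" -eq 0 ]; then
        for f in "$outdir"/*; do
            if [ -f "$f" ]; then cat "$f"; fi
        done
    fi

    rm -rf "$work"
    exit "$status"
)
```

Once the function is defined in the shell, the requested behaviour maps onto plain redirection and pipes instead of a new -o option:
$ curl https://internet/some.pdf | pdfcpu_content_to_stdout - > some_pages.txt
$ pdfcpu_content_to_stdout some.pdf 1,3 > pages1+3.txt
$ pdfcpu_content_to_stdout some.pdf | fgrep "Chapter 3:"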