Add "tools" that are installed along with the library
PDFio includes example programs that would likely be useful as tools. Much as MuPDF, Poppler, and Xpdf include such tools, PDFio should also do so.
Suggested tools for next feature release:
- pdfioinfo: Shows information about the PDF file - title, number of pages, etc.
- pdfiolint: New utility that does its best to look for bad/duplicate content, orphaned objects.
Possibly include "pdf2txt" and "pdf2images", renamed to "pdfiotext" and "pdfioimages".
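As a rough illustration of the "orphaned objects" check suggested for pdfiolint, the sketch below marks every object reachable from a root and counts the rest. It works on a toy object table, not on PDFio's data structures: `toy_obj_t`, `find_orphans`, and `MAX_REFS` are hypothetical names used only for this example, and a real tool would start from the trailer dictionary and follow indirect references in the parsed objects.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_REFS 4  // Assumption: toy limit on references per object

// Hypothetical, simplified object record (not a PDFio type)
typedef struct
{
  size_t num_refs;        // Number of valid entries in refs (<= MAX_REFS)
  size_t refs[MAX_REFS];  // Indices of the objects this object references
} toy_obj_t;

// Mark every object reachable from `root` and return the number of
// unreachable (orphaned) objects. `reachable` and `stack` must each have
// room for `num_objs` entries; each object is pushed at most once.
size_t find_orphans(const toy_obj_t *objs, size_t num_objs, size_t root,
                    bool *reachable, size_t *stack)
{
  size_t top = 0;      // Number of entries on the work stack
  size_t orphans = 0;  // Number of unreachable objects

  for (size_t i = 0; i < num_objs; i ++)
    reachable[i] = false;

  if (root < num_objs)
  {
    reachable[root] = true;
    stack[top ++]   = root;
  }

  // Iterative depth-first walk; marking before pushing handles cycles
  while (top > 0)
  {
    const toy_obj_t *obj = objs + stack[-- top];

    for (size_t i = 0; i < obj->num_refs; i ++)
    {
      size_t ref = obj->refs[i];

      if (ref < num_objs && !reachable[ref])
      {
        reachable[ref] = true;
        stack[top ++]  = ref;
      }
    }
  }

  for (size_t i = 0; i < num_objs; i ++)
  {
    if (!reachable[i])
      orphans ++;
  }

  return orphans;
}
```

An object that references others but is never referenced itself still counts as orphaned here, which is probably the behavior a lint tool would want to report.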
Hi, I am working on this and will post a PR soon!
@uddhavphatak No hurry, but if you do come up with some of these tools I can include them as examples until I am ready to support the standalone programs...
Hi, I was working on the pdf2text.c code and noticed the different encodings and CMap types.
I found test files that use /Encoding/Identity-H/Subtype/Type0 and have successfully made changes to parse the text out of them.
However, I am trying to make the code work for any PDF file, with any encoding type.
Where can I find PDF files that use these different encodings?
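For context on the Identity-H case mentioned above: with that encoding, each character code in a shown string is two bytes, big-endian, and the code is used directly as the CID. A minimal sketch of that first step follows; `identity_h_to_cids` is a hypothetical helper, not part of PDFio or pdf2text.c, and turning the resulting CIDs into Unicode text would still typically rely on the font's /ToUnicode CMap.

```c
#include <stddef.h>
#include <stdint.h>

// Split the bytes of a string shown with an Identity-H font into 16-bit
// codes. With Identity-H each code is 2 bytes, big-endian, and equals the
// CID. Returns the number of CIDs written; a trailing odd byte is ignored.
size_t identity_h_to_cids(const unsigned char *bytes, size_t len,
                          uint16_t *cids, size_t max_cids)
{
  size_t count = 0;  // Number of CIDs written so far

  for (size_t i = 0; i + 1 < len && count < max_cids; i += 2)
    cids[count ++] = (uint16_t)((bytes[i] << 8) | bytes[i + 1]);

  return count;
}
```

Other encodings (simple fonts with /Differences, predefined CJK CMaps, embedded CMaps) need different handling, so this covers only the one case described in the comment.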
I don't know of a specific repository for PDF text files, but the PDF Association has a stressful PDF corpus that might offer something useful. I've actually used this corpus when testing PDFio to make sure we can handle typical PDFs...