
Add "tools" that are installed along with library

Status: Open • michaelrsweet opened this issue 10 months ago • 4 comments

PDFio includes example programs that would likely be useful as tools. Much as MuPDF, Poppler, and Xpdf include such tools, PDFio should also do so.

Suggested tools for next feature release:

  • pdfioinfo: Shows information about the PDF file - title, number of pages, etc.
  • pdfiolint: New utility that does its best to look for bad/duplicate content, orphaned objects.

Possibly include "pdf2txt" and "pdf2images", renamed to "pdfiotext" and "pdfioimages".

michaelrsweet, Feb 22 '25 16:02

Hi, I am working on this and will post a PR soon!

uddhavphatak, Oct 02 '25 18:10

@uddhavphatak No hurry, but if you do come up with some of these tools I can include them as examples until I am ready to support the standalone programs...

michaelrsweet, Oct 02 '25 20:10

Hi, I was working on the pdf2text.c code and noticed the different types of encodings and CMaps.

I have found test files that use /Subtype /Type0 fonts with /Encoding /Identity-H, and I have successfully changed the code to extract their text.

But I am trying to make the code robust enough to handle any PDF file, whatever encoding it uses.

Where can I find PDF files that use these different encodings?

uddhavphatak, Oct 07 '25 19:10

I don't know of a specific repository for PDF text files, but the PDF Association has a stressful PDF corpus that might offer something useful. I've actually used this corpus when testing PDFio to make sure we can handle typical PDFs...

michaelrsweet, Oct 09 '25 19:10