Add PDF Image Extractor script with README documentation

Open gracetyy opened this issue 2 months ago • 0 comments

This PR adds a new script, PDF Image Extractor, which recursively scans a directory tree for PDF files and extracts all embedded images from each document.

All extracted images are saved in a subfolder named PDF within the input root directory by default (customizable via --out).
Each PDF file gets its own folder inside that output directory, containing all images extracted from that document.
The script supports an optional --dedup flag to enable per-PDF deduplication of images.
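
For reviewers, here is a minimal sketch of the behavior described above. It is not the script's actual code. The function names (`find_pdfs`, `output_dir_for`, `dedup_images`, `build_parser`), the positional `root` argument, naming each per-PDF folder after the file stem, and using SHA-256 of the raw image bytes for deduplication are all assumptions made for illustration. Only the `--out` and `--dedup` flags and the default `PDF` subfolder come from this PR. The image extraction itself is left out, since it depends on the PDF library used.

```python
import argparse
import hashlib
from pathlib import Path
from typing import List, Optional


def find_pdfs(root: Path) -> List[Path]:
    """Recursively collect PDF files under root (case-insensitive extension)."""
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.suffix.lower() == ".pdf"
    )


def output_dir_for(pdf: Path, root: Path, out: Optional[Path] = None) -> Path:
    """Return the per-PDF folder: <out>/<pdf stem>, where <out> defaults to <root>/PDF.

    Naming the folder after the file stem is an assumption of this sketch.
    """
    base = out if out is not None else root / "PDF"
    return base / pdf.stem


def dedup_images(images: List[bytes]) -> List[bytes]:
    """Drop byte-identical images within one PDF, keeping first occurrences in order.

    Hashing with SHA-256 is an assumption; the script may compare images differently.
    """
    seen = set()
    unique = []
    for data in images:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(data)
    return unique


def build_parser() -> argparse.ArgumentParser:
    """CLI sketch: only --out and --dedup are taken from the PR description."""
    parser = argparse.ArgumentParser(description="Extract embedded images from PDFs")
    parser.add_argument("root", type=Path, help="directory tree to scan (hypothetical name)")
    parser.add_argument("--out", type=Path, default=None, help="output directory")
    parser.add_argument("--dedup", action="store_true", help="per-PDF deduplication")
    return parser
```

Because deduplication in this sketch runs on one PDF's images at a time, the same image appearing in two different PDFs would still be saved once in each PDF's folder, which matches the "per-PDF" wording above.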

Additional notes:

Please let me know if you’d like any changes to the folder naming or CLI options.
Happy to update documentation or add more examples if needed.

Oct 20 '25 10:10 gracetyy