Add PDF Image Extractor script with README documentation

Open gracetyy opened this issue 2 months ago • 0 comments

This PR adds a new script, PDF Image Extractor, which recursively scans a directory tree for PDF files and extracts all embedded images from each document.

  • All extracted images are saved in a subfolder named PDF within the input root directory by default (customizable via --out).
  • Each PDF file gets its own folder inside the output directory, containing all images extracted from that document.
  • The script supports an optional --dedup flag to enable per-PDF deduplication of images.
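
For reviewers, here is a minimal standard-library sketch of the layout and deduplication behaviour described above. It is not the script's actual code: the function names (`find_pdfs`, `unique_images`, `save_images`), the SHA-256 content hash, the `image_NNN.bin` file naming, and using the PDF's file stem as its folder name are all assumptions made for illustration. Image extraction itself is left out; `images` stands in for whatever raw image bytes a PDF library would return.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Iterator, List


def find_pdfs(root: Path, out_dir: Path) -> Iterator[Path]:
    """Yield every PDF under root, skipping files inside the output folder."""
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() != ".pdf":
            continue
        if out_dir in path.parents:
            continue  # do not rescan anything already under the output folder
        yield path


def unique_images(images: Iterable[bytes], dedup: bool) -> Iterator[bytes]:
    """Yield images in order; with dedup, drop byte-identical repeats."""
    seen = set()
    for data in images:
        digest = hashlib.sha256(data).hexdigest()  # assumed hashing choice
        if dedup and digest in seen:
            continue
        seen.add(digest)
        yield data


def save_images(pdf_path: Path, images: Iterable[bytes],
                out_dir: Path, dedup: bool = False) -> List[Path]:
    """Write one PDF's images into out_dir/<pdf stem>/ (assumed naming)."""
    target = out_dir / pdf_path.stem
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for index, data in enumerate(unique_images(images, dedup), start=1):
        dest = target / f"image_{index:03d}.bin"  # hypothetical file name
        dest.write_bytes(data)
        written.append(dest)
    return written
```

Because the set of seen hashes is created fresh for each call, deduplication in this sketch is scoped to a single PDF, matching the per-PDF behaviour of `--dedup`. One limitation of naming folders by file stem is that two PDFs with the same name in different subdirectories would share a folder.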

Additional notes:

  • Please let me know if you’d like any changes to the folder naming or CLI options.
  • Happy to update documentation or add more examples if needed.

gracetyy, Oct 20 '25 10:10