Python-Scripts
Add PDF Image Extractor script with README documentation
This PR adds a new script, PDF Image Extractor, which recursively scans a directory tree for PDF files and extracts all embedded images from each document.
- All extracted images are saved in a subfolder named `PDF` within the input root directory by default (customizable via `--out`).
- Each PDF file is organized into its own folder, containing all images extracted from that document.
- The script supports an optional `--dedup` flag to enable per-PDF deduplication of images.
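For reviewers, here is a minimal sketch of how the behavior described above could fit together. It is not the code from this PR: the function names (`find_pdfs`, `output_dir_for`, `dedup_images`) are hypothetical, the image extraction step itself is left out, and deduplication is assumed to mean dropping byte-identical images by hash, which may differ from what `--dedup` actually does.

```python
import hashlib
from pathlib import Path
from typing import Iterable, List, Optional


def find_pdfs(root: Path) -> List[Path]:
    """Recursively collect PDF files under root (suffix match is case-insensitive)."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def output_dir_for(pdf: Path, root: Path, out: Optional[Path] = None) -> Path:
    """Return the folder for one PDF's images.

    Assumption: the base folder is <root>/PDF unless an explicit output
    directory is given (mirroring --out), and each PDF's folder is named
    after the file stem.
    """
    base = out if out is not None else root / "PDF"
    return base / pdf.stem


def dedup_images(blobs: Iterable[bytes]) -> List[bytes]:
    """Drop byte-identical images within a single PDF, keeping first occurrences.

    Assumption: duplicates are detected by SHA-256 of the raw image bytes.
    """
    seen = set()
    unique = []
    for blob in blobs:
        digest = hashlib.sha256(blob).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(blob)
    return unique
```

One caveat with this sketch: naming folders by file stem alone means two PDFs with the same name in different subdirectories would share an output folder, so a real implementation may need to include part of the relative path.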
Additional notes:
- Please let me know if you’d like any changes to the folder naming or CLI options.
- Happy to update documentation or add more examples if needed.