pdf-toolbox
Extracting images and diagrams from XObjects
Hi!
Would you be open to extending XObjects with an API that deals only with extracting non-text data (images, in my particular case)? Something along the lines of https://blog.idrsolutions.com/how-images-are-stored-in-pdf/
Hopefully I can contribute it as part of my working hours; I'm exploring my options at the moment :)
@coderfromhere Sure thing! A few issues to consider:
- The extraction API should be designed carefully. It should be easy to use and at the same time not impose arbitrary restrictions.
- The dependency footprint should not grow too much. You might need libraries for different types of images, right?
So far I was just thinking of streaming bitmaps/random bytes (without a full-featured format recogniser) into arbitrary locations, either through conduit or, since I see it is already among the dependencies, io-streams.
So it's a low-level API to extract raw image data + metadata. Sounds reasonable to me! I'd prefer io-streams since they are there already.
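To make the idea concrete, here is a rough sketch of what "raw image data + metadata" over io-streams could look like. None of these names exist in pdf-toolbox: `ImageInfo`, `RawImage` and `writeImageTo` are hypothetical and only illustrate the shape of the API. The metadata fields mirror the usual entries of an image XObject dictionary (`/Width`, `/Height`, `/BitsPerComponent`, `/ColorSpace`, `/Filter`), and the sketch assumes the bytes are handed over undecoded, so no image libraries are needed.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.ByteString (ByteString)
import System.IO.Streams (InputStream, OutputStream)
import qualified System.IO.Streams as Streams

-- | Hypothetical metadata record, mirroring entries of an image
-- XObject dictionary. Not part of pdf-toolbox.
data ImageInfo = ImageInfo
  { imageWidth            :: Int
  , imageHeight           :: Int
  , imageBitsPerComponent :: Int
  , imageColorSpace       :: ByteString    -- e.g. "DeviceRGB"
  , imageFilters          :: [ByteString]  -- e.g. ["DCTDecode"]
  } deriving (Show, Eq)

-- | Hypothetical result of extraction: metadata plus the raw bytes
-- as a stream. No format recognition or decoding is attempted.
data RawImage = RawImage
  { rawInfo :: ImageInfo
  , rawData :: InputStream ByteString
  }

-- | Pump the raw bytes into any sink (file, socket, in-memory buffer).
-- 'Streams.connect' also signals end-of-stream to the sink when done.
writeImageTo :: RawImage -> OutputStream ByteString -> IO ()
writeImageTo img out = Streams.connect (rawData img) out
```

With something like this, saving an image to disk would be `Streams.withFileAsOutput "image.raw" (writeImageTo img)`, and the caller decides what to do with the bytes based on `imageFilters`.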