Extracting images with transparent backgrounds?
Hello again!
I want to be able to extract images with alpha channels. From searching, I found an old issue where it sounds like I need to get the soft masks. Is there anything in HexaPDF today that helps extract transparent images? Any general tips?
You can use the Image#info method on image XObjects. It returns a structure with information about the image, including a #writable method that returns true if HexaPDF can write the image.
So if that is true and the image has transparency defined, HexaPDF can combine the image data with the alpha data to create a PNG file. If it is false, then either the image type itself is not supported or HexaPDF doesn't yet know how to combine the image with the alpha data.
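For illustration only, here is roughly what "combining the image data with the alpha data" amounts to. This is not HexaPDF's implementation; it is a minimal pure-Ruby sketch that assumes both the image and its /SMask have already been decoded to raw 8-bit samples (DeviceRGB, 8 bits per component, mask with the same width and height). The method names `combine_rgb_alpha`, `png_chunk` and `encode_png_rgba` are hypothetical, and other bit depths, color spaces, or a mask with a different resolution are not handled.

```ruby
require 'zlib'

# Interleave 8-bit RGB samples with 8-bit alpha samples (e.g. decoded
# /SMask data) into RGBA. Both inputs are Strings of raw sample bytes.
def combine_rgb_alpha(rgb, alpha, width, height)
  rgb = rgb.b
  alpha = alpha.b
  pixels = width * height
  raise ArgumentError, 'RGB data size mismatch' unless rgb.bytesize == pixels * 3
  raise ArgumentError, 'alpha data size mismatch' unless alpha.bytesize == pixels

  rgba = String.new(capacity: pixels * 4, encoding: Encoding::BINARY)
  pixels.times do |i|
    rgba << rgb.byteslice(i * 3, 3) << alpha.getbyte(i)
  end
  rgba
end

# Build one PNG chunk: big-endian length, type, data, CRC-32 over type + data.
def png_chunk(type, data)
  [data.bytesize].pack('N') + type + data + [Zlib.crc32(type + data)].pack('N')
end

# Encode RGBA samples as an 8-bit truecolor-with-alpha PNG (color type 6).
def encode_png_rgba(rgba, width, height)
  rgba = rgba.b
  row_bytes = width * 4
  raw = String.new(encoding: Encoding::BINARY)
  height.times do |y|
    raw << 0 # filter type 0 (None) before each scanline
    raw << rgba.byteslice(y * row_bytes, row_bytes)
  end
  # width, height, bit depth 8, color type 6, compression, filter, interlace
  ihdr = [width, height, 8, 6, 0, 0, 0].pack('NNC5')
  "\x89PNG\r\n\x1A\n".b +
    png_chunk('IHDR', ihdr) +
    png_chunk('IDAT', Zlib::Deflate.deflate(raw)) +
    png_chunk('IEND', '')
end
```

Usage would be something like `File.binwrite('out.png', encode_png_rgba(combine_rgb_alpha(rgb, alpha, w, h), w, h))`. Using filter type 0 for every scanline keeps the encoder short at the cost of larger files.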
There is also the hexapdf images command which allows you to list and extract supported images from a PDF.
Thanks again for the help!
I wasn't able to get the transparent images out using HexaPDF. Using an online image-extraction tool, however, I was able to get all the masks out. Any tips on how I can get these same mask files out? So far I've been checking SMask, and it's almost always None or nonexistent. I was only able to extract one mask, but I can't seem to find the rest.
Also, I played around with the CLI. I'm not sure if it's intentional, but some of the images get missed. It might be because the XObject names conflict: one of the PDFs I'm testing with has images inside Form XObjects (Subtype :Form), and there the names conflict with the page's root resources, e.g. /Im0.
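To catch images nested in Form XObjects, one option is to walk each form's own /Resources recursively and identify images by object rather than by resource name, so that an /Im0 inside a form and a different /Im0 on the page are both kept. The sketch below is hypothetical (`collect_images` is not a HexaPDF method) and uses only `[]` and `each` access, so it runs on nested Ruby hashes; the assumption is that HexaPDF dictionaries such as `page.resources` can be passed in the same way, since they support symbol-key lookup and `each`.

```ruby
# Collect every image XObject reachable from a resources dictionary,
# descending into Form XObjects, which carry their own /Resources.
# Returns a Hash mapping each image object to the path of resource names
# where it was first found, e.g. [:Fm0, :Im0].
def collect_images(resources, path = [], found = {}.compare_by_identity,
                   visited = {}.compare_by_identity)
  xobjects = resources && resources[:XObject]
  return found unless xobjects

  xobjects.each do |name, obj|
    case obj[:Subtype]
    when :Image
      # Keyed by object identity, so two different images that are both
      # named /Im0 (page vs. form) end up as separate entries.
      found[obj] ||= path + [name]
    when :Form
      next if visited[obj] # skip forms already walked (reuse or cycles)

      visited[obj] = true
      collect_images(obj[:Resources], path + [name], found, visited)
    end
  end
  found
end
```

Keying by object identity relies on the same PDF object always being returned as the same Ruby object; if that does not hold in a given setup, keying by the object number would be the sturdier choice.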
Could you provide a sample file? This would make it much easier to see what's going on.
As for the CLI: It can only export the images that HexaPDF itself supports.
Thanks @jacobcupcake - I will have a look!
I think I figured it out. Another graphics state (GS) is being set before the form is rendered.
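For reference, the PDF specification allows a soft mask in two places: the /SMask entry of the image XObject itself, and the /SMask entry of a graphics state parameter dictionary (/ExtGState) that the content stream selects with the gs operator before drawing. In the graphics-state case the value is frequently the name /None, which would fit seeing None so often; otherwise it is a soft-mask dictionary whose /S is /Alpha or /Luminosity and whose /G is a transparency group Form XObject that paints the mask, so the mask images live in that form's resources. Below is a hypothetical helper (`ext_gstate_soft_masks` is not a HexaPDF method) that lists such masks, again using only `[]` and `each` access and assuming names come back as Ruby Symbols.

```ruby
# List soft masks defined through graphics states (/ExtGState) in a
# resources dictionary. /SMask there is either the name /None (no mask)
# or a soft-mask dictionary whose /G entry is a transparency group Form
# XObject that paints the mask.
def ext_gstate_soft_masks(resources)
  gstates = resources && resources[:ExtGState]
  return [] unless gstates

  masks = []
  gstates.each do |name, gs|
    smask = gs[:SMask]
    next if smask.nil? || smask == :None

    masks << { gstate: name, type: smask[:S], group: smask[:G] }
  end
  masks
end
```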
@jacobcupcake The file grapes3.pdf shows a single page with grapes. However, the file itself contains more than this image.
If you run hexapdf images -s grapes3.pdf you will get a list of 31 images, all of which can be extracted by HexaPDF. The first one with the ICC color space is the image of the grapes. The others are greyscale images and appear to be imagemasks, like the one you mention in https://github.com/gettalong/hexapdf/issues/354#issuecomment-2716855959
(Those issue discussions are quite helpful; I just ran "hexapdf images" on a random .pdf file and got a neat table overview of all images. I was not aware that this was possible. It is also quite convenient for parsing the result automatically via Ruby, by the way.)
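Along those lines, here is a small sketch that runs the command and turns the table into hashes. It assumes the output is a whitespace-separated table with one header row, optional separator lines made of dashes, and no spaces inside cells; that layout is an assumption, so check it against what your HexaPDF version prints. `parse_table` and `list_pdf_images` are hypothetical names, and `hexapdf` must be on the PATH.

```ruby
require 'open3'

# Parse a whitespace-separated table: the first non-blank line is the header,
# separator lines made only of dashes are skipped, and each remaining line
# becomes a Hash keyed by the header names.
def parse_table(text)
  lines = text.lines.map(&:strip).reject(&:empty?)
  lines.reject! { |l| l.match?(/\A-+\z/) }
  return [] if lines.empty?

  header = lines.shift.split
  lines.map { |line| header.zip(line.split).to_h }
end

# Run the CLI on a PDF; -s searches the whole file for images.
def list_pdf_images(path)
  out, status = Open3.capture2('hexapdf', 'images', '-s', path)
  raise "hexapdf images failed for #{path}" unless status.success?

  parse_table(out)
end
```

All values come back as Strings, so numeric columns need converting (e.g. with `to_i`) before comparing them.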