PNG: Metadata APIs skip metadata after IDAT
PNG allows for iTXt, tEXt, and zTXt chunks to appear at any point in the file, i.e. before or after IDAT (see here).
Expected
ImageDecoder::iptc_metadata and friends should include all metadata in the file.
Actual behaviour
ImageDecoder::iptc_metadata and friends currently only return metadata that is specified before the first IDAT chunk.
Reproduction steps
Here's a repro PNG and a (currently failing) test: https://github.com/image-rs/image/commit/567abd77b5a942b0a7d501fffbdd06efaef3b2e7. Solving this is not trivial: ImageDecoder currently does not walk past the IDAT chunks until read_image is called. At which point the decoder is consumed and metadata cannot be accessed anymore. 🙃
Going through the entire file before returning metadata fixes this. I've drafted one possible approach here:
- https://github.com/image-rs/image/compare/main...mhils:image:metadata?expand=1
- https://github.com/image-rs/image-png/compare/master...mhils:image-png:metadata?expand=1
This makes things work, but it's not very performant - the current low-level image-png API simply does not mix very well with the high-level metadata accessors in image-png. Do you folks have any thoughts on what's the best way to solve this?
Thank you for your fantastic work on image-png! 🍰 😃
You've pretty much identified the problem in ImageDecoder already. There's going to be a rewrite of the trait in 1.0 to take it into account. For metadata occurring after the initial header we want the decoding to support this sequence (just a sketch for now):
let mut decoder = …
decoder.set_limits(…);
let frame_info = decoder.init()?;
let mut buffer = buffer_from_info(&frame_info);
decoder.read_image(&mut buffer)?;
decoder.iptc_metadata()?; // or other metadata
In 0.25 the ImageDecoder::read_image consumes the decoder so it is no longer possible to read any metadata from it. I think the png API is not at fault here in any way, it roughly reflects how a PNG file is structured which is most important. It should be image that is amenable to using that structure as well.
Added it explicitly to the plan of breaking changes.
Thank you for the super quick feedback! 😃
I also considered the approach you sketched (make read_image not consume self), but I think it maybe makes for a not-so-great API? First, it's not obvious at all that he following code does not return all metadata from the image:
let mut decoder = PngDecoder::new(...)?;
decoder.iptc_metadata()?;
This is a slightly mean footgun because the code returns metadata for some images. That's easy to miss in testing. Second -- probably less severe -- I think it also means that there is no way to access metadata without reading the entire image into a buffer first. Maybe it makes more sense to have some .peek_at_metadata helper in the image-png crate reader that seeks through the file if called before read_image? That will be a bit ugly to implement, but could make for a more intuitive image API.
That being said - you are the API expert here, feel free to disregard that suggestion. :) It's not a dealbreaker.
I think some sort of peek_trailing_metadata on the png crate is probably the least-bad approach. We could call it lazily only if the user asked for metadata we haven't seen yet before the IDATs.
This is actually not far from how the image-webp crate works. But in that case we unconditionally unconditionally scan the full file during initialization to record the positions of metadata chunks, since the WebP format requires certain metadata chunks to be after the image data.