
Expose the `png` crate's APIs for reading and writing text chunks

Open ssokolow opened this issue 1 month ago • 8 comments

I would like to be able to write PNG tEXt, iTXt, and zTXt chunks without having to copy-paste PngEncoder::encode_inner into my own codebase and edit it, and to read them back without bypassing image and using the png crate directly.

My specific use case for this functionality is to write an implementation of the XDG Thumbnail Managing Standard so my creations can use my desktop's shared thumbnail cache.

This is more generally applicable to any use of PNG which stores metadata in PNG's text chunks instead of the much younger (July 2017) EXIF chunk. (eg. PNG files generated by Easy Diffusion may contain the prompting parameters in text chunks if the user chose to enable that.)

Draft

Given the format-specific nature of specifying the storage method for metadata precisely enough to comply with the XDG spec, I imagine the most idiomatic way to do this would be to add new methods on PngEncoder and PngDecoder, similar to how PngDecoder::gamma_value already exists.

At minimum (and to ensure interoperability with any applications that hard-code whether to look for a field in a tEXt vs. an iTXt vs. a zTXt chunk), any writing API should allow specifying which kind of chunk to add.
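
As a very rough sketch of the shape I have in mind (TextChunkKind and the helper are names I just made up; only the three add_*_chunk calls are what png's own Encoder already exposes):

use std::io::Write;

/// Which PNG text chunk type to emit. (Hypothetical name; not part of image's API.)
pub enum TextChunkKind {
    Text,  // tEXt: uncompressed Latin-1
    ZText, // zTXt: compressed Latin-1
    IText, // iTXt: UTF-8, optionally compressed
}

/// Sketch of the writing side: the caller states the chunk kind explicitly and the
/// helper dispatches onto the png crate's existing encoder methods.
pub fn add_text_chunk_of_kind<W: Write>(
    encoder: &mut png::Encoder<'_, W>,
    kind: TextChunkKind,
    keyword: &str,
    value: &str,
) -> Result<(), png::EncodingError> {
    match kind {
        TextChunkKind::Text => encoder.add_text_chunk(keyword.into(), value.into()),
        TextChunkKind::ZText => encoder.add_ztxt_chunk(keyword.into(), value.into()),
        TextChunkKind::IText => encoder.add_itxt_chunk(keyword.into(), value.into()),
    }
}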

For reading, it'd be much more convenient if, at least as an option, an abstraction could be provided that avoids the need for me to reinvent merging png's uncompressed_latin1_text, compressed_latin1_text, and utf8_text Vecs into a single HashMap so I can retrieve chunks by name. (eg. Retrieving Thumb::URI to check for the original image's nonexistence as part of determining whether a thumbnail is stale.)

The XDG spec makes no mention of that distinction, so applications are free to encode the field as whichever type of text chunk they decide is best. (eg. encoding Thumb::MTime as a tEXt but encoding Thumb::URI as a zTXt.)

Given that text chunks can have duplicate names even without the tEXt vs. iTXt vs. zTXt distinction, and some consumers of the API will want to just retrieve "the value" while others will want to see all the values, maybe something which draws inspiration from HTTP query string parsers would be the right fit.

Beyond that, I don't feel experienced enough with image's API on a level beyond superficial to meet my own standards for suggesting an API design.

ssokolow avatar Oct 28 '25 05:10 ssokolow

This sort of format-specific functionality tends to be a better fit for the underlying format crates. The encode_inner looks like a lot of code, but that's mostly because it is forwarding all the optional params from this crate and doing type conversions both ways. If you use the png crate directly you won't have to deal with that. And if there are API improvements we could make, I'd rather make them directly to the png crate rather than papering over them from this crate.
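
For concreteness, a rough sketch of the direct route (the Thumb:: values and the RGBA buffer here are illustrative placeholders, not tested code):

use std::fs::File;
use std::io::BufWriter;
use std::path::Path;

// Sketch: write a tightly packed 8-bit RGBA buffer as a PNG with two tEXt chunks,
// going through the png crate directly.
fn write_png_with_text(
    path: &Path,
    rgba: &[u8], // width * height * 4 bytes, row-major
    width: u32,
    height: u32,
) -> Result<(), Box<dyn std::error::Error>> {
    let mut encoder = png::Encoder::new(BufWriter::new(File::create(path)?), width, height);
    // The header fields image's PngEncoder normally derives from image::ColorType.
    encoder.set_color(png::ColorType::Rgba);
    encoder.set_depth(png::BitDepth::Eight);
    // The part this issue is about: png already exposes the text chunk API.
    encoder.add_text_chunk("Thumb::URI".into(), "file:///home/user/example.jpg".into())?;
    encoder.add_text_chunk("Thumb::MTime".into(), "1700000000".into())?;
    let mut writer = encoder.write_header()?;
    writer.write_image_data(rgba)?;
    Ok(())
}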

fintelia avatar Oct 31 '25 23:10 fintelia

This sort of format-specific functionality tends to be a better fit for the underlying format crates.

It's already in there. I'm asking for a way to reuse the "forwarding all the optional params from this crate and doing type conversions" bits without copy-pasting them into my codebase.

I proposed what I did because it seemed consistent with ImageEncoder::set_exif_metadata.

If you use the png crate directly you won't have to deal with that.

I get the impression my use-case was unclear.

  1. Use image or ffmpeg or a PDF renderer or whatever other API is appropriate to retrieve full-size frames from dozens of different formats.
  2. Use image (or perhaps fast_image_resize) to downscale them to thumbnail size
  3. Create a PNG file and write the metadata chunks (this is the only part accessing png directly could solve)
  4. Without translating everything to a single pixel format and then having to run optipng to undo it, feed the image data into the PNG file. (A sketch of the pixel-layout mapping this step involves follows this list.)
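
For illustration, the step-4 bookkeeping I'd rather not duplicate is roughly this mapping (a partial sketch; image's encode_inner covers the full set, including the big-endian conversion 16-bit samples need before png will accept them):

use image::ColorType;

// Choose png's sample layout from image's, rather than flattening everything to RGBA8
// and re-optimizing afterwards. Partial: float formats have no PNG equivalent, and
// 16-bit buffers still need byte-swapping to big-endian before being written.
fn png_layout(color: ColorType) -> Option<(png::ColorType, png::BitDepth)> {
    Some(match color {
        ColorType::L8 => (png::ColorType::Grayscale, png::BitDepth::Eight),
        ColorType::La8 => (png::ColorType::GrayscaleAlpha, png::BitDepth::Eight),
        ColorType::Rgb8 => (png::ColorType::Rgb, png::BitDepth::Eight),
        ColorType::Rgba8 => (png::ColorType::Rgba, png::BitDepth::Eight),
        ColorType::L16 => (png::ColorType::Grayscale, png::BitDepth::Sixteen),
        ColorType::La16 => (png::ColorType::GrayscaleAlpha, png::BitDepth::Sixteen),
        ColorType::Rgb16 => (png::ColorType::Rgb, png::BitDepth::Sixteen),
        ColorType::Rgba16 => (png::ColorType::Rgba, png::BitDepth::Sixteen),
        _ => return None, // e.g. Rgb32F/Rgba32F: no lossless PNG representation
    })
}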

I literally want "What PngEncoder::encode_inner does, but with the ability to add some text chunks to the Info struct", either directly or using something higher-level.

I suppose, assuming png doesn't consider it invalid to set the fields image sets after text chunks have been added, an alternative would be for image to have something like PngEncoder::new_with_info which lets me supply a png::Info pre-populated with the text chunks, instead of it creating a new one.

And if there are API improvements we could make, I'd rather make them directly to the png crate rather than papering over them from this crate.

I don't see how what I want could be put into png without establishing a circular dependency so png could have a function that ingests image's DynamicImage type.

ssokolow avatar Nov 01 '25 01:11 ssokolow

I get where your need is coming from but there's a significant difference from EXIF. For EXIF there's a standard/specification that describes how the same semantic data is to be embedded into different image file formats (https://www.cipa.jp/std/documents/download_e.html?CIPA_DC-008-2024-E). There is no such thing for generic text chunks. So in my eyes the discussion is, if anything, about whether the XDG thumbnail managing standard can be supported.

It's quite niche to support a standard that only works on one image type. On the other hand, since it comes up quite often, I also really wish we had a better way to provide capabilities to implement standards downstream of image, hooks or otherwise.

197g avatar Nov 01 '25 14:11 197g

So in my eyes the discussion is, if anything, about whether the XDG thumbnail managing standard can be supported.

That seems rather drastic. I understand not supporting text chunks specifically, but what would be wrong with providing access to the png::Info (eg. PngEncoder::new_with_info, if the tests I plan to do after I sleep show that adding text chunks doesn't freeze the fields PngEncoder::encode_inner needs to set)?

To me, that seems at least as justifiable as things like PngDecoder::apng... especially given that text chunks have been around since the beginning, while APNG was an explicit violation of the spec that Mozilla essentially bullied the standards people into blessing, Microsoft-EEE style, in the newest version of the spec.

PNG also does not support multiple images in one file. This restriction is a reflection of the reality that many applications do not need and will not support multiple images per file. In any case, single images are a fundamentally different sort of object from sequences of images. Rather than make false promises of interchangeability, we have drawn a clear distinction between single-image and multi-image formats. PNG is a single-image format.

-- https://www.w3.org/TR/PNG-Rationale.html (01-October-1996)

image has had APNG support since mid-2020, based on a quick examination of docs.rs, but, until the third edition of the PNG spec (June 2025), APNG was an explicit violation of the spec, which said a PNG header indicates that what follows contains "a single PNG image" and proceeds to define "single PNG image" as one or more derivatives of a single reference image (defined as a rectangular array of rectangular pixels) produced by operations such as alpha separation and sample depth scaling.

That's why libpng, being the reference implementation, didn't support APNG all that time, and Mozilla had to keep their own fork of it. APNG was essentially a hostile takeover of libpng by Mozilla and the PNG spec blessing it was essentially acknowledging that, if they didn't, Mozilla's fork would become the de facto reference implementation everyone was using.

It's quite niche to support a standard that only works on one image type.

XDG Thumbnail Spec is hardly the only thing which uses PNG's text chunks, because PNG has been around since 1996 but only had an EXIF chunk standardized in 2017.

For example:

According to XMP Specification, an XMP packet is embedded in a PNG graphic file by adding a chunk of type iTXt with the keyword 'XML:com.adobe.xmp'.

-- https://dev.exiv2.org/projects/exiv2/wiki/The_Metadata_in_PNG_files

Embedding generation parameters into PNGs via text chunks is also apparently growing in popularity among AI image generators. (eg. all major Stable Diffusion frontends now support it)

EDIT: And, at least for Easy Diffusion, it's using the text chunks as a key-value store, similar to what the XDG Thumbnail spec calls for, not serializing into a blob of JSON or XML or something and dropping it all into one chunk.

I don't use them, so I can't log in to check if it's a hallucination, but Perplexity is claiming that DALL-E and Midjourney also now support it.

Not supporting it in some manner is like an audio library not allowing someone to set the VORBISGAIN tag in an .ogg stream because it's not the ID3 tag that "reasonable" formats like Monkey's Audio allow you to just plunk in.

Also, bear in mind that I've already had to write my own cleaner for my XDG thumbnail cache because, at one point in the past, I just used image to write thumbnails without the metadata keys and it turns out that the program I was letting manage the cache treats "no metadata/spec-noncompliant" as "never stale" rather than "always stale".

The current status quo is a not-insignificant footgun.

ssokolow avatar Nov 01 '25 18:11 ssokolow

Oh, and to be clear, in case you don't look into it further: XMP (which, as I mentioned, also requires a means to write PNG iTXt chunks), though originated by Adobe, is standardized as ISO 16684-1:2019 (part 1) and ISO 16684-2:2014 (part 2)... so a strong argument can be made that, at the very least, there should be some non-pub iTXt-writing code in the PNG backend for XMP that I could yoink into my PngEncoder fork if this gets WONTFIX'd.
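
On the writing side, that boils down to something like the following over png's existing encoder API (a sketch; producing the serialized XMP packet is somebody else's problem):

use std::io::Write;

// Embed an XMP packet the way the XMP spec prescribes for PNG: an iTXt chunk whose
// keyword is "XML:com.adobe.xmp".
fn embed_xmp<W: Write>(
    encoder: &mut png::Encoder<'_, W>,
    xmp_packet: String, // serialized XMP (RDF/XML) packet, produced elsewhere
) -> Result<(), png::EncodingError> {
    encoder.add_itxt_chunk("XML:com.adobe.xmp".into(), xmp_packet)
}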

ssokolow avatar Nov 03 '25 05:11 ssokolow

That seems rather drastic. I understand not supporting text chunks specifically, but what would be wrong with providing access to the png::Info (eg. PngEncoder::new_with_info, if the tests I plan to do after I sleep show that adding text chunks doesn't freeze the fields PngEncoder::encode_inner needs to set)?

I forgot that png::Info has a lifetime parameter. That's what's wrong with just providing access to it in the obvious ways and overwriting anything that currently has its value set by image.

I guess I'll just stick to this while I wait for a resolution. It seems to be the least wasted effort on my part.

Pseudo-patch
//! This file contains the
//! [`PngEncoder`](https://docs.rs/image/0.25.8/image/codecs/png/struct.PngEncoder.html) struct
//! from the [`image`](https://lib.rs/crates/image) crate, as copied from
//! <https://docs.rs/image/0.25.8/src/image/utils/mod.rs.html>, with relevant `pub(crate)`
//! dependencies inlined.
//!
//! It has been modified to work around `image` not providing a way to set PNG `tEXt`, `iTXt`, or
//! `zTXt` chunks as required by standards such as the XDG
//! [Thumbnail Managing Standard](https://specifications.freedesktop.org/thumbnail/latest-single/)
//! and the [XMP Standard](https://en.wikipedia.org/wiki/Extensible_Metadata_Platform).
//!
//! See [`image-rs/image#2633`](https://github.com/image-rs/image/issues/2633) for efforts to
//! remove the need to maintain this fork.

// [...]

use png::text_metadata::{ITXtChunk, TEXtChunk, ZTXtChunk};

// [...]

pub struct PngEncoder<W: Write> {
    // [...]
    text_metadata: Vec<TEXtChunk>,
    ztxt_metadata: Vec<ZTXtChunk>,
    itxt_metadata: Vec<ITXtChunk>,
}

// [...]

impl<W: Write> PngEncoder<W> {
    /// Create a new encoder that writes its output to `w`
    pub fn new(w: W) -> PngEncoder<W> {
        PngEncoder {
            // [...]
            text_metadata: Vec::new(),
            itxt_metadata: Vec::new(),
            ztxt_metadata: Vec::new(),
        }
    }

// [...]

    pub fn new_with_quality(
        w: W,
        compression: CompressionType,
        filter: FilterType,
    ) -> PngEncoder<W> {
        PngEncoder {
            // [...]
            text_metadata: Vec::new(),
            itxt_metadata: Vec::new(),
            ztxt_metadata: Vec::new(),
        }
    }

// [...]

    fn encode_inner(
        // [...]
        
        if !self.text_metadata.is_empty() {
            info.uncompressed_latin1_text = self.text_metadata.clone();
        }
        if !self.itxt_metadata.is_empty() {
            info.utf8_text = self.itxt_metadata.clone();
        }
        if !self.ztxt_metadata.is_empty() {
            info.compressed_latin1_text = self.ztxt_metadata.clone();
        }

        // [...]
    }

    fn set_text_metadata(&mut self, text: Vec<TEXtChunk>) {
        self.text_metadata = text;
    }
    fn set_ztxt_metadata(&mut self, ztxt: Vec<ZTXtChunk>) {
        self.ztxt_metadata = ztxt;
    }
    fn set_itxt_metadata(&mut self, itxt: Vec<ITXtChunk>) {
        self.itxt_metadata = itxt;
    }
}

Since the file is functionally a vendored fork of two structs (PngEncoder and, because of #[non_exhaustive], FilterType) functioning as an internal API, I remain free to change the API's shape to try to eliminate those .clone() calls later if they start to show up in the flamegraph.
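
For reference, the call site I have in mind looks roughly like this (illustrative only; it assumes the setters above are reachable from the calling module):

use std::io::Write;
use png::text_metadata::{TEXtChunk, ZTXtChunk};

// Illustrative call site for the vendored encoder above: attach the chunks the XDG
// thumbnail spec requires before handing the pixel data over.
fn thumbnail_encoder<W: Write>(out: W, source_uri: &str, mtime: i64) -> PngEncoder<W> {
    let mut encoder = PngEncoder::new(out);
    encoder.set_text_metadata(vec![TEXtChunk::new("Thumb::MTime", mtime.to_string())]);
    encoder.set_ztxt_metadata(vec![ZTXtChunk::new("Thumb::URI", source_uri)]);
    encoder // then encode via the ImageEncoder impl as usual
}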

ssokolow avatar Nov 03 '25 07:11 ssokolow

What would be wrong with providing access to the png::Info (eg. PngEncoder::new_with_info, if the tests I plan to do after I sleep show that adding text chunks doesn't freeze the fields PngEncoder::encode_inner needs to set).

We had been there before and it caused issues. We then need to lockstep any version change, i.e. the png::Info exposed by one png release is incompatible with the png::Info from the next, or from any alternative underlying library, and the same holds for all the codec libraries, which causes just constant breakage. Not workable for delivering a stable library. It isn't so much that we don't want it in principle but how to do it in a workable way without painting ourselves into a corner by exposing too many changing corners of the implementation—we do have breaking changes in the dependencies for good reason. We get a similar multiplicative explosion in complexity for all single-format features in the interface; it is maintenance overhead that must be paid going forward indefinitely, so we're rather careful about the precedent we set. XMP etc. is one feature, not one per format, hence no complexity explosion. If there ends up being one for encoding some LLM stuff then we can surely discuss an interface for embedding and/or extracting that, too (alas with the caveat of wondering why they don't use XMP).

I'm not sure where you're going with APNG at all; you're missing the point. I had absolutely no problem implementing it in the png crate based purely on the documentation on wiki.mozilla.org; animated images parse as normal PNG just fine—yielding the thumbnail. Putting the 1.0 spec, finished in 2008, in w3c is a formality for browser vendors (i.e. Chromium and Safari)—it's in any case heartening to see standardization happening openly in a git-repo. We may have gain maps before their standardization if the underlying spec gets sufficiently clear (including being non-patent-encumbered and freely available). As you said, Mozilla had maintained a fork for APNG but there's nothing hostile about that and the matured code was merged into libpng in June of this year. That's how FOSS works: someone shouldering the work for what they want and, in sharing it, ensuring its review by peers; I fail to see where undue pressure was applied. Chromium has a fork of image-rs/png where they ensure additional requirements on incremental encoding and Android targets; Google upstreams code they care about and took over parts of the engineering analysis to motivate changes they prioritized when there was contention. That's work, as intended.

197g avatar Nov 03 '25 14:11 197g

We had been there before and it caused issues. We then need to lockstep any version change, i.e. the png::Info exposed by one png release is incompatible with the png::Info from the next, or from any alternative underlying library, and the same holds for all the codec libraries, which causes just constant breakage. Not workable for delivering a stable library. It isn't so much that we don't want it in principle but how to do it in a workable way without painting ourselves into a corner by exposing too many changing corners of the implementation.

That's fair.

My argument for going in that direction was in response to what I perceived as a desire to minimize how much PNG-specific API surface image exposes. My original vision was to fully abstract png for similar reasons and, if my little fork of PngEncoder were to be upstreamed, I'd expect it to have function signatures closer to set_ztxt_metadata(&mut self, ztxt: Vec<(String, String)>) than the set_ztxt_metadata(&mut self, ztxt: Vec<ZTXtChunk>) it currently has.

PNG text chunks are, abstractly, (key, value, storage type) triples, where the storage type is a plain/compressed/UTF-8 enum, while png's abstract model for them acknowledges that they're technically three independent chunk types and exposes them as three completely separate sets of (key, value) tuples, which my XDG Thumbnail cleaner re-combines into a single HashMap (using a macro to keep it DRY).
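
De-macro'd, that re-combination is roughly the following (assuming the public keyword fields and get_text() accessors current png releases provide; failed decompression is silently skipped for brevity):

use std::collections::HashMap;

// Flatten png's three per-chunk-type Vecs into one keyword -> values multimap.
// Chunks whose text fails to decompress or decode are skipped here for brevity.
fn text_chunks_by_keyword(info: &png::Info<'_>) -> HashMap<String, Vec<String>> {
    let mut map: HashMap<String, Vec<String>> = HashMap::new();
    for chunk in &info.uncompressed_latin1_text {
        map.entry(chunk.keyword.clone()).or_default().push(chunk.text.clone());
    }
    for chunk in &info.compressed_latin1_text {
        if let Ok(text) = chunk.get_text() {
            map.entry(chunk.keyword.clone()).or_default().push(text);
        }
    }
    for chunk in &info.utf8_text {
        if let Ok(text) = chunk.get_text() {
            map.entry(chunk.keyword.clone()).or_default().push(text);
        }
    }
    map
}

(Caveat, as far as I can tell: text chunks stored after the image data only show up in Info once the image data has been read, so where an encoder chose to place them matters.)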

XMP etc. is one feature, not one per format

XMP is stored differently for each format. I don't see how you consider it "one feature" for maintainability purposes.

  • TIFF – Tag 700
  • JPEG – Application segment 1 (0xFFE1) with segment header "http://ns.adobe.com/xap/1.0/\x00"
  • JPEG 2000 – "uuid" atom with UID of 0xBE7ACFCB97A942E89C71999491E3AFAC
  • JPEG XL – "xml " box type
  • PNG – inside an "iTXt" text block with the keyword "XML:com.adobe.xmp"
  • GIF – as an Application Extension with identifier "XMP Data" and authentication code "XMP" [...]
  • WebP – inside the file's XMP chunk

For file formats that have no support for embedded XMP data, this data can be stored in external .xmp sidecar files.

-- https://en.wikipedia.org/wiki/Extensible_Metadata_Platform

By contrast, giving the user the ability to hang some text chunks off what they're feeding into PngEncoder and leaving it up to them to provide the raw contents of the chunks is one feature for maintainability purposes, which cuts across every format that uses text chunks as its metaformat.

If there will be one for encoding some LLM stuff then we can surely discuss an interface for embedding and/or extracting that, too (alas with the caveat of wondering why they don't use XMP).

I don't understand your stance here. This is akin to saying "We don't want to implement APIs for reading and writing Zip files... but we're open to implementing and maintaining higher-level APIs for OpenDocument and EPUB and APK and XPI and JAR and every other format built on top of them".

As for why they don't use XMP, probably because, from their perspective, doing so would be jumping through hoops and stacking up layers of abstraction irrelevant to the particular data they're storing, like how Netscape created Mork, which technically embodied a lot of things considered virtuous at the time while being the most pessimal file format known to humankind.

From their perspective, PNG text chunks are akin to storing key-value pairs in JSON, while XMP stored in the specified PNG chunk is akin to RDF encoded in JSON-LD... except that XMP's most common serialization is based on RDF/XML, so you're essentially wondering why every project that needs to store a little bit of key-value data doesn't dig up or write an XMP implementation, which uses RDF, which is serialized in XML, which then finally goes into the chunks they could have just used directly... all for storing metadata they don't anticipate there being that much demand to translate from one application to another, even before they stacked up that tower of "must implement to parse".

(I was not at all surprised to learn that XMP was first released to the public in 2001. It's very much a product of the same 1990s mindset that gave us things like the ComponentFactoryBuilderBean trope of "Enterprise Java".)

I'm not sure where you're going with APNG at all; you're missing the point. I had absolutely no problem implementing it in the png crate based purely on the documentation on wiki.mozilla.org;

I did get a little carried away, but my point was to push back against your claimed rationale for supporting EXIF but not PNG text chunks by arguing that, at the time APNG support was added, it was less "legitimate" than PNG text chunks by those same metrics.

As you said, Mozilla had maintained a fork for APNG but there's nothing hostile about that and the matured code was merged into libpng in June of this year. That's how FOSS works: someone shouldering the work for what they want and, in sharing it, ensuring its review by peers; I fail to see where undue pressure was applied.

It was literally "We don't like this core design tenet of the PNG spec, so we're just going to violate it and popularize those spec-violating files until they're forced to either change the spec or become irrelevant". How is that not the Extend phase of Microsoft's Embrace, Extend, Extinguish strategy, except from people with a better reputation?

Chromium has a fork of image-rs/png where they ensure additional requirements on incremental encoding and Android targets; Google upstreams code they care about and took over parts of the engineering analysis to motivate changes they prioritized when there was contention. That's work, as intended.

There's a difference between something like libjpeg-turbo or AdvanceCOMP or whatever Google's Zlib optimizer was called, which continue to produce fully compliant JPEG or DEFLATE streams, even if they impose additional restrictions on what they'll produce, and a libpng fork which explicitly breaks from the PNG spec and intentionally encourages people to produce and consume noncompliant files, as Microsoft's famous policy for proprietarizing open standards like Kerberos did.

ssokolow avatar Nov 03 '25 20:11 ssokolow