Implement `exif_metadata` for TIFF

Open 1c3t3a opened this issue 2 months ago • 5 comments

Exif metadata for TIFF is funny, as Exif itself uses the TIFF directory (IFD) structure, so the TIFF file is already laid out like an Exif block. The trivial answer to what the Exif metadata is, then, is just the bytes of the file itself.

Note: This is not the most efficient way of obtaining the metadata, as it also copies the image bytes, which we technically don't need. The alternative is to add this to the tiff crate and copy the relevant metadata sections together. We didn't pursue this route because there were ideas to rework the entire `tiff::Value` type, and this would likely add a heavy dependency on the current behavior. We're open to feedback on whether we should go down that route, though.
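For illustration, the trivial approach can be sketched with the standard library only. The helper name `naive_tiff_exif` is hypothetical (not the actual `image` or `tiff` API), and the sketch assumes a classic TIFF: it checks the byte-order mark and the magic number 42, which an Exif TIFF block shares, and then returns the whole file, pixel data included.

```rust
/// Hypothetical helper: the "trivial" Exif extraction for a TIFF file.
/// A classic TIFF header ("II"/"MM", magic 42, offset of IFD0) is exactly
/// what an Exif TIFF block starts with, so the file is returned as-is.
/// Note that this also copies every strip/tile of pixel data.
pub fn naive_tiff_exif(file: &[u8]) -> Option<Vec<u8>> {
    // Header: 2 bytes byte order, 2 bytes magic, 4 bytes IFD0 offset.
    if file.len() < 8 {
        return None;
    }
    let little_endian = match &file[0..2] {
        b"II" => true,
        b"MM" => false,
        _ => return None,
    };
    let magic = if little_endian {
        u16::from_le_bytes([file[2], file[3]])
    } else {
        u16::from_be_bytes([file[2], file[3]])
    };
    // Classic TIFF uses 42; BigTIFF (43) is not a valid Exif block.
    if magic != 42 {
        return None;
    }
    Some(file.to_vec())
}
```

The returned length equals the file length, which is precisely the bloat concern raised in the next comment.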

1c3t3a, Sep 29 '25 12:09

I'm not a fan of this approach, because if an application handles Exif as a black box and forwards it to the output (like most image conversion programs would, including my own wondermagick), then this approach would embed the entire TIFF image into the converted image as metadata, resulting in extreme bloat.

Shnatsel, Oct 30 '25 17:10

> then this approach would embed the entire TIFF image into the converted image as metadata, resulting in extreme bloat.

Thanks for the feedback, that makes sense! The alternative certainly avoids carrying the pixel data along, but it is also awkward for the image crate:

  1. We can read all IFD tags that are not StripOffsets or TileOffsets.
  2. We re-encode them as Exif using e.g. kamadak-exif.
  3. We return those bytes.
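Step 1 of this list can be sketched without any external crate. This is a minimal, hypothetical sketch (the names `RawEntry` and `metadata_entries` are illustrative, not the `tiff` crate's API) that assumes a classic TIFF and only looks at IFD0. It drops StripOffsets (273) and TileOffsets (324) as described above, resolves each remaining value to its bytes (inline if at most 4 bytes, otherwise via the stored offset), and skips entries with an unknown field type. Whether StripByteCounts/TileByteCounts should also be dropped is left open here.

```rust
/// One IFD entry with its value bytes resolved (still in the file's byte order).
#[derive(Debug, Clone, PartialEq)]
pub struct RawEntry {
    pub tag: u16,
    pub field_type: u16,
    pub count: u32,
    pub data: Vec<u8>,
}

const STRIP_OFFSETS: u16 = 273;
const TILE_OFFSETS: u16 = 324;

/// Size in bytes of one value of a TIFF field type (TIFF 6.0 types 1..=12, plus IFD = 13).
fn type_size(field_type: u16) -> Option<usize> {
    match field_type {
        1 | 2 | 6 | 7 => Some(1),  // BYTE, ASCII, SBYTE, UNDEFINED
        3 | 8 => Some(2),          // SHORT, SSHORT
        4 | 9 | 11 | 13 => Some(4), // LONG, SLONG, FLOAT, IFD
        5 | 10 | 12 => Some(8),    // RATIONAL, SRATIONAL, DOUBLE
        _ => None,
    }
}

/// Hypothetical sketch: read IFD0 and keep every entry except those that
/// locate pixel data. Returns (is_little_endian, entries).
pub fn metadata_entries(file: &[u8]) -> Option<(bool, Vec<RawEntry>)> {
    let le = match file.get(0..2)? {
        b"II" => true,
        b"MM" => false,
        _ => return None,
    };
    let u16_at = |pos: usize| -> Option<u16> {
        let b = file.get(pos..pos.checked_add(2)?)?;
        Some(if le { u16::from_le_bytes([b[0], b[1]]) } else { u16::from_be_bytes([b[0], b[1]]) })
    };
    let u32_at = |pos: usize| -> Option<u32> {
        let b = file.get(pos..pos.checked_add(4)?)?;
        let b = [b[0], b[1], b[2], b[3]];
        Some(if le { u32::from_le_bytes(b) } else { u32::from_be_bytes(b) })
    };
    if u16_at(2)? != 42 {
        return None;
    }
    let ifd0 = u32_at(4)? as usize;
    let n = u16_at(ifd0)? as usize;
    let mut entries = Vec::with_capacity(n);
    for i in 0..n {
        // Each entry is 12 bytes: tag, type, count, value-or-offset.
        let pos = ifd0 + 2 + 12 * i;
        let tag = u16_at(pos)?;
        let field_type = u16_at(pos + 2)?;
        let count = u32_at(pos + 4)?;
        if tag == STRIP_OFFSETS || tag == TILE_OFFSETS {
            continue; // these only point at image data
        }
        let size = match type_size(field_type) {
            Some(s) => s,
            None => continue, // unknown type: skip rather than guess its size
        };
        let len = size.checked_mul(count as usize)?;
        // Values of up to 4 bytes are stored inline in the entry itself.
        let start = if len <= 4 { pos + 8 } else { u32_at(pos + 8)? as usize };
        let data = file.get(start..start.checked_add(len)?)?.to_vec();
        entries.push(RawEntry { tag, field_type, count, data });
    }
    Some((le, entries))
}
```

Resolving out-of-line values to bytes up front means the encoder in step 2 is free to place them at new offsets.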

This will certainly result in smaller Exif blocks, but there is currently the problem that we cannot easily write Exif blocks in Rust. The only option I found is the experimental `Writer` in kamadak-exif: https://docs.rs/kamadak-exif/0.6.1/exif/experimental/struct.Writer.html.

1c3t3a, Oct 31 '25 06:10

Writing Exif is the same as writing TIFF IFDs, with a different but overlapping set of keys (`u16`) used in the key-value map. Several documents give us an almost exhaustive list of tags and their values to be used for Exif, GPS, camera-specific, and private tags. Apart from lacking well-defined expected types, all of these behave the same as the TIFF structure itself.
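To make "writing Exif is writing TIFF IFDs" concrete, here is a minimal, std-only sketch that serializes a single IFD as a little-endian TIFF/Exif block. The names `Entry` and `write_ifd_le` are hypothetical; this is neither kamadak-exif's `Writer` nor `tiff::encoder`. It assumes the values are already encoded little-endian and handles neither sub-IFDs (Exif, GPS) nor chained IFDs. Entries are sorted by tag, values of up to 4 bytes are stored inline, and longer values go into a data area after the IFD, padded to even offsets.

```rust
/// A tag value ready to be written: `data` holds `count` values of
/// `field_type`, already encoded little-endian.
pub struct Entry {
    pub tag: u16,
    pub field_type: u16,
    pub count: u32,
    pub data: Vec<u8>,
}

/// Hypothetical sketch: serialize one IFD as a little-endian TIFF/Exif block.
pub fn write_ifd_le(mut entries: Vec<Entry>) -> Vec<u8> {
    // TIFF requires IFD entries in ascending tag order.
    entries.sort_by_key(|e| e.tag);
    let mut out = Vec::new();
    out.extend_from_slice(b"II");
    out.extend_from_slice(&42u16.to_le_bytes());
    out.extend_from_slice(&8u32.to_le_bytes()); // IFD0 directly follows the header
    out.extend_from_slice(&(entries.len() as u16).to_le_bytes());

    // Out-of-line values start right after the entries and the next-IFD offset.
    // header (8) + count (2) + 12 per entry + next offset (4): always even.
    let data_start = 8 + 2 + 12 * entries.len() + 4;
    let mut data_area: Vec<u8> = Vec::new();
    for e in &entries {
        out.extend_from_slice(&e.tag.to_le_bytes());
        out.extend_from_slice(&e.field_type.to_le_bytes());
        out.extend_from_slice(&e.count.to_le_bytes());
        if e.data.len() <= 4 {
            // Small values are stored inline, left-justified and zero-padded.
            let mut inline = [0u8; 4];
            inline[..e.data.len()].copy_from_slice(&e.data);
            out.extend_from_slice(&inline);
        } else {
            let offset = (data_start + data_area.len()) as u32;
            out.extend_from_slice(&offset.to_le_bytes());
            data_area.extend_from_slice(&e.data);
            // Keep the next value on a word (2-byte) boundary.
            if data_area.len() % 2 == 1 {
                data_area.push(0);
            }
        }
    }
    out.extend_from_slice(&0u32.to_le_bytes()); // no next IFD
    out.extend_from_slice(&data_area);
    out
}
```

The only difference between a TIFF IFD and an Exif IFD here is which tag numbers end up in `entries`.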

We have a choice here:

  1. For comprehensive support, we need to offer the complete file. Unfortunately, some rather ill-conceived vendor extensions use an obscured IFD pointer, that is, byte data that denotes an offset but that we would not rewrite to its new offset when copying it over.
  2. For correct lower-bound support, we can iterate over the IFD structure with the methods `tiff` offers and copy the entries into a new file. However, `tiff::encoder::TiffEncoder` is still a bit rough and does not expose all the methods you'd want; notably, it lacks a method for adding a raw `Entry` structure backed by another `Read` / `IfdDecoder<'_>` for the data. Adding a subdirectory is also complex. So kamadak-exif might be the most convenient option for now.
  3. Maybe I'm overlooking a middle ground, or my understanding of vendor extensions isn't quite up to date? https://exiftool.org/TagNames/EXIF.html is fairly detailed, but it is not machine-readable and includes entries explicitly marked as unknown.
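A lower-bound copier as in option (2) has to decide per tag what to do with it. The following is a hypothetical classification (`CopyPolicy` and `copy_policy` are illustrative names, not part of `image` or `tiff`) using standard TIFF/Exif tag numbers. The list is illustrative rather than exhaustive: other offset-valued tags exist. MakerNote (0x927C) is assumed here to be the typical carrier of the vendor data with embedded offsets mentioned in option (1).

```rust
/// How a hypothetical lower-bound copier might treat a tag when rebuilding
/// the metadata IFDs in a fresh file.
#[derive(Debug, PartialEq, Eq)]
pub enum CopyPolicy {
    /// Only locates pixel data; meaningless once the pixels are dropped.
    Drop,
    /// Points at another IFD; must be copied recursively and re-pointed.
    Recurse,
    /// Opaque vendor blob that may embed absolute offsets we cannot fix up.
    OpaqueMayBreak,
    /// Ordinary value; any out-of-line data gets a new offset from the writer.
    Copy,
}

/// Illustrative, non-exhaustive mapping from tag number to policy.
pub fn copy_policy(tag: u16) -> CopyPolicy {
    match tag {
        // StripOffsets, StripByteCounts, TileOffsets, TileByteCounts
        273 | 279 | 324 | 325 => CopyPolicy::Drop,
        // Exif IFD, GPS IFD, Interoperability IFD pointers
        0x8769 | 0x8825 | 0xA005 => CopyPolicy::Recurse,
        // MakerNote: bytes of type UNDEFINED whose layout is vendor-specific
        0x927C => CopyPolicy::OpaqueMayBreak,
        _ => CopyPolicy::Copy,
    }
}
```

Copying `OpaqueMayBreak` entries byte-for-byte is exactly where the lower bound loses correctness; dropping them instead trades completeness for safety.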

I tend towards (2). Currently, `image` only reads the first image in a TIFF's chain of images, so returning all the metadata would even be misleading. I also agree that the overhead is just too high for most expected uses. If we returned a richer Exif type rather than `Vec<u8>`, it might be more justified, but as it stands this is a pitfall if you're handling images without looking at their format.
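The "chain of images" is a linked list: each IFD ends with the offset of the next IFD, and 0 terminates the chain. A minimal, std-only sketch that lists every IFD offset in a classic TIFF might look like the following; `ifd_chain` is a hypothetical name, and the loop guard is an added assumption for robustness against malformed files.

```rust
use std::collections::HashSet;

/// Hypothetical sketch: offsets of every IFD in a classic TIFF's chain.
/// Returns None for malformed input, including chains that loop.
pub fn ifd_chain(file: &[u8]) -> Option<Vec<u32>> {
    let le = match file.get(0..2)? {
        b"II" => true,
        b"MM" => false,
        _ => return None,
    };
    let u16_at = |pos: usize| -> Option<u16> {
        let b = file.get(pos..pos.checked_add(2)?)?;
        Some(if le { u16::from_le_bytes([b[0], b[1]]) } else { u16::from_be_bytes([b[0], b[1]]) })
    };
    let u32_at = |pos: usize| -> Option<u32> {
        let b = file.get(pos..pos.checked_add(4)?)?;
        let b = [b[0], b[1], b[2], b[3]];
        Some(if le { u32::from_le_bytes(b) } else { u32::from_be_bytes(b) })
    };
    if u16_at(2)? != 42 {
        return None;
    }
    let mut offsets = Vec::new();
    let mut seen = HashSet::new();
    let mut next = u32_at(4)?; // offset of IFD0
    while next != 0 {
        // Reject chains that point back to an IFD we already visited.
        if !seen.insert(next) {
            return None;
        }
        offsets.push(next);
        // Skip the entry count (2 bytes) and the 12-byte entries to reach the next-IFD offset.
        let n = u16_at(next as usize)? as usize;
        next = u32_at(next as usize + 2 + 12 * n)?;
    }
    Some(offsets)
}
```

If only the first element of this list is decoded, metadata from the remaining IFDs describes images the caller never sees.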

197g, Oct 31 '25 15:10

Yeah, I agree, and thanks for the summary @197g! Then let me take a stab at (2). Would it be fine to add a dependency on kamadak-exif and its experimental `Writer` for now, until the `TiffEncoder` is ready?

1c3t3a, Nov 05 '25 04:11

Definitely, for sketch purposes, and it is lightweight enough. The `unsafe` dependency needs some scrutiny; I wonder what it is for. But in principle it should not hold up the implementation, as it does not leak into the API as far as I can tell.

197g, Nov 05 '25 06:11