Decoder metadata interface
See https://github.com/image-rs/image/issues/2245 for the intended `ImageDecoder` changes.
This changes the `ImageDecoder` trait to fix some underlying issues. The main change is a clarification to the responsibilities; the trait is an interface from an implementor towards the image library. That is, the protocol established by its interface should allow us to drive the decoder into our buffers and our metadata. It is not optimized for use by an external caller, which should prefer `ImageReader` and other inherent methods instead.

This is a work in progress; the sections below motivate the changes and discuss open points.
`ImageDecoder::next_layout` encourages decoders to read headers after the constructor. This fixes the inherent problem we had with communicating limits. The sequence for internal use is roughly:

```rust
let mut decoder = …;
decoder.set_limits(…);
// Other global configuration we have?
{
    // Potentially multiple times:
    let layout_info = decoder.next_layout()?;
    let mut buffer = allocate_for(&layout_info);
    decoder.read_image(&mut buffer)?;
    decoder.xmp_metadata()?; // and other meta
}
// … for sequences, start again from `init()`
```

`ImageDecoder::read_image(&mut self)` no longer consumes `self`. We no longer need the additional `boxed` method and its trait workaround; the trait is now dyn-compatible.
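To make the dyn-compatibility point concrete, here is a minimal, hedged sketch of the intended shape; `ImageLayout`, the error type, and `DummyDecoder` are illustrative stand-ins, not the crate's actual definitions:

```rust
#[derive(Clone, Debug, PartialEq)]
struct ImageLayout {
    width: u32,
    height: u32,
    bytes_per_pixel: usize,
}

trait ImageDecoder {
    // Header parsing is deferred until after construction, so limits
    // configured on `self` can apply to the header read as well.
    fn next_layout(&mut self) -> Result<ImageLayout, String>;
    // Takes `&mut self` instead of `self`, which keeps the trait
    // dyn-compatible (no `boxed` workaround needed).
    fn read_image(&mut self, buf: &mut [u8]) -> Result<(), String>;
}

struct DummyDecoder {
    header_read: bool,
}

impl ImageDecoder for DummyDecoder {
    fn next_layout(&mut self) -> Result<ImageLayout, String> {
        self.header_read = true;
        Ok(ImageLayout { width: 2, height: 2, bytes_per_pixel: 1 })
    }
    fn read_image(&mut self, buf: &mut [u8]) -> Result<(), String> {
        // Reading before the header would be an error, matching the
        // "destructive read" point in the checklist below.
        if !self.header_read {
            return Err("header not read".into());
        }
        buf.fill(0xAB);
        Ok(())
    }
}

fn main() {
    // `Box<dyn ImageDecoder>` works directly now.
    let mut decoder: Box<dyn ImageDecoder> = Box::new(DummyDecoder { header_read: false });
    let layout = decoder.next_layout().unwrap();
    let mut buffer = vec![0u8; (layout.width * layout.height) as usize * layout.bytes_per_pixel];
    decoder.read_image(&mut buffer).unwrap();
    assert_eq!(buffer, vec![0xAB; 4]);
}
```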
Discussion
- [ ] Implement `next_layout` more consistently
  - [ ] avif
  - [ ] tga
  - [ ] pnm
  - [ ] tiff
  - [ ] dxt
  - [ ] qoi
  - [ ] dds
  - [ ] gif
- [x] Maybe ~~`init`~~ `next_layout` should return the full layout information in a single struct. We have a similar open issue for `png` in its own crate, and the related work for `tiff` is in the pipeline, where its `BufferLayoutPreference` already exists to be extended with said information.
- [ ] Review limits and remove its size bounds insofar as they can be checked against the communicated bounds in the metadata step by the `image` side.
  - [ ] Idea: if a decoder supports builtin transforms (e.g. YCbCr -> Rgb conversion, grayscale, thumbnailing) that are more efficient than post-processing, then there could be a negotiation phase here where information is polled twice / multiple times by different methods. The design should leave this negative space to be added in `1.1`, but it's not highly critical.
- [ ] Fix the sequence decoder to use the new API
- [ ] Make sure that `read_image` is 'destructive' in all decoders, i.e. re-reading an image and reading an image before `init` should never access an incorrect part of the underlying stream but instead return an error. This affects pnm and qoi for instance, where the read will interpret bytes based on the dimensions and color, which would be invalid before reading the header and only valid for one read.
- [ ] Tests for reading an image with `read_image` then switching to a sequence reader. But that is supposed to become mainly an adapter that implements the iterator protocol.
- [ ] Remove remnants of the dyn-compatibility issue.
- [x] Adapt to the possibility of fetching metadata after the image. This includes changing `ImageReader` with a new interface to return some of it. That may be better suited for a separate PR though.
  - [x] Extract the CICP part of the metadata as `CicpRgb` and apply it to a decoded `DynamicImage`.
  - [ ] Ensure that this is supported by all the bindings.
- [ ] Deal with limits: https://github.com/image-rs/image/pull/2672#discussion_r2595580766
> The main change is a clarification to the responsibilities; the trait is an interface from an implementor towards the image library. That is, the protocol established from its interface should allow us to drive the decoder into our buffers and our metadata. It is not optimized to be used by an external caller which should prefer the use of `ImageReader` and other inherent methods instead.
With this framing, I think `Limits::max_image_width` and `Limits::max_image_height` no longer need to be communicated to or handled by the `ImageDecoder` trait, because the external code can check `ImageDecoder::dimensions()` before invoking `ImageDecoder::read_image()`; only the memory limit (`Limits::max_alloc`) is essential. That being said, the current way `Limits` are handled by `ImageDecoder` isn't that awkward to implement, so to reduce migration costs keeping the current `ImageDecoder::set_limits()` API may be OK.
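A quick sketch of that division of responsibility; the `Limits` struct and `check_layout` here are illustrative stand-ins, not the crate's actual types:

```rust
// Hypothetical mirror of the relevant limit fields.
struct Limits {
    max_image_width: u32,
    max_image_height: u32,
    max_alloc: u64,
}

// The caller enforces dimension limits itself from the reported layout;
// only `max_alloc` would still need to reach the decoder.
fn check_layout(width: u32, height: u32, limits: &Limits) -> Result<(), String> {
    if width > limits.max_image_width || height > limits.max_image_height {
        return Err(format!("{width}x{height} exceeds dimension limits"));
    }
    Ok(())
}

fn main() {
    let limits = Limits { max_image_width: 4096, max_image_height: 4096, max_alloc: 1 << 30 };
    assert!(check_layout(1024, 768, &limits).is_ok());
    assert!(check_layout(10_000, 10, &limits).is_err());
    let _ = limits.max_alloc; // the one value a decoder would still see
}
```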
A couple thoughts...

I do like the idea of handling animation decoding with this same trait. To understand: are you thinking of "sequences" as being animations, or also stuff like the multiple images stored in a TIFF file? Even just handling animation has some tricky cases though. For instance in PNG, the default image that you get if you treat the image as non-animated may be different from the first frame of the animation. We might need both a `read_image` and a `read_frame` method.

The addition of an `init` method doesn't seem like it gains us much. The tricky part of our current `new`+`set_limits` API is that you get to look at the image dimensions and total output size in bytes when deciding what decoding limits to set. Requiring `init` (and by extension `set_limits`) to be called before reading the dimensions makes it basically the same as just having a `with_limits` constructor.
> Requiring `init` (and by extension `set_limits`) to be called before reading the dimensions makes it basically the same as just having a `with_limits` constructor.
It's a dyn-compatible way that achieves the goal of the constructor, so it is actually an abstraction.
> The tricky part of our current `new`+`set_limits` API is that you get to look at the image dimensions and total output size in bytes when deciding what decoding limits to set.
What do you mean by this? The main problem in png that I'm aware of is the lack of configured limits for reading the header in the `ImageReader` path, which motivated the extra constructor in the first place. With png we cannot modify the limits after the fact, but we also don't really perform any large size-dependent allocation within the decoder.
I'm also not suggesting that calling `set_limits` after the layout inspection would be disallowed, but it is obviously decoder-dependent whether that 'frees' additional capacity. I guess whether that is sufficient remains to be seen? When we allocate a buffer (with applied allocator limits), that allows forwarding the remaining buffer size to the decoder. Or, set aside a different buffer allowance for metadata vs. image data. Whatever change is necessary in png just comes on top anyways; the init flow just allows us to abstract this and thus apply it with an existing `Box<dyn ImageDecoder>`, so we don't have to do it all beforehand. Indeed, as the comment on size alludes to, we may want two different limit structs: one user-facing that we use in `ImageReader` and one binding-facing that is passed to `ImageDecoder::set_limits`. Then settings just need to be
@fintelia This now includes the other changes, including those to `ImageReader`, as a draft of what I meant in https://github.com/image-rs/image/issues/2679#issuecomment-3590938582. In short:
- The file guessing routines and the construction of the boxed decoder are split into a separate type, `ImageFile`, which provides the previous methods of `ImageReader` but also an `into_reader` for the mutable, stateful interface. `ImageReader`:
  - features all accessors for metadata, considering that some formats fill said metadata (or a pointer to it) only after an image is decoded.
  - has `viewbox` as a re-imagining of the previous `ImageDecoderRect` trait but split into two responsibilities: the trait makes the efficiency decision on an image-by-image basis with an interface that allows a partial application of the viewbox (in jpeg and tiff we would decode whole tiles); then the reader takes care of translating that into an exact layout. Note that another type of image buffer with offset+rowpitch information could do that adjustment zero-copy; I still want to get those benefits of the type-erased buffer / `image-canvas` someday, and this fits in.
- The code also retrieves the CICP from the color profile and annotates the `DynamicImage` with it where available. For sanity's sake the `moxcms` integration was rewritten to allow a smaller dependency to be used here; I'll split these off the PR if we decide to go that route.
- Conceivably there's a `gain_map` (or similar) that may be queried similarly to the metadata methods. For that to be more ergonomic I'd like to seriously consider `read_plane` for, in tiff lingo, planar images as well as associated and non-associated mask data; and more speculatively other extra samples (bump maps? uv? true cmyk?). While that does not necessarily all go into `1.*`, for any output that is not quite as neatly statically sorted and sized as an `Rgba` 4-channel-homogeneous-host-order buffer, I imagine it will be much simpler for a decoder to provide its data successively in multiple calls instead of one contiguous large byte slice. Similar to `viewbox`, we'd allow this where `ImageReader` provides the compatibility to re-layout the image for the actual user, except where explicitly instructed. Adjusting `ImageReader::decode` to that effect should be no problem in principle.
I can't speak about image metadata, but I really don't like the new `ImageDecoder` interface as both an implementor of the interface and a potential user of it. Right now, it's just not clear to me at all how decoders should behave. My problems are:
- `init`. This is just two-phase initialization and opens so many questions.
  - Do users have to call it? The docs say "should" not "must" be called before `read_image`.
  - Are users allowed to call it multiple times? If so, the decoder has to keep track of whether the header has already been read.
  - Since `init` returns a layout, what's the point of `dimensions()` and `color_type()`? And what if they disagree?
  - What should `dimensions` and co do before `init` is called?
  - If `init` fails, what should happen to methods like `dimensions` and `read_image`? When called, should they panic, return an error, return default values?
  - After calling `read_image`, do you have to re-`init` before calling `read_image` again?
- `viewbox` makes it more difficult to implement decoders.
  - Now they always have to internally keep track of the viewbox rect if rect decoding is supported.
  - After calling `viewbox`, what should `dimensions` be? If they should be the viewbox size, should they reflect the new viewbox even before calling `init`?
  - It's not clear what should happen if `viewbox` returns ok, but `init` errors.
  - What should happen if users supply a viewbox outside the bounds of the image?
  - When calling `viewbox`, is the offset of the rect relative to the (0,0) of the full image or the last set viewbox?
- What should happen if `read_image` is called twice? Should it read the same image again, error, read the next image in the sequence? The docs don't say.
  - If the intended behavior is "read the same image again", then those semantics would force all decoders to require `Seek` for the reader (or keep an in-memory copy of the image for subsequent reads). Not an unreasonable requirement, but it should be explicitly documented.
Regarding rectangle decoding, I think it would be better if we force decoders to support arbitrary rects. That's because the current interface is actually less efficient by allowing decoders to support only certain rects. To read a specific rect that is not supported as is, `ImageReader` has to read a too-large rect and then crop the read image, allocating the memory for the too-large image only to throw it away. It is forced to do this because of the API.

However, most image formats are based on lines of blocks (macro pixels). So we can do a trick: decode a line according to the too-large rect, and then only copy the pixels in the real rect to the output buffer. This reduces the memory overhead for unsupported rects from `O(width*height)` to `O(width*block_height)`. Supported rects don't need this dance and can decode into the output buffer directly. That's kinda what DDS does, for instance.

And if a format can't do the line-based trick for unsupported rects, then decoders should just allocate a temp buffer for the too-large rect and then crop (= copy what is needed). This is still just as efficient as the best `ImageReader` can do.
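The line-based trick described above can be sketched as follows; the "decoder" here is a deterministic fake, and `read_rect_via_lines` is a hypothetical helper, but the temp-buffer shape (one block row, not the whole too-large rect) is the point:

```rust
fn read_rect_via_lines(
    img_width: usize,
    rect_x: usize,
    rect_y: usize,
    rect_w: usize,
    rect_h: usize,
    block_h: usize,
    out: &mut [u8],
) {
    // Temp storage is O(width * block_height), not O(width * height).
    let mut line_buf = vec![0u8; img_width * block_h];
    let mut y = rect_y - rect_y % block_h; // align down to a block boundary
    while y < rect_y + rect_h {
        // Stand-in for "decode one block row of the too-large rect":
        // pixel value at (x, abs_y) is abs_y * img_width + x, truncated to u8.
        for (i, px) in line_buf.iter_mut().enumerate() {
            *px = ((y + i / img_width) * img_width + i % img_width) as u8;
        }
        // Copy only the pixels inside the requested rect to the output.
        for row in 0..block_h {
            let abs_y = y + row;
            if abs_y < rect_y || abs_y >= rect_y + rect_h {
                continue;
            }
            let src = &line_buf[row * img_width + rect_x..][..rect_w];
            let dst = &mut out[(abs_y - rect_y) * rect_w..][..rect_w];
            dst.copy_from_slice(src);
        }
        y += block_h;
    }
}

fn main() {
    // 16-wide image, request a 3x2 rect at (5, 3), block height 4.
    let mut out = vec![0u8; 3 * 2];
    read_rect_via_lines(16, 5, 3, 3, 2, 4, &mut out);
    assert_eq!(out, vec![53, 54, 55, 69, 70, 71]);
}
```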
For use cases where users can use rowpitch to ignore the excess parts of the too-large rect, we could just have a method that gives back a preferred rect, which can be decoded very efficiently.
So the API could look like this:
```rust
trait ImageDecoder {
    // ...

    /// Returns a viewbox that contains all pixels of the given rect but can
    /// potentially be decoded more efficiently.
    /// If rect decoding is not supported or no more-efficient rect exists,
    /// the given rect is returned as is.
    fn preferred_viewbox(&self, viewbox: Rect) -> Rect {
        viewbox // default impl
    }

    fn read_image_rect(&mut self, buf: &mut [u8], viewbox: Rect) -> ImageResult<()> {
        Err(ImageError::Decoding(Decoding::RectDecodingNotSupported)) // or similar
    }
}
```
This API should make rect decoding easier to use, easier to implement, and allow for more efficient implementations.
> - `init`. This is just two-phase initialization and opens so many questions.
That was one of the open questions; the argument you're presenting makes it clear it should return the layout and that's it. Renamed to `next_layout` accordingly. I'd like to remove the existing `dimensions()`/`color_type()` methods from the trait as well. There's no point using separate method calls for communicating them.
> For use cases where users can use rowpitch, […]

That is ultimately the crux of the problem. I'd say it's pretty much the only problem, even though that does not appreciate the complexity. A lot of what you put forth is overly specific to solving one instance of it, obviously focusing on DDS. That's not bad, but take a step back to the larger picture. There's no good way to communicate all kinds of layouts that the caller could handle: tiled, planar, depths, sample types, …. With the information being exchanged right now, no one can find a best match between the requirements of `image`'s data types (and `Limits`) and what the decoder can provide. This won't be solved by moving complexity into the decoders; we need to get structured information out of them primarily, then make that decision / handle the resulting byte data in `image`'s code.
> `viewbox` makes it more difficult to implement decoders.

The point of the default implementation in this PR is that it is purely opt-in. Don't implement the method for decoders that cannot provide viewbox decoding and everything works correctly. The documentation seems to be confusing, point taken. We're always going to have inefficiencies; I'm for working through the distinct alternative layouts that allow an optimization one by one. More important for this PR immediately is what outcome a caller may want and what interface would give it to them. In this case I've worked on the use case of extracting part of an atlas.
> However, most image formats are based on lines of blocks (macro pixels). So we can do a trick.

I'm not designing anything in this interface around a singular "trick"; that's the wrong way around. That is how we got here. That's precisely what created `ImageDecoderRect`, almost to the dot. Falsehoods programmers assume about image decoding will lead to this breaking down and becoming horrible to maintain. The trick you mention should live in the decoder's trait impl and nowhere else, and we can bring it back where appropriate and possible. (Note that if you do it for a specific format, some formats will be even more efficient and not require you to decode anything line by line but skip ahead, do tiles, …. That's just to drive home the point that you do not want to do this above the decoder abstraction but below it, in the `ImageDecoder` impl.)
> It is forced to do this, because of the API.

A decoder impl is only forced to do anything if we force it via an interface; this PR does not. `read_image_rect(&mut self, buf, viewbox)` does force a decoder to be able to handle all possible viewboxes; this PR does not. I'm definitely taking worse short-term efficiency over code maintenance problems; the latter won't get us efficiency in the long run either.
> When calling `viewbox`, is the offset of the rect relative to the (0,0) of the full image or the last set viewbox?

It's supposed to be relative to the full image. Yeah, that needs more documentation and pointers to the proper implementation.
> I'd like to remove the existing `dimensions()`/`color_type()` methods from the trait as well. There's no point using separate method calls for communicating them.

:+1: I'd even go so far as to say that having two methods that are supposed to return the same data is just waiting for bugs to happen.
I think you misunderstood some aspects of what I was suggesting regarding viewbox. So let me quickly clear up some parts:
> > It is forced to do this, because of the API.
>
> The decoder impls is only forced to do anything [...]

"It" was referring to `ImageReader`. So the user of the decoder, not the decoder impl.

My point is that any user with an unsupported rect is forced to allocate a bigger image (fitting the larger supported rect) and then allocate another image to crop. (If rowpitch is not available. If rowpitch is available, none of this is a problem.)
However, this limitation does not apply to decoder impls. They can potentially be smarter about implementing arbitrary rects. Which leads me to:
> > However, most image formats are based on lines of blocks (macro pixels). So we can do a trick.
>
> I'm not designing anything in this interface around a singular "trick", that's the wrong way around.
Re-reading my comment, this section definitely wasn't clear at all and I apologize for that. I never meant to suggest that all formats should use this or similar tricks. This was meant as an example of what decoder impls can potentially do to implement rect decoding more efficiently than users of decoders (e.g. `ImageReader`) could.
What I actually meant to say by "force decoders to support arbitrary rects" is that decoder impls should handle unsupported rects internally.
Example: let's say the JPEG decoder only supports rect decoding along 8x8 block boundaries (ignoring subsampling). Then `image` could provide a util function that handles allocating the larger image buffer and copying to the output buffer:
```rust
fn read_image_rect(&mut self, buf: &mut [u8], viewbox: Rect) -> ImageResult<()> {
    let supported_rect: Rect = self.get_supported_view(viewbox); // align to 8x8 boundaries
    image::util::crop_rect(
        buf,
        viewbox,
        supported_rect,
        &mut self.limits,
        |larger_image_buf: &mut [u8]| {
            // decode `supported_rect` into `larger_image_buf`
            // Note: if viewbox == supported_rect, this closure is called with `buf` directly.
        },
    )
}
```
This is no less efficient than users like `ImageReader` doing this, but it does allow for improvements! Say the JPEG decoder allows decoding the image in block-aligned lines. Then we can optimize the implementation with another util function:
```rust
fn read_image_rect(&mut self, buf: &mut [u8], viewbox: Rect) -> ImageResult<()> {
    let supported_rect: Rect = self.get_supported_view(viewbox); // align to 8x8 boundaries
    let line_height: u32 = 8;
    image::util::crop_rect_lines(
        buf,
        viewbox,
        supported_rect,
        line_height,
        &mut self.limits,
        |line_buf: &mut [u8], line_rect: Rect| {
            // decode `line_rect` into `line_buf`
            // Note: this will decode K lines (line_rect.height = K * line_height).
            // Note: if viewbox == supported_rect, this closure is called with `buf` directly.
        },
    )
}
```
Now the implementation only needs additional memory for a single line. That's a lot more efficient for non-rowpitch users (e.g. `ImageReader`).

Of course, even this small memory overhead is unnecessary for rowpitch users. In that case, rowpitch users can extend their viewbox themselves using `decoder.preferred_viewbox`, freeing them from unnecessary overhead.
But now consider the same example of JPEG decoding line by line under the current API. What should `Decoder::viewbox` return?
- If `viewbox(rect)` returns Ok (= allowing all rects as is), then users that can't use rowpitch (e.g. `ImageReader`) are happy, because they don't need to allocate a temporary image. However, rowpitch users now have to eat the unnecessary overhead of the line buffer.
- If `viewbox(rect)` returns the actually supported rect (aligned to 8x8 boundaries), then rowpitch users are happy because they don't have any overhead. However, users that can't use rowpitch (e.g. `ImageReader`) are forced to allocate a temporary image.
The current API forces the example JPEG decoder to make a tradeoff that prefers either rowpitch users or non-rowpitch users. In general, any decoder that has some cost for supporting arbitrary rects has to make this tradeoff (if that cost is less than allocating the entire larger image for the supported rect).
That's why I suggested the API that I did.
> But now consider the same example of JPEG decoding line by line under the current API. What should `Decoder::viewbox` return?
A viewbox relative to the surrounding 8x8-aligned grid. And it changes its configuration such that the next `next_layout` reflects that, i.e. has the size of that surrounding grid. Then we only allocate and pass the smaller buffer in `read_image`. I hope that also clarifies the intent of `next_layout`: you're supposed to be able to call it multiple times without advancing the image, calling other re-configuration methods in between can change it, and at the end the `ImageReader` retrieves from it a fully configured layout for the following call to `read_image`.

Example: the underlying frame has size 34x34 and we're requesting viewbox 12x12+10+8. So jpeg returns 12x12+2+0 and changes the layout to a 16x16 image. It's a little bit redundant, of course. In the error case we only really need to know the relative offset, but we have no type for that here. I'd considered whether it should return 16x16+2+2 (the adjusted dimensions with the respective offset), but that's just more confusing, mixing data like that, and if we're doing rounds of configuration the layout needs to be polled again anyways.
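As an arithmetic check of that example, here is a hedged sketch of the outward alignment; `align_to_grid` is an illustrative helper, not an actual API:

```rust
// Align a viewbox (offset x, y; size w, h) outward to a `block`-sized grid.
// Returns (grid_x, grid_y, grid_w, grid_h).
fn align_to_grid(x: u32, y: u32, w: u32, h: u32, block: u32) -> (u32, u32, u32, u32) {
    let x0 = x - x % block;                        // round offset down
    let y0 = y - y % block;
    let x1 = ((x + w + block - 1) / block) * block; // round far edge up
    let y1 = ((y + h + block - 1) / block) * block;
    (x0, y0, x1 - x0, y1 - y0)
}

fn main() {
    // The 12x12 viewbox at offset (10, 8) from the example, 8x8 grid:
    let (gx, gy, gw, gh) = align_to_grid(10, 8, 12, 12, 8);
    // The layout reported by the next `next_layout` becomes 16x16 ...
    assert_eq!((gw, gh), (16, 16));
    // ... and the viewbox relative to that grid sits at offset (2, 0),
    // i.e. the 12x12+2+0 from the example.
    assert_eq!((10 - gx, 8 - gy), (2, 0));
}
```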
> My point is that any user with an unsupported rect is forced to allocate a bigger image (fitting the larger supported rect) and then allocate another image to crop.

The `memmove` needed to crop an image in-place is fast. Really fast. 60Gbps fast. (At least it should be.) I really don't worry about this in the grand scheme. The necessary optimization is to avoid allocating and pushing data into the buffer, to within a constant factor of the requested viewport. And I would not worry about allocation too much, since the biggest optimization for that, reusing an existing buffer, should have priority.
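The in-place crop mentioned here can be sketched with `slice::copy_within`, which has `memmove` semantics, so no second buffer is needed; `crop_in_place` is an illustrative helper, not an existing API:

```rust
// Compact the wanted window of a row-major `full_w`-wide image to the
// front of the same allocation, then shrink the vector.
fn crop_in_place(
    buf: &mut Vec<u8>,
    full_w: usize,
    crop_x: usize,
    crop_y: usize,
    crop_w: usize,
    crop_h: usize,
) {
    for row in 0..crop_h {
        let src = (crop_y + row) * full_w + crop_x;
        // Each destination offset is <= its source offset, so rows are
        // never clobbered before they are read.
        buf.copy_within(src..src + crop_w, row * crop_w);
    }
    buf.truncate(crop_w * crop_h);
}

fn main() {
    // A 4x3 "image" with pixel value y * 4 + x; crop the 2x2 window at (1, 1).
    let mut buf: Vec<u8> = (0..12).collect();
    crop_in_place(&mut buf, 4, 1, 1, 2, 2);
    assert_eq!(buf, vec![5, 6, 9, 10]);
}
```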
Forcing decoders to handle arbitrary smaller windows is 'if people would just …'. Being forced to store to an output buffer that is too big is an inconvenience; a buffer that is too small forces a whole separate allocation inside the decoder in the, as per experience with `DecodeImageRect`, common case. One of the first extensions to `ImageLayout` should be for a decoder to request more buffer space. I don't think your `read_image_rect` is implementable without a temporary. What's the point? In effect it has the costs of what `ImageReader` is doing in this PR, with a weirder double-callback API that will make specialization hell. Do one allocation, at the outermost callsite. Let the decoder decide on what it can do. If that is too little, we'll provide more tools in future extensions (as more info to `image` through the `#[non_exhaustive] struct ImageLayout`, or more choices by adding methods in lieu of `viewbox`). It's a good path to incremental improvements, I believe.
> The `memmove` needed to crop an image in-place is fast. Really fast. 60Gbps fast. (At least it should be.) I really don't worry about this in the grand scheme.

Ah, I didn't consider in-place cropping at all. You're right. In that case, reading a too-large rect shouldn't be an issue. Meanwhile, my suggested API would force another allocation on top of the cropping. Your design is definitely better.

Sorry for taking so long to understand that 😓
> I hope that also clarifies the intent of `next_layout`: you're supposed to be able to call it multiple times without advancing the image, calling other re-configuration methods in between can change it, and at the end the `ImageReader` retrieves from it a fully configured layout for the following call to `read_image`.
Okay, so `read_image` does the advancing. This answers most of my questions. Only the question of what error is returned after the iterator has ended remains.

So `next_layout` doesn't change the (iterator) state the decoder is in. Yes, it may cause the decoder to initialize and read necessary data from disk, but that's just to make the necessary info available. In that sense, initialization and IO are just a side effect of returning the layout.

In that case, may I suggest renaming `next_layout` to just `layout`? A method signature like `next_something(&mut self) -> T` really makes it seem like it advances something. I think this name would also make it more obvious that the method is only `&mut` to allow for initialization and necessary IO, instead of changing what is read or how it is read.
Resolving the naming question as `peek_layout`, hopefully satisfactorily for now.
@fintelia I understand this is too big for a code-depth review, but I'd be interested in directional input. Is the merging of 'animations' and simple images, as well as the optimization-hint methods, convincing enough? Is the idea of returning data from `read_image` something that works for you? The struct is meant to be `Default`-able and fills in the information for `into_frames()`, but I'll sketch out some way of putting the metadata indicators in there (i.e. should you poll xmp for this frame, or wait until the end, or is it constant for the file).

As an aside, in wondermagick we basically find that sequence encoding is a missing API to match imagemagick. We can currently only do this with gif, despite tiff and webp having absolutely no conceptual problems with it (avif too, but imagemagick behaves oddly, does not match libavif's decoding, and the rust libraries don't provide it). It would be nice to make those traits symmetric, so the direction here influences the encoding, too.