Suggestions: New ImageDecoder and ImageSequenceDecoder APIs
It seems there are currently multiple discussions and ideas for the new decoder API for 1.0, so I would like to propose an API myself.
I created this API with the following goals:
- Traits are dyn-compatible. (Currently, neither `ImageDecoder` nor `AnimationDecoder` is dyn-compatible.)
- Plugin decoders can be fully utilized. (Currently, there is no way to use plugins for decoding animations.)
- Don't force any decision onto the decoder impl. (Currently, the `AnimationDecoder` trait forces all frames to be allocated and decoded as RGBA8.)
```rust
/// Settings passed to decoders on creation.
pub struct DecoderConfig {
    limits: crate::Limits,
    // ... and maybe more in the future
}

pub trait ImageDecoder {
    /// Returns the layout of this image, as read by [`ImageDecoder::read_image`].
    fn layout(&self) -> ImageLayout;

    /// Reads the image into the given buffer.
    ///
    /// If the image file contains multiple images/frames, the decoder will
    /// read the one that best represents the image file. For animations, this
    /// is usually the first frame.
    fn read_image(&mut self, buf: &mut [u8]) -> ImageResult<()>;

    /// If the image is a sequence of images/frames, returns a decoder for the sequence.
    ///
    /// If `None` is returned, the image is a single image or the decoder does not
    /// support sequence decoding.
    fn as_sequence_decoder(&mut self) -> Option<&mut dyn ImageSequenceDecoder> {
        None
    }

    // ... metadata stuff
}

#[non_exhaustive]
pub enum SequenceKind {
    /// A sequence of unknown or unspecified semantics.
    Sequence,
    /// An animation sequence.
    Animation {
        // ... animation info
        looping: (),    // no / yes / how many times
        frame_rate: (),
    },
    /// A volume represented by a sequence of 2D slices.
    Volume,
}

pub trait ImageSequenceDecoder {
    /// The semantics of the sequence.
    fn kind(&self) -> SequenceKind;

    /// The total number of elements in the sequence.
    fn count(&self) -> u32;

    // ... other sequence-related info

    /// Returns the layout of the next frame in the sequence, or `None` if there are no more frames.
    fn read_next_frame_layout(&mut self) -> ImageResult<Option<ImageLayout>>;

    /// Reads the next frame into the given buffer.
    fn read_next_frame(&mut self, buf: &mut [u8]) -> ImageResult<()>;

    /// Skips the next frame without decoding it.
    fn skip_next_frame(&mut self) -> ImageResult<()>;
}
```
I took some ideas from #2672 but avoided the two-phase initialization it introduced. The basic usage of decoders is now as follows:
```rust
// Define the configuration for the decoder (or just use DecoderConfig::default()).
let mut config = DecoderConfig::default();
config.set_limit(...);

// Create the decoder, which reads enough of the file to implement the ImageDecoder interface.
let mut decoder = MyDecoder::new(reader, config)?;

// Now we can ask it for dimensions, color, metadata, and so on.
let (width, height) = decoder.layout().size();

// Read the image file as a single image.
let mut image = alloc_image(decoder.layout()); // allocate memory for the image
decoder.read_image(&mut image.buf)?;
```
Note that this solves the PNG limit problem by giving all decoders the limits along with the reader, so a decoder can't start reading without having limits. This also avoids the two-phase initialization #2672 proposes. I deliberately called it DecoderConfig because I want the config to be open for extension: we may want to support further configuration in the future, and DecoderConfig gives us an easy path for that.
For sequences/animations, as_sequence_decoder can be used. This method allows users to read image sequences if the image file contains multiple images.
```rust
let mut decoder = MyDecoder::new(reader, Default::default())?;

// Now we ask it to treat the image as a sequence.
let mut sequence_decoder = decoder.as_sequence_decoder().expect("not a sequence!");
dbg!(sequence_decoder.kind());  // could be SequenceKind::Animation { looping: inf, frame_rate: 100ms }
dbg!(sequence_decoder.count()); // could be 12 for an animation with 12 frames

while let Some(frame_layout) = sequence_decoder.read_next_frame_layout()? {
    let mut image = alloc_image(frame_layout); // allocate memory for the frame
    sequence_decoder.read_next_frame(&mut image.buf)?;
    do_something_with(image);
}
```
This sequence API is a lot more powerful, should cover more use cases, and is compatible with plugins.
For simple use cases, like getting an iterator the way AnimationDecoder used to provide, we can just provide a small helper struct that uses the new API:
```rust
pub struct FrameIterator<'a> {
    decoder: &'a mut dyn ImageSequenceDecoder,
}

impl<'a> FrameIterator<'a> {
    pub fn new(decoder: &'a mut dyn ImageSequenceDecoder) -> Self {
        Self { decoder }
    }
}

impl<'a> Iterator for FrameIterator<'a> {
    type Item = ImageResult<DynamicImage>;

    fn next(&mut self) -> Option<Self::Item> {
        let layout = match self.decoder.read_next_frame_layout() {
            Ok(Some(layout)) => layout,
            Ok(None) => return None,
            Err(e) => return Some(Err(e)),
        };
        let mut image = alloc_image(layout);
        if let Err(e) = self.decoder.read_next_frame(&mut image.buf) {
            return Some(Err(e));
        }
        Some(Ok(image))
    }
}
```
Note: I don't really know a lot about animation formats, so I assumed that they have a constant frame rate in the sequence API. Is that assumption valid? I also removed the top-left offset the current Frame API has, because all of our decoders set it to 0,0. Are those necessary?
To be clear, the existing ImageDecoder trait is dyn-compatible. Otherwise it wouldn't be possible to use it with a Box<dyn ImageDecoder>. The design has the downside that you can only call read_image if you have a Box or a concrete decoder implementation. But IMO that is preferable to having the API let you call read_image multiple times even though only the first call would produce the expected output.
> Note: I don't really know a lot about animation formats, so I assumed that they have a constant frame rate in the sequence API. Is that assumption valid? I also removed the top-left offset the current Frame API has, because all of our decoders set it to 0,0. Are those necessary?
No. Whatever assumption we have about animation formats they probably do not hold. That is even true for assumptions that I hold. At the very least when it becomes murky to distinguish an animation from a more general sequence with images of differing meaning, then every assumption about the relationship crumbles. We'll have to design the frames such that we can add new kinds of meaning later-on, incrementally.
> The design has the downside that you can only call read_image if you have a Box or a concrete decoder implementation. But IMO that is preferable to having the API let you call read_image multiple times even though only the first call would produce the expected output.
@fintelia I'm sympathetic towards this argument for the user-interface (i.e. ImageReader). But for the internal one I'm very skeptical, that was the main reason for reinterpreting the trait to be only the direction downstream->image. There's technical downsides to the consuming interface and I don't want to complexify them for a bit of ergonomics.
- Retrieving metadata in formats that store it after the image
- Switching to animation after an image (where that is the thumbnail frame)
- In `png` we support async (by Chromium request) without a single line of async code, by allowing an intermittent `WouldBlock` error and a retry. That fundamentally can not work with a consuming interface.
- The internal boxify is a crux, arguably explicitly unergonomic. That's fine if `image` eats the complexity cost for the sake of others' ergonomics, but the trait is simply used internally.
Not an immediate symptom of read_image, but an argument for treating ImageDecoder as a protocol that image uses to interact: `ImageReader` can effectively not pre-construct the decoder, and we can not interact with it in any way. That is a blocker for more efficient decoding where you decide at runtime whether normalizing to `RgbFrame` is necessary or whether an image can be consumed in its more native format, such as a planar one.
I think it would be helpful to start by talking through what changes would be made to ImageReader and then we can figure out what would have to be different on ImageDecoder to enable those changes. Right now I'm having a hard time picturing how we'd replicate all the functionality that's available today via calling ImageReader::into_decoder and then directly interacting with the decoder.