Rename ImageBuffer to PixelBuffer

Open 197g opened this issue 6 months ago • 17 comments

Let's get started on the larger API changes for better color treatment etc.

As discussed previously in relation to the Pixel trait, we could free up terminology for non-Pixel representations, and in particular for an internal type that also has associated metadata. These pixel buffers are good for simplified representations, constructors, and statically typed code paths, but not as a unified representation of generic real-world complexity.

See: https://github.com/image-rs/image/issues/1523#issuecomment-2156194262 and following comments. @kornelski You were the most outspoken about removing strong typing but, as far as I read into it, not against having some untyped representations. I'd just like to clarify with you.

197g avatar Jun 06 '25 13:06 197g

I'm strongly in favour of using types for memory layout, but not necessarily for the semantics of the content.

&[u8] is cumbersome and error prone if you have to do arithmetic to get the right channel, and quite annoying when working with 16-bit channels.

However, I wouldn't use static typing for things like color gamut, at least not in general case.
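For illustration, a minimal sketch of that difference (the types and function names here are hypothetical, not the crate's actual API): a typed pixel makes channel access explicit, while raw bytes need index arithmetic that silently breaks when the layout changes.

```rust
// Hypothetical illustrative types, not from the image crate.
#[derive(Clone, Copy, PartialEq, Debug)]
struct Rgb16 {
    r: u16,
    g: u16,
    b: u16,
}

// Raw-slice version: the caller must know the stride and endianness.
fn green_raw(buf: &[u8], x: usize) -> u16 {
    let base = x * 6 + 2; // 3 channels * 2 bytes each, green at byte offset 2
    u16::from_le_bytes([buf[base], buf[base + 1]])
}

// Typed version: no arithmetic, no endianness questions.
fn green_typed(buf: &[Rgb16], x: usize) -> u16 {
    buf[x].g
}

fn main() {
    let typed = [Rgb16 { r: 1, g: 2, b: 3 }, Rgb16 { r: 4, g: 5, b: 6 }];
    // Serialize the same pixels into a raw little-endian byte buffer.
    let mut raw = Vec::new();
    for px in &typed {
        raw.extend_from_slice(&px.r.to_le_bytes());
        raw.extend_from_slice(&px.g.to_le_bytes());
        raw.extend_from_slice(&px.b.to_le_bytes());
    }
    // Both paths agree, but only one of them survives a layout change.
    assert_eq!(green_raw(&raw, 1), green_typed(&typed, 1));
}
```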

kornelski avatar Jun 06 '25 15:06 kornelski

The pair of Pixel and PixelBuffer does make sense to me, so I'm potentially open to this renaming. But I'd want to understand more about what a non-Pixel image representation would look like to say whether the overall change makes sense.

fintelia avatar Jun 07 '25 00:06 fintelia

The two largest classes of non-Pixel buffers:

  • Planar layouts, potentially subsampled such as YUV: layouts where each color unit is not a contiguous region of memory. (#2010, #1272, #1866) Separate alpha planes should be sufficient motivation with image editing in mind, both for removing and adding one.
  • Paletted buffers and other forms where additional out-of-band data is required for colorimetric interpretation. This class may include gainmap-augmented content, depending on how strictly you want to interpret the intent of modifying only one part of this pair. Editing gainmaps would be pretty sweet, for those targeting non-ancient technology.

Other smaller classes would be using DDS / DXT texels (or other similar compressed encodings) when memory bandwidth is a bottleneck; rgb565, because a wrapper over u16 can't be disassembled into a slice of components (removing that capability from the Pixel trait is a moot design path: it would forbid this simple constructor while offering no concrete upside); etc.
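A small sketch of why rgb565 resists the "slice of components" model (this wrapper type is illustrative, not from the crate): the channels are bit-packed into one u16, so components can only be computed, never borrowed.

```rust
// Hypothetical illustrative type: 5 bits red, 6 bits green, 5 bits blue
// packed into a single u16.
#[derive(Clone, Copy)]
struct Rgb565(u16);

impl Rgb565 {
    fn r5(self) -> u8 {
        ((self.0 >> 11) & 0x1f) as u8
    }
    fn g6(self) -> u8 {
        ((self.0 >> 5) & 0x3f) as u8
    }
    fn b5(self) -> u8 {
        (self.0 & 0x1f) as u8
    }
    // There is no way to return &[u8; 3] here: the components don't
    // exist as separate bytes in memory, only as bit ranges.
}

fn main() {
    let px = Rgb565(0b11111_000000_11111); // max red, zero green, max blue
    assert_eq!((px.r5(), px.g6(), px.b5()), (31, 0, 31));
}
```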

197g avatar Jun 07 '25 00:06 197g

The important thing for me is that whatever our main image representations are, they should integrate with the rest of the functionality we provide. Things like calling image::open (or similar) to load an image into this representation, having methods to rotate/resize/crop it, a way to read individual pixel component values, and being able to save it back to a file.

Our current pair of DynamicImage and ImageBuffer mostly handle this well. DynamicImage supports all image kinds that we support loading and can be viewed as or converted to ImageBuffer to access individual pixel values. The main issues I've seen come up with them are because they don't carry info about the color gamut or transfer function.

fintelia avatar Jun 07 '25 04:06 fintelia

Sure, compatibility with DynamicImage/open/save is the main design constraint for a metadata/color-aware buffer type. I have a good number of potential solutions that hopefully fit together; you'll see in the reviews on those PR(s).

197g avatar Jun 07 '25 07:06 197g

In my experience, there are different levels of an "image":

  1. The whole thing you get when decoding a file with metadata, so it may contain color profiles, multiple animation frames, layers, special channels like depth, and text attachments in a variety of formats. Struct with private fields.

    This representation is needed to round-trip a file, or to construct a new file with enough fidelity. However, it doesn't make sense to edit such a file directly. Having every image transform function worry about merging and preserving text metadata doesn't make sense. Having every subslice of pixels worry about ICC profiles, CMYK, palettes, a separate alpha plane, etc. is too cumbersome.

  2. Equivalent of a single frame of animation. No textual metadata, but has full-fidelity color information. Struct with private fields. This is all you need to display an image correctly.

    I don't edit these directly. The color metadata is still inconvenient to carry around. I want to be able to split an image into tiles to operate on each in a separate thread, but color profiles can be large to copy, and ICC color transforms may need a cache, so subslices of such an image face a dilemma of copying vs locking vs !Send. In my implementation, I support decoding at 1/nth of resolution, so at this level my images have a separate notion of a logical size (the full size it would have had) and actual decoded bitmap size.

  3. RGB/RGBA/Gray 8/16bit bitmap, which doesn't carry color metadata with it. It's just a bunch of pixels. Enum with everything public. Has owned and borrowed flavors, or Cow variants.

    At this point it's your problem to convert images to your working color space, and use that color space correctly. The upside is that the bitmap is easy to tear apart and reconstruct when needed. It has to be an enum, so that you can keep decoded pixels as gray or opaque for speed and memory efficiency, but still be able to apply color and alpha transformations to them when needed.

  4. Bitmap with pixels in a type-safe statically known pixel format (like RGBA), no color metadata. Has owned and borrowed flavors.

    These can be sliced per row to basic Rust slices, and the per-row operations can autovectorize. For the borrowed one, it's very useful to have a stride, since it allows cropping/tiling without copying. OTOH a contiguous row-major bitmap is often needed to interoperate with other libraries, so the owned one doesn't need a stride.

It's really hard to skip any level in this hierarchy. If you don't have the 1st one, then your encoder and decoder have to do that job (with an <R: Read> following it everywhere), and users need to know how to ask the decoder for each bit of relevant information. If you don't have the 2nd one, then you can't correctly work with anything other than sRGB. If you have 2nd, but not the 3rd, then internal helper functions need to care too much about color spaces (e.g. flipping pixels upside-down shouldn't require you to check if the color profile has changed). Without the 4th everything is a match, and/or you work with C-style pixel_pointer + bytes_per_channel and if is_alpha { pointer += 1 }.
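The 4th level can be made concrete with a minimal sketch (all names here are hypothetical, not the crate's API): a borrowed, strided RGBA view whose rows come out as plain Rust slices, so per-row loops can autovectorize and cropping/tiling is just offset arithmetic, no copying.

```rust
// Hypothetical level-4 borrowed bitmap: statically typed pixels, a
// stride, and no color metadata.
#[derive(Clone, Copy)]
struct Rgba8View<'a> {
    data: &'a [[u8; 4]],
    width: usize,
    height: usize,
    stride: usize, // in pixels; >= width, which is what allows zero-copy crops
}

impl<'a> Rgba8View<'a> {
    // Each row is a plain slice: easy to hand to per-row kernels.
    fn row(&self, y: usize) -> &'a [[u8; 4]] {
        let start = y * self.stride;
        &self.data[start..start + self.width]
    }

    // Zero-copy crop: move the starting offset, shrink the dimensions,
    // keep the parent's stride.
    fn crop(&self, x: usize, y: usize, w: usize, h: usize) -> Rgba8View<'a> {
        Rgba8View {
            data: &self.data[y * self.stride + x..],
            width: w,
            height: h,
            stride: self.stride,
        }
    }
}

fn main() {
    // A 4x2 image stored contiguously with stride 4.
    let pixels: Vec<[u8; 4]> = (0..8).map(|i| [i as u8; 4]).collect();
    let view = Rgba8View { data: &pixels, width: 4, height: 2, stride: 4 };
    let tile = view.crop(1, 1, 2, 1); // a 2x1 tile from the second row
    assert_eq!(tile.row(0), &[[5; 4], [6; 4]]);
}
```

An owned variant would drop the stride and guarantee contiguous row-major storage for interop, as described above.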

kornelski avatar Jun 07 '25 11:06 kornelski

Cool. I think we're on the same page. This covers quite exactly the types I want to go for:

  1. The new ImageBuffer to be provided with the metadata.rs information.
  2. The type as provided by the underlying image-canvas, which already does all three of shared, !Send, and Send, except I'm stuck on whether color conversion LUTs could also be snapshotted and shared as Arc<_> and, if so, where they should be stored as an attribute (seems like you're unsure too; we can just give this issue time, no rush). To be exposed eventually, but since this is hard design space and we only pay some performance if we operate on (1), let's do that last.
  3. DynamicImage/PixelBuffer, where I also imagine we could have similar strongly typed but uncolored wrappers for other layouts that aren't 'Pixel'. The design space, imo, is any collection of color representations, all within one color model, into and from which any other colorimetric layout¹ can feasibly be converted by re-assembling channels, undoing subsampling, rescaling, i.e. normalizing the component type layout, without having to think about color space in the conversion operation.
  4. What is provided by flat, which I'd like to rename to view or something similar. You might find interesting: image-texel/image/data which is too generic for image's sake but targets the level of compatibility with outside binaries that I'd like.

One remaining decision is how interfaces written in terms of (3) should accept color information to upgrade the buffers to (2), or to operate on the data as (2) temporarily. We can only model that with traits if each repr in (3) necessarily has some conventional assumed color space. That's fine; it should just be explicitly decided. For the 'average' consumer it may even be expected that 'just encode this DynamicImage' defaults to assuming sRGB, for instance.

¹I realize this is weasel worded, it'll be a partial function in the general case even though the other part of the sentence requires us to make it a function that covers most possible layouts from (2). It'll be a design problem to balance this.

197g avatar Jun 07 '25 13:06 197g

In my implementation, I support decoding at 1/nth of resolution, so at this level my images have a separate notion of a logical size (the full size it would have had) and actual decoded bitmap size.

This is a crucial need that we haven't covered yet in other discussions. Thank you, it makes a lot of sense with regard to JPEG, other DCT approaches, and probably all forms of compression.

197g avatar Jun 07 '25 13:06 197g

Would most users start using the new type in all the places they currently use ImageBuffer? Or in most cases would they have to change their code to use PixelBuffer instead? I'd rather avoid churn if we could prevent it by picking a different name for the new type instead.

@kornelski's levels of images makes a lot of sense to me. A few thoughts on the different levels:

  1. The closest thing we have is ImageDecoder. We could look at other designs, but I suspect our approach of having a trait that abstracts over different concrete implementations for each image format may work better than trying to have a single type that can represent the full complexity of all image formats. There's also the question of lazily (via io::Read) or eagerly loading image metadata which carries its own tradeoffs.
  2. This is the biggest difference in our current design. We don't have a type that carries color metadata alongside image pixels.
  3. Sounds a lot like DynamicImage. We support gray+alpha and some floating point formats, and can't do borrowed buffers. But the overall structure sounds very similar.
  4. Describes ImageBuffer today. It supports both being backed by a Vec<T> and (although we don't highlight it much) also can be backed by a slice. Adding a stride field might be good, though I'd want to check whether that would risk silently breaking downstream code.

One thing I've been wondering is whether we should define a small number of supported working colorspaces, convert into one of them when loading an image, and then have DynamicImage / ImageBuffer track which colorspace they are in. Ideally we could do it in a way that most users wouldn't have to pay attention to them and things would just work. Similar to how today users can write applications that properly handle YUV and/or planar encoded images without even knowing what those are.
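This idea can be sketched in a few lines (all names here are hypothetical, not existing image APIs): a small closed set of working color spaces, converted into at load time and tracked alongside the pixels, so that most operations just preserve the tag without inspecting it.

```rust
// Hypothetical working-space tag, deliberately a small closed set.
#[derive(Clone, Copy, PartialEq, Debug)]
enum WorkingSpace {
    Srgb,
    LinearSrgb,
    Rec2020,
}

// Hypothetical buffer that carries its working space with it.
struct TrackedImage {
    pixels: Vec<[u8; 4]>,
    space: WorkingSpace,
}

impl TrackedImage {
    // Loading converts whatever the file contained into a working space
    // once, so downstream code never sees exotic source spaces.
    fn from_decoded(pixels: Vec<[u8; 4]>, wide_gamut: bool) -> Self {
        let space = if wide_gamut { WorkingSpace::Rec2020 } else { WorkingSpace::Srgb };
        TrackedImage { pixels, space }
    }

    // Geometric ops don't care which space they're in; they preserve it.
    fn flipv(mut self, width: usize) -> Self {
        self.pixels = self.pixels.chunks(width).rev().flatten().copied().collect();
        self
    }
}

fn main() {
    let img = TrackedImage::from_decoded(vec![[1; 4], [2; 4]], false);
    let flipped = img.flipv(1);
    assert_eq!(flipped.space, WorkingSpace::Srgb); // metadata survives the edit
    assert_eq!(flipped.pixels, vec![[2; 4], [1; 4]]);
}
```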

fintelia avatar Jun 07 '25 18:06 fintelia

I'm least sure about my design of the 3rd case, the DynamicImage-like enum. This is because there's a tension between being able to accurately represent any useful pixel format & layout vs forcing every downstream user to match against all the weird formats and probably throw unimplemented!() on half of them.

I can handle interleaved Luma/Rgb/Rgba without much code duplication using a bit of generics or macros, but throwing a planar YUV into the enum starts requiring a separate code path, especially when it has extra requirements like having a multiple-of-N width where some pixels are supposed to be ignored. Or I can support any flavor of linear-light RGB, but there's no linear-light YUV or CMYK.

It would be cool if Rust had some form of subclassing/subsetting for enums, so that if you just decoded a PNG, you don't need to match on Yuv, Cmyk, etc.

Maybe there should be multiple DynamicImage-like enums, and users would ask for a subset they want?

Or maybe some other method of negotiating the format based on an open set of types? Users would bring their own enum with them:

match mystery_image.what_are_you() {
    (HasAlpha, Depth8, sRGB) => MyImage::Rgba8(mystery_image.convert_to::<Rgba<u8>>(Profile::sRGB)),
    (_,        Depth8, sRGB) => MyImage::Rgb8(mystery_image.convert_to::<Rgb<u8>>(Profile::sRGB)),
    (_,        _,      sRGB) => MyImage::Rgba16(mystery_image.convert_to::<Rgba<u16>>(Profile::sRGB)),
    _ => MyImage::WideGamut(mystery_image.convert_to::<Rgba<u16>>(Profile::Rec2020)),
}

kornelski avatar Jun 07 '25 22:06 kornelski

I think that our ability to convert to a smaller supported set of image representations is a very powerful way we can (and currently do) cut down complexity. For the cost of a single SIMD accelerated pass over the image data, we can drastically reduce the number of potential variations.

We currently do the conversion immediately upon image loading, which lets all our internal code rely on the limited set of formats. If the conversion happened later and/or was optional, then internal code would have to handle way more cases. (I'd really like to avoid something like resize raising an unimplemented!() if we can avoid it!)

fintelia avatar Jun 08 '25 00:06 fintelia

I agree with @kornelski that exposing YUV, CMYK and other color models in the same paths is cumbersome. It would require matching on AwesomeTypes (see below), which makes it hard to handle all of these at the same time.

What is the motivation to expose YUV in the same path? YUV is a transmission color model, and you usually don't want to do anything on it directly except a few operations: convert it to RGB, rotate/mirror, rescale, or upload it to a GPU that has a YUV sampler.

Since ICC profiles are involved, there's an additional complication: ICC profiles can also represent YUV, and such profiles actually exist. Thus, a consumer can convert RGB to YUV using an ICC profile and store the YUV image with the ICC profile in AVIF, as YUV once again. Which YUV should actually be exposed then?

The same applies to CMYK. You usually don't want to do anything on it directly.

One thing I've been wondering is whether we should define a small number of supported working colorspaces, convert into one of them when loading an image, and then have DynamicImage / ImageBuffer track which colorspace they are in. Ideally we could do it in a way that most users wouldn't have to pay attention to them and things would just work. Similar to how today users can write applications that properly handle YUV and/or planar encoded images without even knowing what those are.

I like this idea. However, it requires some very obvious way to turn it off, or tracking the original image alongside the converted one. It generally works well when you convert a smaller gamut into a larger one, where the gamuts fully overlap. But in cases where the gamuts don't overlap, or when converting a larger gamut into a smaller one, it washes out colors. We could always convert to something as huge as ACES2065-1/AP0, but that color space is unusual for web images.

AwesomeTypes
enum AwesomeTypes {
    Rgba8,
    Rgb8,
    Gray8,
    Gray16,
    GrayAlpha8,
    GrayAlpha16,
    RgbaF32,
    RgbF32,
    Rgba16,
    Rgb16,
    YCbCr8,
    YCbCr10,
    YCbCr12,
    YCbCr16,
    YCgCo8,
    YCgCo10,
    YCgCo12,
    YCgCo16,
    YCgCoRo8,
    YCgCoRo10,
    YCgCoRo12,
    YCgCoRe8,
    YCgCoRe10,
    YCgCoRe12,
    ICtCp8,
    ICtCp10,
    ICtCp12,
    ICtCp16,
    Cmyk8,
    Cmyk10,
    Cmyk12,
    Cmyk16,
}

awxkee avatar Jun 08 '25 08:06 awxkee

This is because there's a tension between being able to accurately represent any useful pixel format & layout vs forcing every downstream user to match against all the weird formats and probably throw unimplemented!() on half of them.

This will require matching AwesomeTypes ( see below ) what makes life hard to handle all of these at the same time.

I tried to fit the idea into unduly brief wording and apparently wasn't very clear. In fact, I don't intend to expand DynamicImage as it is today, due to its simplicity of handling. That type already contains one color model, i.e. rgb-ish types with common sample depths. Rather than extend it and lose that property by then having several color models, I could make other enumerations for other color model variants. So one DynamicCmykImage, for instance, in the order of complexity in which we can make them implementable. Surely better enums would be amazing¹, but we can do multiple types just as well, as long as each enum individually can be converted from/to an image of type (1)/(2), which will then be the connecting color representation, so to speak, that never needs to be inspected but is there in the background, simplifying our implementation from N·M to N+M conversions.


¹GADTs with compiler inferred uninhabitedness:

enum AllImages<T: Model> {
    Rgb8(…) where <T as Model>::RGB,
    Rgb16(…) where <T as Model>::RGB,
    Cmyk(…) where <T as Model>::CMYK,
    YUV440(…) where <T as Model>::YUV,
    YUV420(…) where <T as Model>::YUV,
}

type DynamicImage = AllImages<RGBish>;

But enough dreaming :slightly_smiling_face:
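The N+M shape described above can be sketched without GADTs (every type here is a hypothetical stand-in, and the CMYK conversion is deliberately naive, ignoring profiles): each per-model enum converts to and from one connecting "hub" representation, so the models never need to know about each other.

```rust
// Hypothetical connecting representation, a stand-in for level (1)/(2).
struct Canvas {
    rgba: Vec<[f32; 4]>,
}

// Two hypothetical per-model enums, each kept small and uniform.
enum DynamicRgbImage {
    Rgb8(Vec<[u8; 3]>),
}

enum DynamicCmykImage {
    Cmyk8(Vec<[u8; 4]>),
}

impl From<DynamicRgbImage> for Canvas {
    fn from(img: DynamicRgbImage) -> Canvas {
        let DynamicRgbImage::Rgb8(px) = img;
        Canvas {
            rgba: px
                .iter()
                .map(|p| [p[0] as f32 / 255.0, p[1] as f32 / 255.0, p[2] as f32 / 255.0, 1.0])
                .collect(),
        }
    }
}

impl From<DynamicCmykImage> for Canvas {
    fn from(img: DynamicCmykImage) -> Canvas {
        let DynamicCmykImage::Cmyk8(px) = img;
        // Naive CMYK -> RGB just to show the shape; real code would
        // involve the color management of level (2).
        Canvas {
            rgba: px
                .iter()
                .map(|p| {
                    let k = p[3] as f32 / 255.0;
                    let ch = |c: u8| (1.0 - c as f32 / 255.0) * (1.0 - k);
                    [ch(p[0]), ch(p[1]), ch(p[2]), 1.0]
                })
                .collect(),
        }
    }
}

fn main() {
    // Both models reach the hub independently: N+M conversions total,
    // instead of one conversion per pair of models.
    let hub: Canvas = DynamicRgbImage::Rgb8(vec![[255, 0, 0]]).into();
    assert_eq!(hub.rgba[0], [1.0, 0.0, 0.0, 1.0]);
    let hub2: Canvas = DynamicCmykImage::Cmyk8(vec![[0, 255, 255, 0]]).into();
    assert_eq!(hub2.rgba[0], [1.0, 0.0, 0.0, 1.0]);
}
```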

197g avatar Jun 08 '25 10:06 197g

If the conversion happened later and/or was optional, then internal code would have to handle way more cases. (I'd really like to avoid something like resize raising an unimplemented!() if we can avoid it!)

Not really, we still have our code operate on DynamicImage. So long as we don't implement GenericImage for ImageBuffer/(1)/(2), we can delay the conversion just fine. Users will have to convert into DynamicImage to call the functions, which will be just as complex as they are right now. And then we can think about relaxing the bound function by function; or actually, since ImageBuffer is one type without generics, we could add methods instead of supporting the free functions where that is appropriate.

197g avatar Jun 08 '25 10:06 197g

Getting YUV directly can be useful for transcoding from JPEG to other formats, or just lowering quality of the JPEG. Going through RGB gives you extra rounding and clipping.

kornelski avatar Jun 08 '25 20:06 kornelski

I think we're now veering off this PR. The question that imo needs to be answered, given the concrete buffer purposes we agree on: is this buffer for purpose (3), and do we want to free up the naming prefix Image* for (1) and (2)? What constraints we have for (1) and (2) can be discussed in the other threads; I'm sure we'll find an implementation.

197g avatar Jun 09 '25 08:06 197g

I don't think we should add a new type named ImageBuffer that isn't for (4) or isn't largely interchangeable for current code that uses ImageBuffer. I'd rather start by adding new types for (1) and/or (2) with prefixes of Image* and then based on a view of the full API, decide whether the ImageBuffer -> PixelBuffer switch would be worth the churn.

fintelia avatar Jun 09 '25 14:06 fintelia