
2d/3d array based interfaces (request/discussion)

Open experiment9123 opened this issue 3 years ago • 13 comments

With 2 reference points (some simple jpg loading libs in C++, and the experience of python image libs and interfacing to AI libraries) and the recent addition of "const generics", is it possible to have a 3rd interface type somewhere between ImageBuffer and DynamicImage, handling some of the cases through the concept of a channel count? By default, 1, 2, 3, 4 channels = Luma, LumaAlpha, RGB, RGBA.

option 1 - 'const generics' - a 2D array of const N components

  • could the existing ImageBuffer type even be generalised to this , e.g.
    • struct ImageBufferN<ChannelType, const N:NumChannels, Container>{ width,height, container }
    • struct PixelN<T,const N:usize>([T;N])
    • type Rgba<T> = PixelN<T,4> ; type Rgb<T> = PixelN<T,3>; type Luma<T> = PixelN<T,1>; type LumaA<T>=PixelN<T,2>
    • later (defaulted) parameters could handle representing other layouts when rust has const enum params.
      • equivalent to existing struct Rgba, struct Rgb, etc
      • could these renames even happen as a non-breaking change?
  • or an alternate API & type be built on a general-purpose 2D array:
    • type RgbaImageArray<T> = Array2D<PixelN<T,4>>
    • type RgbImageArray<T> = Array2D<PixelN<T,3>>
    • type LumaImageArray<T> = Array2D<PixelN<T,1>>
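
A minimal sketch of what option 1 could look like with const generics. `PixelN`, the aliases and `channel_sum` are hypothetical names for illustration, not the image crate's actual types:

```rust
use std::ops::{Index, IndexMut};

/// Hypothetical pixel type: N interleaved channels of type T.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct PixelN<T, const N: usize>(pub [T; N]);

// The familiar names become aliases that differ only in channel count.
pub type Luma<T> = PixelN<T, 1>;
pub type LumaA<T> = PixelN<T, 2>;
pub type Rgb<T> = PixelN<T, 3>;
pub type Rgba<T> = PixelN<T, 4>;

impl<T, const N: usize> PixelN<T, N> {
    /// The channel count is available at compile time.
    pub const CHANNELS: usize = N;
}

impl<T, const N: usize> Index<usize> for PixelN<T, N> {
    type Output = T;
    fn index(&self, ch: usize) -> &T {
        &self.0[ch]
    }
}

impl<T, const N: usize> IndexMut<usize> for PixelN<T, N> {
    fn index_mut(&mut self, ch: usize) -> &mut T {
        &mut self.0[ch]
    }
}

/// Written once against N, this works for every channel count.
pub fn channel_sum<const N: usize>(p: &PixelN<u8, N>) -> u32 {
    p.0.iter().map(|&c| u32::from(c)).sum()
}
```

On the "non-breaking change" question, one caveat as far as I know: a type alias of a tuple struct can't be called as a constructor, so existing code written as `Rgba([r, g, b, a])` would have to become `PixelN([r, g, b, a])` or go through a helper function.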

option 2 - '3d array' - width x height x channels manipulated through a general purpose array interface.

  • could a 3d array be approximated in a 'type neutral' way through (Vec<T>,(width,height,channels)), i.e. 'buffer+shape_tuple'

  • or could the image lib bring in the most popular rust ecosystem 3d array type ?

  • the operation of channel interleaving/un-interleaving can be expressed as a swap of width & depth

  • 3D (or 4D with channels) arrays lead naturally into ways of handling volume textures, openGL texture arrays, approximating photoshop layers (and in turn possibly interfaces to AI libraries) ... e.g. array operations for concatenating/splitting Vec< Array3d > -> Array4d etc

  • together with an enum ChannelNames{Luma, LumaAlpha, RGB, BGR, RGBA, BGRA} one could convert between 'array formats' and the DynamicImage enum, and offer direct loaders for arrays (whichever way round it's wrapped behind the scenes):

        fn load_image_rgba8() -> Array2D<PixelN<u8,4>>
        fn load_image_array_rgba8() -> Array3D<u8>
        impl DynamicImage { fn to_array_u8(&self) -> (Array3d<u8>, ChannelNames) }
        impl From<(Array3d, ChannelNames)> for DynamicImage .. // panics if depth & ChannelNames mismatch, obviously

  • in both the 3d array and 2d array of [T;N] cases, the buffer could be accessed as: image[Row][Column][Channel] ; (interleaved channel format) in the 3d case it may be slightly unintuitive that 'z,y,x' indices correspond to row,col,channel but this ordering is consistent with the memory layout one expects. image[Row][Column] would yield a slice holding channels

  • array indexing syntax saves the need for dedicated "read pixel / set pixel" APIs

  • idioms like "iter().flatten()" etc handle "stepping through all the pixels" "stepping through all the components" etc.

Short of using an actual N-D array interface, functions could return (Vec<T>,[Width,Height,Channels]), which any n-d array lib should be able to construct from.

reference - I found this C++ lib quite easy to use; it just takes and returns a channel count. https://github.com/kornelski/jpeg-compressor/blob/master/jpgd.h . It uses messy C++ return values, but pretty much tries to return (Vec,(w,h,ch)), which is why it's so easy to use.
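
To make the 'buffer + shape' idea concrete, here is a rough sketch assuming interleaved, row-major storage with the channel as the innermost axis. `FlatImage`, `interleaved_index` and `to_planar` are made-up names; un-interleaving is written as an explicit loop that moves the channel axis from innermost to outermost:

```rust
/// Hypothetical "buffer + shape" image: samples plus [width, height, channels].
pub type FlatImage<T> = (Vec<T>, [usize; 3]);

/// Offset of sample (x, y, ch) in an interleaved, row-major buffer.
pub fn interleaved_index(shape: [usize; 3], x: usize, y: usize, ch: usize) -> usize {
    let [w, _h, c] = shape;
    (y * w + x) * c + ch
}

/// Convert interleaved samples (RGBRGB...) to planar (RR...GG...BB...).
pub fn to_planar<T: Copy>(image: &FlatImage<T>) -> Vec<T> {
    let (data, [w, h, c]) = image;
    assert_eq!(data.len(), w * h * c, "buffer length must match shape");
    let mut out = Vec::with_capacity(data.len());
    for ch in 0..*c {
        for px in 0..w * h {
            // Pixel `px` starts at px * c in the interleaved buffer.
            out.push(data[px * c + ch]);
        }
    }
    out
}
```

Any n-d array library should be able to take the `(Vec<T>, [w, h, c])` pair as-is, since it is just a contiguous buffer and its extents.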

experiment9123 avatar May 20 '21 08:05 experiment9123

Could you take a look at the FlatSamples type? I'd be curious whether that might come closer to your use case.

fintelia avatar May 20 '21 13:05 fintelia

Could you take a look at the FlatSamples type? I'd be curious whether that might come closer to your use case.

Thanks, I hadn't seen that. It is definitely in the same direction - I see 3d indexing access (channel, x, y). I see mention of support for planar or interleaved layout (I'm guessing it's a strided view of a 3d array)

my reasoning behind the broader suggestion here is that a lot of code behind this would overlap with other utilities you want for 2d/3d arrays. Perhaps what I really want to do is take a subset of this image crate and wrap it in an array interface. (I suppose the image loaders could just return (Vec<T>,[w,h]), then we .into() a 'real' 2d array type for all the iterators)

here's a playground sketching what I have in mind (WIP) .. i have some internal libs i'm moving in this direction (i might be able to share a crate but I'm certain there'd be a few impls like this already out there)

"extend the iterator idioms to 2d arrays, and store the pixels in a general purpose 2d array type" https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c4f85c3100c3bff4604e3cd941140df8

    let mut tile:Array2d<Rgba<u8>>=Array2d::default();
    tile.resize([2,2],Pixel([0,0,0,0]));
    tile[0][0]=Pixel([0,1,2,3]);
    tile[1][1]=Pixel([3,2,1,0]);
    tile[0][1][2]=99;  //"set pixel at row 0, column 1: component 2 = 99"
    println!("iter pixels\n");
    for c in tile.iter().flatten(){print!("{:?}",c);}
    println!("\niter rows\n");
    for c in tile.iter(){print!("{:?}",c);}
    println!("\niter components\n");
    // components
    for c in tile.iter().iter().flatten(){print!("{:?}",c);}
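
For readers who can't open the playground, here is a self-contained sketch along the same lines. It is an assumption-laden stand-in, not the playground code: this `Array2d` is a hypothetical row-major wrapper around a `Vec`, pixels are plain `[u8; 4]` arrays, and `rows` and `example_tile` are invented helper names.

```rust
use std::ops::{Index, IndexMut};

/// Hypothetical row-major 2D array (not the type from the playground link).
#[derive(Debug, Clone)]
pub struct Array2d<T> {
    width: usize,
    height: usize,
    data: Vec<T>,
}

impl<T: Clone> Array2d<T> {
    /// Create a `width` x `height` array filled with `fill`.
    pub fn new([width, height]: [usize; 2], fill: T) -> Self {
        Self { width, height, data: vec![fill; width * height] }
    }
}

impl<T> Array2d<T> {
    /// Iterate over rows; each row is a slice of pixels.
    pub fn rows(&self) -> impl Iterator<Item = &[T]> + '_ {
        // max(1) avoids the panic `chunks(0)` would cause for an empty array.
        self.data.chunks(self.width.max(1))
    }
}

/// `array[row]` yields the row as a slice, so `array[row][col]` is a pixel.
impl<T> Index<usize> for Array2d<T> {
    type Output = [T];
    fn index(&self, row: usize) -> &[T] {
        assert!(row < self.height, "row out of bounds");
        &self.data[row * self.width..(row + 1) * self.width]
    }
}

impl<T> IndexMut<usize> for Array2d<T> {
    fn index_mut(&mut self, row: usize) -> &mut [T] {
        assert!(row < self.height, "row out of bounds");
        &mut self.data[row * self.width..(row + 1) * self.width]
    }
}

/// Builds the 2x2 tile from the snippet and returns all components in order.
pub fn example_tile() -> Vec<u8> {
    let mut tile: Array2d<[u8; 4]> = Array2d::new([2, 2], [0; 4]);
    tile[0][0] = [0, 1, 2, 3];
    tile[1][1] = [3, 2, 1, 0];
    tile[0][1][2] = 99; // row 0, column 1, channel 2
    // rows -> pixels -> components
    tile.rows().flatten().flatten().copied().collect()
}
```

With plain arrays as pixels, stepping through components is two `flatten()` calls: one from rows to pixels, one from pixels to channels.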

experiment9123 avatar May 20 '21 13:05 experiment9123

If you want to work with raw vectors, the ImageBuffer::{from_raw, into_raw} methods might be what you are looking for. Image buffers are generic over containers, but by default are backed by a Vec.

We've looked into working with a generic N-dimensional array library before though I can't quite find the issue. If I recall correctly, we didn't find anything that was widely used and matched our use case better than just having a struct containing a Vec and the image dimensions.

fintelia avatar May 20 '21 15:05 fintelia

We've looked into working with a generic N-dimensional array library before though I can't quite find the issue. If I recall correctly, we didn't find anything that was widely used and matched our use case better than just having a struct containing a Vec and the image dimensions.

ok that might be the sticking point, "there IS no popular rust 2d array abstraction" , heh. so you pretty much build it, in the image lib. no popular one -> everyone rolls their own -> too many , no one becomes popular (and yes I have one myself). Const generics are still quite new, they open up some streamlining.

I've definitely been able to do what I want with image.. it's a great crate. I just wonder if the amount of docs and interfaces one needs to search through to get to a simple use case could be streamlined a little.

Not sure how this crate looks internally (I see it brings in a lot of dependencies itself), but this is the kind of overview I'd imagine. Perhaps this is exactly what you have, except the "red box" is custom. Each of the major boxes could be a single crate, i.e. the "image-formats" crate could just decompress into (vec,size). I imagine the red box as an entirely separate general purpose library, and the green one (image manipulation) decoupleable from formats, DynamicImage etc. (purely a 2d discrete 'function manipulation' lib, in effect..)

my ideal "lightweight" interface would be just a few functions:

    decomp_from_mem_rgba8(&[u8]) -> Array2d<[u8;4]>     /* most common: whatever's there, pad out */
    decomp_from_mem_rgba_f32(&) -> Array2d<[f32;4]>     /* catch-all for HDR */
    save_jpg(filename, (&[u8], size))
    save_png(filename, (&[u8], size))
    etc.

Just "from" and "to" &[u8],w,h,ch; bypass this big enum and any conversions etc. But I can see why you have the huge DynamicImage enum to account for all the possibilities between GPU formats. A more comprehensive engine will certainly need all that.
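
The "whatever's there, pad out" step of such a loader could look roughly like this. It is only a sketch: `pad_to_rgba8` is a hypothetical helper, and the choices of replicating luma into R, G and B and filling missing alpha with 255 are my assumptions, not something the image crate prescribes.

```rust
/// Expand interleaved 8-bit samples with 1..=4 channels into RGBA pixels.
/// Returns `None` for an unsupported channel count or a buffer whose length
/// is not a multiple of the channel count.
pub fn pad_to_rgba8(samples: &[u8], channels: usize) -> Option<Vec<[u8; 4]>> {
    if !(1..=4).contains(&channels) || samples.len() % channels != 0 {
        return None;
    }
    let pixels = samples
        .chunks_exact(channels)
        .map(|px| match *px {
            [l] => [l, l, l, 255],        // Luma
            [l, a] => [l, l, l, a],       // LumaAlpha
            [r, g, b] => [r, g, b, 255],  // RGB
            [r, g, b, a] => [r, g, b, a], // RGBA
            _ => unreachable!("chunk length is always 1..=4 here"),
        })
        .collect();
    Some(pixels)
}
```

A decoder that reports a channel count could feed its output straight through this, so the caller always gets RGBA8 regardless of what the file contained.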

IMG_3860

I missed a couple of arrows. I envisage the "image manipulation lib" (having operations like rescaling) depending on 'Array2d' but not the formats, big dynamic array enum etc. It just wants to work on flat arrays of decompressed pixels (the user knows the format). Then people will use the other functions to do whatever. But the red part is definitely independent of any image manipulation, and could be used for heightfields, voxels, game tilemaps, AI tensors, a basis for NxM matrices.. It seems like there is an opportunity for the community to stabilise something here.

perhaps the thing I imagine in the bottom left could be a separate crate, "image-lite" or whatever (but also included in 'image')

experiment9123 avatar May 20 '21 16:05 experiment9123

But I can see why you have the huge DynamicImage enum to account for all the possibilities between GPU formats. A more comprehensive engine will certainly need all that.

The irony is that it already is quite huge and still doesn't come close to accounting for the multitude of GPU formats. I'd go as far as stating that it simply is infeasible to build an ergonomic image buffer that is both flexible to account for all various texel types and is strictly typed with regards to them. It doesn't scale because you would need to end up with separate types for all elements of the cross product 'color space × channel layout × numeric channel types × planar layout' and each of those dimensions has some 5-20 options. If you store them in a Vec<TexelType> then you can not convert image efficiently because transmuting the vectors/reusing the memory is unsound (I might want to do a blog post about this).
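
To put rough numbers on the above: four independent dimensions with 5 to 20 options each give somewhere between 5^4 = 625 and 20^4 = 160,000 combinations. On the soundness point, one part of the problem (my reading, not necessarily the full argument) is that a `Vec`'s allocation can only be reused for another element type if size and alignment match, which already fails for `[u8; 4]` versus `u32` on common targets. A small sketch with hypothetical helper names:

```rust
use std::mem::{align_of, size_of};

/// True if the element layouts match, which is a necessary (not sufficient)
/// condition for reusing a `Vec<A>` allocation as a `Vec<B>`.
pub fn same_layout<A, B>() -> bool {
    size_of::<A>() == size_of::<B>() && align_of::<A>() == align_of::<B>()
}

/// The safe route: build a new vector instead of reinterpreting the old one.
/// Packs each RGBA8 pixel into a little-endian u32.
pub fn rgba8_to_packed(pixels: &[[u8; 4]]) -> Vec<u32> {
    pixels.iter().map(|&p| u32::from_le_bytes(p)).collect()
}
```

The safe version costs an extra allocation and a copy, which is exactly the inefficiency being described.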

Realizing this, there is a very experimental library (https://github.com/image-rs/canvas) which I unfortunately didn't have the time to fully realize yet. However, it already explains the design itself in some detail (more documentation than here). If you want to contribute to the design and implementation of the red bubble in your hand-drawn graph then I suppose this would be a decent shot to get in early :)

With regards to matrices and tensors, yes, a similar approach might work but explicitly staying in two-dimensional space does have advantages such as extents being a sized type, being able to mark extents as Copy etc. So I would tend to want to keep it that way for image-canvas at least.

HeroicKatora avatar May 20 '21 16:05 HeroicKatora

PS: You might want to have a look at https://draw.io, I can understand the flexibility of paper (and really prefer it for sketches as well) but it's a lot clearer to use a vector-graphics based tool.

HeroicKatora avatar May 20 '21 16:05 HeroicKatora

I'd go as far as stating that it simply is infeasible to build an ergonomic image buffer that is both flexible to account for all various texel types and is strictly typed with regards to them. It doesn't scale because you would need to end up with separate types for all elements of the cross product 'color space × channel layout × numeric channel types × planar layout' and each of those dimensions has some 5-20 options. If you store them in a Vec<TexelType> then you can not convert image efficiently because transmuting the vectors/reusing the memory is unsound (I might want to do a blog post about this).

this seems to confirm what I'm thinking - one really wants multiple tiers of complexity. "just load/save a jpg/png from a pixel array" ... vs.. "account for every GPU texture format and efficient transcoding" (and possibly libs to deal with photoshop layers, whatever)

(tangentially, elsewhere I've heard of someone who is making a C++ gpu texture transcoding lib and from twitter it appears they're interested in supporting rust. By coincidence I think it's the very same person who made that lightweight jpg loader I happened to use in C++)

experiment9123 avatar May 20 '21 16:05 experiment9123

regarding the "Array2D" idea, I'm reminded of the difficulties in sharing struct Vec{x,y,z} .. a similar kind of problem where people can diverge on details and style whereas there's a common part that everyone easily agrees on. Extension traits or wrapper traits get you quite far but sometimes the strict 'orphan rules' defeat sharing. (Again, in my own sourcebase I've tried to hedge my bets a bit with wrapper traits to swap 'struct{x,y,z}' and [f32;3] .. but the result is quite complex. "mint" is out there as a solution but sadly that precludes operator overloading.)

I'm guessing you might resist "a separate 2d array crate" for the same reasons I personally resist "other people's vec x/y/z". I would like to see what the core language team make of this scenario and if there are any ways in which this kind of friction can be alleviated (like fields in traits so it's easier to share different types that happen to have exactly the same layout, or #[] hints that can override the orphan rules)

experiment9123 avatar May 20 '21 16:05 experiment9123

PS: You might want to have a look at https://draw.io, I can understand the flexibility of paper (and really prefer it for sketches as well) but it's a lot clearer to use a vector-graphics based tool.

general idea - (i) try to simplify/reuse user facing complexity (fewer docs people have to scan for special purpose types -> instead they can spend more time learning how to apply general purpose rust iterator idioms) (ii) try to offer a reduced dependency option (whilst not holding back the detail needed for the comprehensive solution)

Untitled Diagram (1)

experiment9123 avatar May 20 '21 17:05 experiment9123

For point (ii), I think you may be underestimating the amount of code that would live in the "image file format crate" box. Over half of the lines of code in image are part of the codecs submodule, and practically all of our dependencies are either other Rust crates to handle specific image formats (or libraries used to implement the decoders that are in this repository).

fintelia avatar May 21 '21 03:05 fintelia

For point (ii), I think you may be underestimating the amount of code that would live in the "image file format crate" box. (did some minor cleanup to the diagram)

sure the boxes are not to scale - this is more like 'user perspective scaled'. even if the hypothesised "ImageLite" is 75% the size, the idea here is about streamlining the experience of a new rust user jumping into the ecosystem and grabbing an image to get something visible (even if eventually he will need all the functionality of DynamicImage .. user code goes through a maturation cycle)

experiment9123 avatar May 21 '21 12:05 experiment9123

Hm, I wonder if the thing to do would be to try designing the public API for an image-lite crate, while just implementing everything with calls into the current crate. The main purpose would be to see whether we can come up with a substantially simpler API that would be useful from a user perspective.

fintelia avatar May 21 '21 13:05 fintelia

it seems 'ImageBuffer' is not really far off "an array interface". Once again this reminds me of how it's hard to make several things that are "really the same internally" interop .. (I wish there was a way, in rust, of making two struct {x,y,z}'s seamlessly interchangeable)

some friction for me in ImageBuffer< Pixel, Vec<SubPixel> > vs SomeKindOfArray<Pixel> ... (although I did get what I wanted done with a little wrapper) e.g. an Array2D< [SubPixel;3] > gives you [y][x][ch] channel access, or complete pixel objects at [y][x]. I'm pretty sure you could make "ImageBuffer" behave like that with Index impls. A straight array of Pixel types also gives the possibility of representing 5551, 565, 4444 (admittedly that's probably in retro territory these days). With those in rust you can't really get to the components via indexing, but you could certainly have a pixel type that is convertible to/from Rgba or rgba (as a very general intermediate)
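
As an illustration of a packed pixel type that converts to and from a general RGBA intermediate, here is a sketch for the 565 case. `Rgb565` and its methods are hypothetical; the bit layout (red in the top 5 bits, then 6 bits green, then 5 bits blue) is the usual one but is stated here as an assumption.

```rust
/// Hypothetical packed pixel: 5 bits red, 6 bits green, 5 bits blue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Rgb565(pub u16);

impl Rgb565 {
    /// Pack an RGBA8 pixel, dropping alpha and the low bits of each channel.
    pub fn from_rgba8([r, g, b, _a]: [u8; 4]) -> Self {
        Rgb565((u16::from(r >> 3) << 11) | (u16::from(g >> 2) << 5) | u16::from(b >> 3))
    }

    /// Unpack to RGBA8 with opaque alpha.
    pub fn to_rgba8(self) -> [u8; 4] {
        let r = ((self.0 >> 11) & 0x1f) as u8;
        let g = ((self.0 >> 5) & 0x3f) as u8;
        let b = (self.0 & 0x1f) as u8;
        // Replicate the high bits into the low bits so full scale maps to 255.
        [(r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2), 255]
    }
}
```

An array of `Rgb565` would then store like any other pixel type, with component access going through the conversion rather than through indexing.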

experiment9123 avatar May 23 '21 17:05 experiment9123