feat(iba): IBA::perpixel_op

Open lgritz opened this issue 8 months ago • 2 comments

Inspired by a question by Vlad Erium, I have added a simpler way for C++ users of OIIO to construct IBA-like functions for simple unary and binary operations on ImageBufs, where each output pixel depends only on the corresponding pixel of the input(s).

The user only needs to supply the contents of the inner loop, i.e. the work for a single pixel, and that code only needs to handle float values. All format conversion, sizing and allocation of the destination buffer, looping over pixels, and multithreading are automatic.
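To illustrate the division of labor described above, here is a rough standalone sketch. It is not the OIIO implementation: `FloatImage` and `apply_per_pixel` are hypothetical names invented for this example, raw pointers stand in for spans, the callback returns nothing, and multithreading and format conversion are left out. The driver owns allocation and looping; the caller writes only the one-pixel body.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an image: interleaved float channels.
struct FloatImage {
    int width = 0, height = 0, nchannels = 0;
    std::vector<float> data;  // width * height * nchannels values

    FloatImage(int w, int h, int nc, float fill = 0.0f)
        : width(w), height(h), nchannels(nc),
          data(std::size_t(w) * std::size_t(h) * std::size_t(nc), fill) {}

    // Pointer to the first channel of pixel (x, y).
    float* pixel(int x, int y) {
        return data.data() + (std::size_t(y) * width + x) * nchannels;
    }
    const float* pixel(int x, int y) const {
        return data.data() + (std::size_t(y) * width + x) * nchannels;
    }
};

// Hypothetical driver: allocates the result to match the inputs, then calls
// op(r, a, b, nchannels) exactly once for every pixel.
template<typename Op>
FloatImage apply_per_pixel(const FloatImage& a, const FloatImage& b, Op op)
{
    assert(a.width == b.width && a.height == b.height
           && a.nchannels == b.nchannels);
    FloatImage r(a.width, a.height, a.nchannels);
    for (int y = 0; y < r.height; ++y)
        for (int x = 0; x < r.width; ++x)
            op(r.pixel(x, y), a.pixel(x, y), b.pixel(x, y), r.nchannels);
    return r;
}
```

With this sketch, the caller's code is just the per-pixel lambda, for example `apply_per_pixel(A, B, [](float* r, const float* a, const float* b, int nc) { for (int c = 0; c < nc; ++c) r[c] = a[c] + b[c]; })`.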

If the actual buffers in question are not float-based, conversions happen automatically, at about a 2x slowdown compared to everything being float all along. That seems reasonable given the extreme simplicity, especially for use cases where the buffers are fairly likely to be float anyway.
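As a rough picture of where the conversion cost comes from, the sketch below runs a float-only callback over 8-bit data by converting each pixel to float scratch storage, calling the callback, and converting the result back. Everything here is an assumption made for illustration: `apply_per_pixel_u8`, `u8_to_float`, and `float_to_u8` are hypothetical helpers, and the mapping of 0..255 to 0.0..1.0 with clamping and rounding is one plausible convention, not a description of the PR's internals.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed convention: uint8 0..255 maps linearly to float 0.0..1.0.
inline float u8_to_float(std::uint8_t v) { return float(v) / 255.0f; }

// Clamp to [0, 1], scale, and round to the nearest integer code value.
inline std::uint8_t float_to_u8(float v)
{
    float clamped = std::min(1.0f, std::max(0.0f, v));
    return std::uint8_t(std::lround(clamped * 255.0f));
}

// Hypothetical driver for interleaved uint8 pixels. The callback `op` only
// ever sees float values; the conversions on either side are the extra work
// that a pure-float image would not pay for.
template<typename Op>
std::vector<std::uint8_t>
apply_per_pixel_u8(const std::vector<std::uint8_t>& a,
                   const std::vector<std::uint8_t>& b, int nchannels, Op op)
{
    assert(nchannels > 0);
    const std::size_t nc = std::size_t(nchannels);
    assert(a.size() == b.size() && a.size() % nc == 0);

    std::vector<std::uint8_t> r(a.size());
    std::vector<float> fa(nc), fb(nc), fr(nc);  // per-pixel float scratch
    for (std::size_t p = 0; p < a.size(); p += nc) {
        for (std::size_t c = 0; c < nc; ++c) {
            fa[c] = u8_to_float(a[p + c]);
            fb[c] = u8_to_float(b[p + c]);
        }
        op(fr.data(), fa.data(), fb.data(), nchannels);
        for (std::size_t c = 0; c < nc; ++c)
            r[p + c] = float_to_u8(fr[c]);
    }
    return r;
}
```

The callback is identical to what would be written for float images; only the driver changes.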

What you pass is a function or lambda that takes spans for the output and input pixel values. Here's an example that adds two images channel by channel, producing a sum image:

// Assume ImageBuf A, B are the inputs, ImageBuf R is the output
R = ImageBufAlgo::perpixel_op(A, B,
        [](span<float> r, cspan<float> a, cspan<float> b) {
            for (size_t c = 0, nc = size_t(r.size()); c < nc; ++c)
                r[c] = a[c] + b[c];
            return true;
        });

This is exactly equivalent to calling

R = ImageBufAlgo::add(A, B);

and for float ImageBufs, it's just as fast.

To make the non-float case fast without requiring the DISPATCH macro magic, I needed to change ImageBuf::Iterator slightly: the iterators gain store() and load() method templates, plus a field that holds the buffer type. That might amount to a slight ABI change, so I am thinking of landing this for the upcoming OIIO 3.0 and not backporting it to the release branch.
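The general idea of an accessor that carries its buffer type at runtime, so that calling code does not have to be templated on the pixel type, can be sketched as follows. This is only an illustration of the technique: `PixelType` and `ChannelCursor` are hypothetical, the real iterator's load() and store() signatures may differ, and the integer normalization shown is an assumed convention.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical tag for the storage type of a buffer.
enum class PixelType { UInt8, UInt16, Float };

// Hypothetical cursor pointing at one channel value of a known storage type.
struct ChannelCursor {
    void* ptr;       // address of the stored value
    PixelType type;  // runtime record of how that value is stored

    // Read the stored value, normalized to float, then cast to T.
    template<typename T> T load() const
    {
        float f = 0.0f;
        switch (type) {
        case PixelType::UInt8:
            f = float(*static_cast<const std::uint8_t*>(ptr)) / 255.0f;
            break;
        case PixelType::UInt16:
            f = float(*static_cast<const std::uint16_t*>(ptr)) / 65535.0f;
            break;
        case PixelType::Float:
            f = *static_cast<const float*>(ptr);
            break;
        }
        return static_cast<T>(f);
    }

    // Write a float value, converting to the storage type. Integer types
    // are clamped to [0, 1] and rounded; float is stored unchanged.
    void store(float v) const
    {
        float clamped = std::min(1.0f, std::max(0.0f, v));
        switch (type) {
        case PixelType::UInt8:
            *static_cast<std::uint8_t*>(ptr)
                = std::uint8_t(std::lround(clamped * 255.0f));
            break;
        case PixelType::UInt16:
            *static_cast<std::uint16_t*>(ptr)
                = std::uint16_t(std::lround(clamped * 65535.0f));
            break;
        case PixelType::Float:
            *static_cast<float*>(ptr) = v;
            break;
        }
    }
};
```

The trade-off in this style is one runtime branch per access in exchange for compiling the calling loop once, instead of once per combination of pixel types.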

I think this is ready to introduce at this time, but I'm also studying whether more varieties of this approach are needed, whether the non-float case can be sped up even more, and whether some of the existing IBA functions should switch to using this internally. Good candidates would be functions that are almost always performed on float buffers, but for which the heavy template expansion of the DISPATCH approach to handling the full type zoo currently makes them very bloated and expensive to compile, for very little real-world gain.

We should probably consider this to be experimental for a little while, just in case the function signature for this changes as I think about it more or add functionality.

lgritz, Jun 17 '24 20:06