image icon indicating copy to clipboard operation
image copied to clipboard

Faster `copy_from` implementation

Open Jasper-Bekkers opened this issue 10 months ago • 3 comments

After https://github.com/image-rs/image/pull/909 w/ the move to remove unsafe code, I think (didn't test extensively) the implementation of copy_from got significantly slower. At the very least, it became a recent bottleneck for me that I spent some time looking at.

The implementation of copy_from is a pretty straight forward get_pixel/put_pixel loop over the image data where the function calls aren't inlined and are performing bounds checks on every access, even tho the preconditions of the function already do the bounds check beforehand.

In my specific case I managed to ~3x increase the performance, without re-introducing unsafe code, by copying over scanlines at a time. The code below lives in my project because I can live without it having to be a generalized implementation (as you see - it would need to be adapted to GenericImage/GenericImageView to be able to live in this project properly).

I'm submitting it here (rather then as a pull request) because I found a way to entirely not go down this code path at all anymore, however, I'm sharing it because I think it's useful to this project still and can potentially be adapted.

trait FastCopyFrom {
    type Pixel: image::Pixel;

    fn fast_copy_from(&mut self, other: &image::RgbaImage, x: u32, y: u32);
}

impl FastCopyFrom for image::RgbaImage {
    type Pixel = <image::RgbaImage as GenericImageView>::Pixel;
    fn fast_copy_from(&mut self, other: &image::RgbaImage, x: u32, y: u32) {
        // Do bounds checking here so we can use the non-bounds-checking
        // functions to copy pixels.
        if self.width() < other.width() + x || self.height() < other.height() + y {
            return;
        }

        let other_data = other.as_raw();
        let self_w = self.width();

        let mut self_data = self.as_flat_samples_mut();
        let mut self_slice = self_data.as_mut_slice();
        let other_w = other.width();

        for k in 0..other.height() {
            let other_start = ((k * other_w) * 4) as usize;
            let other_end = (((k + 1) * other_w) * 4) as usize;

            let self_start = (((x) + ((k + y) * self_w)) * 4) as usize;
            let self_end = self_start + (other_w * 4) as usize;

            self_slice[self_start..self_end].copy_from_slice(
                &other_data[other_start..other_end]
            );
        }
    }
}

Timings before

Image

Timings after

Image

Jasper-Bekkers avatar Feb 05 '25 12:02 Jasper-Bekkers

Yeah, this is partially a problem with the GenericImage trait. Because it only exposes get_pixel / put_pixel rather than any way to directly access entire slices of pixels, most operations using it end up being slower than they should be.

fintelia avatar Mar 10 '25 05:03 fintelia

Thanks for the detailed performance graphs. Closing this as a special case of #1601 (and yes, that's how long a fundamental design problem with traits sadly persists).

197g avatar Aug 31 '25 01:08 197g

Ah sorry, wait, this one has a draft and the other one does not. I'll switch it around.

197g avatar Aug 31 '25 01:08 197g