Fast Blur
PR
This is an attempt to solve #986, as the current solution (using
fastblur) is fragile and requires consumers to access the raw buffers
of the image, which is undesirable.
The implementation is inspired by
this article,
which is in turn based on Kovesi, P.: Fast Almost-Gaussian Filtering. The Australian Pattern
Recognition Society Conference: DICTA 2010. December 2010, Sydney.
The short story is that the real Gaussian blur takes about 30 seconds for real-world photos, which is undesirable. The algorithm here is quick and produces pretty good results.
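For reference, the core of Kovesi's method is to approximate a Gaussian of standard deviation `sigma` by `n` successive box blurs whose widths are derived from `sigma`. The width selection can be sketched as follows (illustrative only; `boxes_for_gauss` is a hypothetical helper, not the PR's actual code):

```rust
/// Choose `n` box-filter widths whose repeated application approximates
/// a Gaussian blur with standard deviation `sigma` (Kovesi, DICTA 2010).
fn boxes_for_gauss(sigma: f32, n: usize) -> Vec<usize> {
    // Ideal averaging-filter width for n passes.
    let w_ideal = (12.0 * sigma * sigma / n as f32 + 1.0).sqrt();
    let mut wl = w_ideal.floor() as usize;
    if wl % 2 == 0 {
        wl -= 1; // box widths must be odd so each box has a centre pixel
    }
    let wu = wl + 2;
    // Number of passes that should use the smaller width wl.
    let m_ideal = (12.0 * sigma * sigma - (n * wl * wl + 4 * n * wl + 3 * n) as f32)
        / (-4.0 * wl as f32 - 4.0);
    let m = m_ideal.round() as usize;
    (0..n).map(|i| if i < m { wl } else { wu }).collect()
}

fn main() {
    // Three box passes approximating sigma = 10 use widths 19, 19 and 21.
    assert_eq!(boxes_for_gauss(10.0, 3), vec![19, 19, 21]);
}
```

Because each box blur can run in time independent of the radius, the total cost stays low even for large `sigma`, which is where the 30-seconds-vs-milliseconds difference comes from.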
Questions
Is this PR ready?
No, but I want to have some feedback on the implementation so I can move on (in the right direction).
Are you aware of #2238?
Yes, but I don't know how to deal with it. `blur` is not yet deprecated, and
users of the library will search for a fast alternative in this crate,
not in `imageproc`. I am open to moving my code there now or later.
The advantage of having it here would be that I do not suffer from
the `non_exhaustive` attribute of `DynamicImage` and can implement `blur`
for this type (if desired).
Why didn't you simply copy-and-paste the code from the blurslice crate (or the article)?
The `blurslice` crate is distributed under the MIT license, and
re-licensing the code under MIT/Apache-2.0 would require the approval
of all contributors of the crate.
On the other hand, it doesn't integrate well with the principal
structures `DynamicImage` and `ImageBuffer`. That's why I
decided to re-implement the algorithm. I have to admit there are some
details in both implementations that I don't understand and for which
I couldn't find an explanation. Hence I just did my own implementation
that I think I understand.
What about the performance?
I didn't check the performance (and didn't optimize it).
But in contrast to the `blur` implementation, the blurring finishes
in a reasonable amount of time.
As soon as the interface is fixed (i.e. whether to accept `GenericImageView` or
`DynamicImage`) I can start on performance tweaks.
What about tests?
I can't really imagine how to represent functional tests. However, I plan to
- write benchmarks as soon as I dig into performance tuning
- write trivial tests (`sigma=1` is a no-op)
- write kani tests that make sure that the potentially panicking slice accesses don't panic
Is the interface fixed?
Currently, I accept an `ImageBuffer` simply because it lets me operate on slices rather
than using another interface to access pixels. However, it might make sense
to also accept a `DynamicImage` (can be easily accomplished) or a
`GenericImageView` (which would require a complete rewrite of the algorithm).
Will the "old" blur stay?
The fast blur algorithm is inaccurate but good enough for many applications. I would recommend that the "real" blur stays. It is not much code anyway, as it is implemented as a general convolution with a Gaussian kernel.
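For context, the "real" blur amounts to convolving with a sampled, normalised Gaussian kernel (applied separably per axis). A minimal sketch of the kernel construction (illustrative only, not the crate's actual code):

```rust
/// Build a normalised 1-D Gaussian kernel of length `2 * radius + 1`.
fn gaussian_kernel(sigma: f32, radius: usize) -> Vec<f32> {
    let r = radius as isize;
    let mut kernel: Vec<f32> = (-r..=r)
        .map(|x| (-((x * x) as f32) / (2.0 * sigma * sigma)).exp())
        .collect();
    let sum: f32 = kernel.iter().sum();
    for v in &mut kernel {
        *v /= sum; // normalise so the filter preserves overall brightness
    }
    kernel
}

fn main() {
    let k = gaussian_kernel(1.0, 2);
    assert_eq!(k.len(), 5);
    assert!(k[0] == k[4] && k[1] == k[3]); // symmetric around the centre
    assert!((k.iter().sum::<f32>() - 1.0).abs() < 1e-5); // normalised
}
```

The slow part is that the kernel (and thus the per-pixel cost) grows with the radius, which is exactly what the box-blur approximation avoids.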
License
I license past and future contributions under the dual MIT/Apache-2.0 license, allowing licensees to choose either at their option.
Todos
- [x] add reference to `fast_blur` in docs for `blur`
- [x] implement `fast_blur` for `DynamicImage`
- [x] work on clippy failures
- [x] fix `public_private_dependencies` job
- [x] write tests for index handling (`kani` or `quickcheck`)
- [x] write benchmarks and optimize performance
- [x] rebase interactively to create atomic commits
Wow great work!
> Are you aware of https://github.com/image-rs/image/issues/2238? Yes, but I don't know how to deal with it. `blur` is not yet deprecated and users of the library will search for a fast alternative in this crate, not in `imageproc`. I am open to move my code there now or later. The advantage of having it here would be that I do not suffer from the `non_exhaustive` of `DynamicImage` and can implement `blur` for this type (if desired).
Long-term I think this function should belong in `imageproc` since it is an image processing operation; short-term, since a lot of imageops are yet to be deprecated, it could be added temporarily in `image` and then copied across later, but that seems like extra work to me.
Maybe we could add a note in the docs of the "slow" blur method in this crate that points to this faster method in the imageproc crate?
Regarding `DynamicImage`, you can use the `dynamic_map` macro (not sure if it's currently exported from the crate, but it's easy enough to redefine yourself):
```rust
use image::{DynamicImage, ImageBuffer, Pixel};

macro_rules! dynamic_map(
    ($dynimage: expr, $image: pat => $action: expr) => ({
        use DynamicImage::*;
        match $dynimage {
            ImageLuma8($image) => ImageLuma8($action),
            ImageLumaA8($image) => ImageLumaA8($action),
            ImageRgb8($image) => ImageRgb8($action),
            ImageRgba8($image) => ImageRgba8($action),
            ImageLuma16($image) => ImageLuma16($action),
            ImageLumaA16($image) => ImageLumaA16($action),
            ImageRgb16($image) => ImageRgb16($action),
            ImageRgba16($image) => ImageRgba16($action),
            ImageRgb32F($image) => ImageRgb32F($action),
            ImageRgba32F($image) => ImageRgba32F($action),
            _ => unreachable!(),
        }
    });
    ($dynimage: expr, $image:pat_param, $action: expr) => (
        match $dynimage {
            DynamicImage::ImageLuma8($image) => $action,
            DynamicImage::ImageLumaA8($image) => $action,
            DynamicImage::ImageRgb8($image) => $action,
            DynamicImage::ImageRgba8($image) => $action,
            DynamicImage::ImageLuma16($image) => $action,
            DynamicImage::ImageLumaA16($image) => $action,
            DynamicImage::ImageRgb16($image) => $action,
            DynamicImage::ImageRgba16($image) => $action,
            DynamicImage::ImageRgb32F($image) => $action,
            DynamicImage::ImageRgba32F($image) => $action,
            _ => unreachable!(),
        }
    );
);

fn main() {
    let image1 = DynamicImage::new_luma8(3, 3);
    let image2 = DynamicImage::new_rgb8(3, 3);
    let image1 = dynamic_map!(image1, i => none(i));
    let image2 = dynamic_map!(image2, i => none(i));
}

fn none<P: Pixel, Container>(
    _image: ImageBuffer<P, Vec<Container>>,
) -> ImageBuffer<P, Vec<Container>> {
    todo!()
}
```
Thank you for your comments. I added tasks for your (and my) points so we can track progress more easily.
I have pushed a rather coarse fix for the private-dependency checks. However, I don't know how to deal with these checks properly; I found no easy way of systematically fixing this. The main problem is that I can't use the public dependency property in the regular Cargo.toml, as the feature is unstable and can't be used in stable Rust. Hence, to use this feature we would have to
- add the cargo feature in the GitHub action (as it is currently done) using a shell script
- specify the `rayon` crate as `public` using a shell script (very ugly)
To make this PR mergeable I hacked an "allow" for this lint into buffer_par, knowing that clippy won't complain about allowing an unknown lint. However, this solution is likely to fail in the future.
I did some initial benchmarking: on my Intel i9 machine the bench (1024x768 RGB) takes 680 ms with the true Gaussian blur, whereas `fast_blur` takes 57 ms. I checked in both benchmarks.
Next I will try to find some bottlenecks in the implementation.
Hi, some ways to improve are noted in my comments on zune-imageprocs (I am the author of that library):
https://github.com/etemesi254/zune-image/blob/734bdc7bfcd58860b0a92dde0a98055a1f94b422/crates/zune-imageprocs/src/box_blur.rs#L197-L220
I never did the loop unrolling to improve ILP, but the others were implemented. Really nice properties, like no bounds checks in the inner loop; may be worth investigating (see https://godbolt.org/z/zexPEd976).
See https://fgiesen.wordpress.com/2012/07/30/fast-blurs-1/ and https://fgiesen.wordpress.com/2012/08/01/fast-blurs-2/ for the theory behind it.
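The key idea in those posts is that a box blur can be computed with a sliding window and a running sum, so each output sample costs O(1) regardless of the radius. A minimal 1-D sketch with clamp-to-edge handling (assumptions: `f32` samples, `box_blur_1d` is a hypothetical helper, not this PR's code):

```rust
/// Box-blur a 1-D signal with radius `r`, clamping indices at the edges.
/// Each output sample costs O(1) thanks to the running window sum.
fn box_blur_1d(src: &[f32], r: usize) -> Vec<f32> {
    assert!(!src.is_empty());
    let n = src.len() as isize;
    let norm = 1.0 / (2 * r + 1) as f32;
    let at = |i: isize| src[i.clamp(0, n - 1) as usize];
    // Prime the window sum for output index 0.
    let mut sum: f32 = (-(r as isize)..=r as isize).map(|j| at(j)).sum();
    let mut out = Vec::with_capacity(src.len());
    for i in 0..n {
        out.push(sum * norm);
        // Slide the window: add the sample entering on the right,
        // subtract the sample leaving on the left.
        sum += at(i + r as isize + 1) - at(i - r as isize);
    }
    out
}

fn main() {
    // Radius 0 is the identity.
    assert_eq!(box_blur_1d(&[1.0, 2.0, 3.0], 0), vec![1.0, 2.0, 3.0]);
    // A single spike of 3.0 with radius 1 spreads to three samples of ~1.0.
    let blurred = box_blur_1d(&[0.0, 0.0, 3.0, 0.0, 0.0], 1);
    assert!((blurred[2] - 1.0).abs() < 1e-6);
}
```

Applying this pass per row, then per column (and repeating a few times with suitable widths), is what makes the blur almost Gaussian while staying radius-independent.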
I invested some time into performance optimization:
- simplify the leftmost value by `r * f(0)`
- get rid of bounds checks by asserting the length of the samples
- get rid of conditionals checking for lower and upper bounds
None of these measures produced measurable improvements. I am leaving the code as it is for now (unless someone comes up with another idea, of course).
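For illustration, the bounds-check measure in the list above works like this: a single length assertion before the loop lets the optimizer prove that every index in the loop body is in range and drop the per-iteration checks (hypothetical example, not the PR's actual code):

```rust
/// Sum consecutive pairs of samples. The single `assert!` up front lets
/// the compiler prove that `2 * i` and `2 * i + 1` are always in bounds,
/// so the hot loop can run without per-access bounds checks.
fn sum_pairs(samples: &[u32], n: usize) -> u32 {
    assert!(samples.len() >= 2 * n); // one check, hoisted out of the loop
    let mut acc = 0;
    for i in 0..n {
        // i < n implies 2 * i + 1 <= 2 * n - 1 < samples.len().
        acc += samples[2 * i] + samples[2 * i + 1];
    }
    acc
}

fn main() {
    assert_eq!(sum_pairs(&[1, 2, 3, 4], 2), 10);
}
```

Whether the optimizer actually elides the checks is best verified on a compiler explorer; as noted, in this PR the change made no measurable difference.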
Furthermore, I see no need for kani/quickcheck tests. First, the bounds are obviously correct. Second, the code contains no unsafe, so securing it with kani or quickcheck is unnecessary.
Regarding the other tests I added tests for edge cases of sigma. If you have more ideas regarding reasonable tests I am happy to implement them.
I feel the PR is close to "ready for merge", apart from squashing and reordering the commits and your remaining comments, of course.
@etemesi254: I have now implemented the performance improvements I am able to understand (that is, I used transposing to improve CPU cache usage). The algorithmic tweaks were already in place, though. In total I could speed up the blurring by 15 percent. I feel I am not able to understand the more complicated improvements, as I lack deeper assembly-language knowledge.
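The transposition trick mentioned above can be sketched as follows: blur rows, transpose, blur rows again, transpose back, so both passes walk memory sequentially instead of striding across rows. A simplified single-channel sketch (`transpose` is a hypothetical helper, not the PR's code):

```rust
/// Transpose a `w` x `h` single-channel image stored row-major, so that a
/// subsequent "vertical" blur pass can run over contiguous memory.
fn transpose(src: &[f32], w: usize, h: usize) -> Vec<f32> {
    assert_eq!(src.len(), w * h);
    let mut dst = vec![0.0; w * h];
    for y in 0..h {
        for x in 0..w {
            dst[x * h + y] = src[y * w + x];
        }
    }
    dst
}

fn main() {
    // A 3x2 image: rows [1 2 3] and [4 5 6] become columns after transposing.
    let t = transpose(&[1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 3, 2);
    assert_eq!(t, vec![1.0, 4.0, 2.0, 5.0, 3.0, 6.0]);
}
```

A blocked (tiled) transpose can improve cache behaviour further for large images, but even the naive version lets both blur passes read memory sequentially.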
Nice.
FYI, these were the benchmark results for Gaussian blur on big images before this commit (image-rs, libvips, zune-image):
https://etemesi254.github.io/assets/criterion/imageprocs_%20gaussian%20blur/report/index.html
My work here is done for now, and I would be happy about a code review.
@ripytide sorry to bother you (but you already commented on the issue): Do you know what would be the next steps to merge this PR (and get it released)? Thank you!
You'll need one of the maintainers' signoff, and I am not one of them.
Also, I don't think this function belongs in this library and should instead be in the imageproc library.
Thank you for your feedback. I created a PR in imageproc as well (https://github.com/image-rs/imageproc/pull/683).
I just wanted to give a short status report (I also cross post this comment to https://github.com/image-rs/imageproc/pull/683)
- I added a functional test showing (for an example) that the effect of `fast_blur` is close to the Gaussian blur implementation in this library
- I added a benchmark that shows that fast blur is much faster than the current blur implementation
- I think I addressed all code review comments
What would be the next steps?
@HeroicKatora sorry for pinging you directly but you already started a review. Is there anything I can do to support the review of this PR? Thank you a lot!
I've written the following example code for blur - feel free to include it in the PR under examples/:
```rust
use std::env;
use std::error::Error;
use std::path::Path;
use std::process;

fn main() -> Result<(), Box<dyn Error>> {
    // Collect command-line arguments
    let args: Vec<String> = env::args().collect();

    // Ensure that we have 2 CLI arguments
    if args.len() != 3 {
        eprintln!("Usage: {} <path> <radius>", args[0]);
        process::exit(1);
    }
    let radius: f32 = args[2].parse()?;

    // Load the input image
    let path_str = &args[1];
    let path = Path::new(path_str);
    let input = image::open(path)?;

    // Correct but slow Gaussian blur
    let mut out_path_gauss = path.to_owned();
    out_path_gauss.set_extension("gaussian.png");
    let correct_but_slow = input.blur(radius);
    correct_but_slow.save(out_path_gauss)?;

    // Approximate but fast blur
    let mut out_path_fast = path.to_owned();
    out_path_fast.set_extension("fast.png");
    let approximate_but_fast = input.fast_blur(radius);
    approximate_but_fast.save(out_path_fast)?;

    Ok(())
}
```
With this test image, both blur implementations show artifacts:
It's just a white rectangle in the middle, surrounded by fully transparent pixels.
Both the Gaussian blur in `image` and the fast blur implementation bleed color from fully transparent pixels into the white rectangle, resulting in the following image:
The fully transparent pixels have R set to 255 while all the other channels are set to 0, and that red color bleeds into the white rectangle.
The expected blur as produced by GIMP is like this - blurred white rectangle without any red color:
To avoid such artifacts, the pixel's contribution to the color change should be weighted by its alpha channel value. Fully transparent pixels should get multiplied by 0 and not contribute to the changes in non-alpha channels at all.
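The standard way to get this weighting is to blur in premultiplied-alpha space: multiply each colour channel by alpha before blurring, blur all four channels, then divide the colour channels by the blurred alpha. A minimal sketch (hypothetical helpers, not part of the PR):

```rust
/// Convert straight-alpha RGBA to premultiplied alpha.
fn premultiply(rgba: [f32; 4]) -> [f32; 4] {
    let a = rgba[3];
    [rgba[0] * a, rgba[1] * a, rgba[2] * a, a]
}

/// Convert premultiplied alpha back to straight alpha.
fn unpremultiply(rgba: [f32; 4]) -> [f32; 4] {
    let a = rgba[3];
    if a == 0.0 {
        [0.0, 0.0, 0.0, 0.0]
    } else {
        [rgba[0] / a, rgba[1] / a, rgba[2] / a, a]
    }
}

fn main() {
    // Averaging a fully transparent red pixel with an opaque white pixel
    // stands in for one blur step over the rectangle's edge.
    let p1 = premultiply([1.0, 0.0, 0.0, 0.0]); // transparent red -> [0,0,0,0]
    let p2 = premultiply([1.0, 1.0, 1.0, 1.0]); // opaque white
    let avg = [
        (p1[0] + p2[0]) / 2.0,
        (p1[1] + p2[1]) / 2.0,
        (p1[2] + p2[2]) / 2.0,
        (p1[3] + p2[3]) / 2.0,
    ];
    // Unpremultiplying recovers white at half opacity: no red bleeds in.
    assert_eq!(unpremultiply(avg), [1.0, 1.0, 1.0, 0.5]);
}
```

With this round trip, the transparent pixel's colour channels are zeroed out before the blur and therefore cannot contribute to the visible result, matching the GIMP output described above.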
But since both blur implementations are incorrect in this respect, I'll leave deciding how to handle it to the maintainers.