Fast Blur
PR
This is an attempt to solve #986, as the current solution (using
fastblur) is fragile and requires consumers to access the raw buffers
of the image, which is undesirable.
The implementation is inspired by
this article,
which is in turn based on Kovesi, P.: Fast Almost-Gaussian Filtering. The Australian Pattern
Recognition Society Conference: DICTA 2010. December 2010, Sydney.
The short story is that the real Gaussian blur takes about 30 seconds for real-world photos, which is undesirable. The algorithm here is quick and produces pretty good results.
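For reference, the core of Kovesi's method is to approximate a Gaussian of standard deviation `sigma` by `n` successive box blurs whose widths are derived from `sigma`. The width selection can be sketched as follows (illustrative only; `boxes_for_gauss` is a hypothetical helper, not the PR's actual code):

```rust
/// Choose `n` box-filter widths whose repeated application approximates
/// a Gaussian blur with standard deviation `sigma` (Kovesi, DICTA 2010).
fn boxes_for_gauss(sigma: f32, n: usize) -> Vec<usize> {
    // Ideal averaging-filter width for n passes.
    let w_ideal = (12.0 * sigma * sigma / n as f32 + 1.0).sqrt();
    let mut wl = w_ideal.floor() as usize;
    if wl % 2 == 0 {
        wl -= 1; // box widths must be odd so each box has a centre pixel
    }
    let wu = wl + 2;
    // Number of passes that should use the smaller width wl.
    let m_ideal = (12.0 * sigma * sigma - (n * wl * wl + 4 * n * wl + 3 * n) as f32)
        / (-4.0 * wl as f32 - 4.0);
    let m = m_ideal.round() as usize;
    (0..n).map(|i| if i < m { wl } else { wu }).collect()
}

fn main() {
    // Three box passes approximating sigma = 10 use widths 19, 19 and 21.
    assert_eq!(boxes_for_gauss(10.0, 3), vec![19, 19, 21]);
}
```

Because each box blur can run in time independent of the radius, the total cost stays low even for large `sigma`, which is where the 30-seconds-vs-milliseconds difference comes from.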
Questions
Is this PR ready?
No, but I want to have some feedback on the implementation so I can move on (in the right direction).
Are you aware of #2238?
Yes, but I don't know how to deal with it. `blur` is not yet deprecated, and
users of the library will search for a fast alternative in this crate,
not in `imageproc`. I am open to moving my code there now or later.
The advantage of having it here would be that I do not suffer from
the `non_exhaustive` attribute of `DynamicImage` and can implement `blur`
for this type (if desired).
Why didn't you simply copy-and-paste the code from the blurslice crate (or the article)?
The `blurslice` crate is distributed under the MIT license, and
re-licensing the code under MIT/Apache-2.0 would require the approval
of all contributors of the crate.
On the other hand, it doesn't integrate well with the principal
structures `DynamicImage` and `ImageBuffer`. That's why I
decided to re-implement the algorithm. I have to admit there are some
details in both implementations that I don't understand and for which
I couldn't find an explanation. Hence I just did my own implementation
that I think I understand.
What about the performance?
I didn't check the performance (and didn't optimize it).
But in contrast to the `blur` implementation, the blurring finishes
in a reasonable amount of time.
As soon as the interface is fixed (i.e. whether to accept `GenericImageView` or
`DynamicImage`) I can start on performance tweaks.
What about tests?
I can't really imagine how to represent functional tests. However, I plan to
- write benchmarks as soon as I dig into performance tuning
- write trivial tests (`sigma=1` is a no-op)
- write kani tests that make sure that the potentially panicking slice accesses don't panic
Is the interface fixed?
Currently, I accept an `ImageBuffer` simply because it lets me operate on slices rather
than using another interface to access pixels. However, it might make sense
to also accept a `DynamicImage` (can be easily accomplished) or a
`GenericImageView` (which would require a complete rewrite of the algorithm).
Will the "old" blur stay?
The fast blur algorithm is inaccurate but good enough for many applications. I would recommend that the "real" blur stays. It is not much code anyway, as it is implemented as a general convolution with a Gaussian kernel.
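For context, the "real" blur amounts to convolving with a sampled, normalised Gaussian kernel (applied separably per axis). A minimal sketch of the kernel construction (illustrative only, not the crate's actual code):

```rust
/// Build a normalised 1-D Gaussian kernel of length `2 * radius + 1`.
fn gaussian_kernel(sigma: f32, radius: usize) -> Vec<f32> {
    let r = radius as isize;
    let mut kernel: Vec<f32> = (-r..=r)
        .map(|x| (-((x * x) as f32) / (2.0 * sigma * sigma)).exp())
        .collect();
    let sum: f32 = kernel.iter().sum();
    for v in &mut kernel {
        *v /= sum; // normalise so the filter preserves overall brightness
    }
    kernel
}

fn main() {
    let k = gaussian_kernel(1.0, 2);
    assert_eq!(k.len(), 5);
    assert!(k[0] == k[4] && k[1] == k[3]); // symmetric around the centre
    assert!((k.iter().sum::<f32>() - 1.0).abs() < 1e-5); // normalised
}
```

The slow part is that the kernel (and thus the per-pixel cost) grows with the radius, which is exactly what the box-blur approximation avoids.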
License
I license past and future contributions under the dual MIT/Apache-2.0 license, allowing licensees to choose either at their option.
Todos
- [x] add reference to `fast_blur` in docs for `blur`
- [x] implement `fast_blur` for `DynamicImage`
- [x] work on clippy failures
- [x] fix `public_private_dependencies` job
- [x] write tests for index handling (`kani` or `quickcheck`)
- [x] write benchmarks and optimize performance
- [x] rebase interactively to create atomic commits
Wow great work!
> Are you aware of https://github.com/image-rs/image/issues/2238? Yes, but I don't know how to deal with it. `blur` is not yet deprecated and users of the library will search for a fast alternative in this crate, not in `imageproc`. I am open to move my code there now or later. The advantage of having it here would be that I do not suffer from the `non_exhaustive` of `DynamicImage` and can implement `blur` for this type (if desired).
Long-term I think this function should belong in `imageproc` since it is an image processing operation; short-term, since a lot of imageops are yet to be deprecated, it could be added temporarily in `image` and then copied across later, but that seems like extra work to me.
Maybe we could add a note in the docs of the "slow" blur method in this crate that points to this faster method in the imageproc crate?
Regarding `DynamicImage`, you can use the `dynamic_map` macro (not sure if it's currently exported from the crate, but it's easy enough to redefine yourself):
```rust
use image::{DynamicImage, ImageBuffer, Pixel};

macro_rules! dynamic_map(
    ($dynimage: expr, $image: pat => $action: expr) => ({
        use DynamicImage::*;
        match $dynimage {
            ImageLuma8($image) => ImageLuma8($action),
            ImageLumaA8($image) => ImageLumaA8($action),
            ImageRgb8($image) => ImageRgb8($action),
            ImageRgba8($image) => ImageRgba8($action),
            ImageLuma16($image) => ImageLuma16($action),
            ImageLumaA16($image) => ImageLumaA16($action),
            ImageRgb16($image) => ImageRgb16($action),
            ImageRgba16($image) => ImageRgba16($action),
            ImageRgb32F($image) => ImageRgb32F($action),
            ImageRgba32F($image) => ImageRgba32F($action),
            _ => unreachable!(),
        }
    });
    ($dynimage: expr, $image:pat_param, $action: expr) => (
        match $dynimage {
            DynamicImage::ImageLuma8($image) => $action,
            DynamicImage::ImageLumaA8($image) => $action,
            DynamicImage::ImageRgb8($image) => $action,
            DynamicImage::ImageRgba8($image) => $action,
            DynamicImage::ImageLuma16($image) => $action,
            DynamicImage::ImageLumaA16($image) => $action,
            DynamicImage::ImageRgb16($image) => $action,
            DynamicImage::ImageRgba16($image) => $action,
            DynamicImage::ImageRgb32F($image) => $action,
            DynamicImage::ImageRgba32F($image) => $action,
            _ => unreachable!(),
        }
    );
);

fn main() {
    let image1 = DynamicImage::new_luma8(3, 3);
    let image2 = DynamicImage::new_rgb8(3, 3);
    let image1 = dynamic_map!(image1, i => none(i));
    let image2 = dynamic_map!(image2, i => none(i));
}

fn none<P: Pixel, Container>(
    _image: ImageBuffer<P, Vec<Container>>,
) -> ImageBuffer<P, Vec<Container>> {
    todo!()
}
```
Thank you for your comments. I added tasks for your (and my) points so we can track progress more easily.
I have pushed a rather coarse fix for the private-dependency checks. However, I don't know how to deal with these checks properly; I found no easy way of systematically fixing this. The main problem is that I can't use the public dependency property in the regular Cargo.toml, as the feature is unstable and can't be used in stable Rust. Hence, to use this feature we would have to
- add the cargo feature in the GitHub action (as it is currently done) using a shell script
- specify the `rayon` crate as `public` using a shell script (very ugly)
To make this PR mergeable I hacked an "allow" for this lint into buffer_par, knowing that clippy won't complain about allowing an unknown lint. However, this solution is likely to fail in the future.
I did some initial benchmarking: on my Intel i9 machine the bench (1024x768 RGB) takes 680 ms with the true Gaussian blur, whereas `fast_blur` takes 57 ms. I checked in both benchmarks.
Next I will try to find some bottlenecks in the implementation.
Hi, some ways to improve are noted in my comments on zune-imageprocs (I am the author of that library):
https://github.com/etemesi254/zune-image/blob/734bdc7bfcd58860b0a92dde0a98055a1f94b422/crates/zune-imageprocs/src/box_blur.rs#L197-L220
I never did the loop unrolling to improve ILP, but the others were implemented. Really nice properties, like no bounds checks in the inner loop; may be worth investigating (see https://godbolt.org/z/zexPEd976).
See https://fgiesen.wordpress.com/2012/07/30/fast-blurs-1/ and https://fgiesen.wordpress.com/2012/08/01/fast-blurs-2/ for the theory behind it.
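The key idea in those posts is that a box blur can be computed with a sliding window and a running sum, so each output sample costs O(1) regardless of the radius. A minimal 1-D sketch with clamp-to-edge handling (assumptions: `f32` samples, `box_blur_1d` is a hypothetical helper, not this PR's code):

```rust
/// Box-blur a 1-D signal with radius `r`, clamping indices at the edges.
/// Each output sample costs O(1) thanks to the running window sum.
fn box_blur_1d(src: &[f32], r: usize) -> Vec<f32> {
    assert!(!src.is_empty());
    let n = src.len() as isize;
    let norm = 1.0 / (2 * r + 1) as f32;
    let at = |i: isize| src[i.clamp(0, n - 1) as usize];
    // Prime the window sum for output index 0.
    let mut sum: f32 = (-(r as isize)..=r as isize).map(|j| at(j)).sum();
    let mut out = Vec::with_capacity(src.len());
    for i in 0..n {
        out.push(sum * norm);
        // Slide the window: add the sample entering on the right,
        // subtract the sample leaving on the left.
        sum += at(i + r as isize + 1) - at(i - r as isize);
    }
    out
}

fn main() {
    // Radius 0 is the identity.
    assert_eq!(box_blur_1d(&[1.0, 2.0, 3.0], 0), vec![1.0, 2.0, 3.0]);
    // A single spike of 3.0 with radius 1 spreads to three samples of ~1.0.
    let blurred = box_blur_1d(&[0.0, 0.0, 3.0, 0.0, 0.0], 1);
    assert!((blurred[2] - 1.0).abs() < 1e-6);
}
```

Applying this pass per row, then per column (and repeating a few times with suitable widths), is what makes the blur almost Gaussian while staying radius-independent.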
I invested some time into performance optimization:
- simplify the leftmost value by `r * f(0)`
- get rid of bounds checks by asserting the length of the samples
- get rid of conditionals checking for lower and upper bounds
None of these measures produced measurable improvements. I am leaving the code as it is for now (unless someone comes up with another idea, of course).
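For illustration, the bounds-check measure in the list above works like this: a single length assertion before the loop lets the optimizer prove that every index in the loop body is in range and drop the per-iteration checks (hypothetical example, not the PR's actual code):

```rust
/// Sum consecutive pairs of samples. The single `assert!` up front lets
/// the compiler prove that `2 * i` and `2 * i + 1` are always in bounds,
/// so the hot loop can run without per-access bounds checks.
fn sum_pairs(samples: &[u32], n: usize) -> u32 {
    assert!(samples.len() >= 2 * n); // one check, hoisted out of the loop
    let mut acc = 0;
    for i in 0..n {
        // i < n implies 2 * i + 1 <= 2 * n - 1 < samples.len().
        acc += samples[2 * i] + samples[2 * i + 1];
    }
    acc
}

fn main() {
    assert_eq!(sum_pairs(&[1, 2, 3, 4], 2), 10);
}
```

Whether the optimizer actually elides the checks is best verified on a compiler explorer; as noted, in this PR the change made no measurable difference.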
Furthermore, I see no need for kani/quickcheck tests. First, the bounds are obviously correct. Second, the code contains no unsafe, so securing it with kani or quickcheck is unnecessary.
Regarding the other tests I added tests for edge cases of sigma. If you have more ideas regarding reasonable tests I am happy to implement them.
I feel the PR is close to "ready for merge", apart from squashing and reordering the commits and your remaining comments, of course.
@etemesi254: I have now implemented the performance improvements I am able to understand (that is, I used transposing to improve CPU cache usage). The algorithmic tweaks were already in place, though. In total I could speed up the blurring by 15 percent. I feel I am not able to understand the more complicated improvements, as I lack deeper assembly-language knowledge.
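The transposition trick mentioned above can be sketched as follows: blur rows, transpose, blur rows again, transpose back, so both passes walk memory sequentially instead of striding across rows. A simplified single-channel sketch (`transpose` is a hypothetical helper, not the PR's code):

```rust
/// Transpose a `w` x `h` single-channel image stored row-major, so that a
/// subsequent "vertical" blur pass can run over contiguous memory.
fn transpose(src: &[f32], w: usize, h: usize) -> Vec<f32> {
    assert_eq!(src.len(), w * h);
    let mut dst = vec![0.0; w * h];
    for y in 0..h {
        for x in 0..w {
            dst[x * h + y] = src[y * w + x];
        }
    }
    dst
}

fn main() {
    // A 3x2 image: rows [1 2 3] and [4 5 6] become columns after transposing.
    let t = transpose(&[1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 3, 2);
    assert_eq!(t, vec![1.0, 4.0, 2.0, 5.0, 3.0, 6.0]);
}
```

A blocked (tiled) transpose can improve cache behaviour further for large images, but even the naive version lets both blur passes read memory sequentially.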
Nice.
FYI, these were the benchmark results for Gaussian blur on big images before this commit (image-rs, libvips, zune-image):
https://etemesi254.github.io/assets/criterion/imageprocs_%20gaussian%20blur/report/index.html
My work here is done for now, and I would be happy about a code review.
@ripytide sorry to bother you (but you already commented on the issue): Do you know what would be the next steps to merge this PR (and get it released)? Thank you!
You'll need one of the maintainers' signoff, and I am not one of them.
Also, I don't think this function belongs in this library and should instead be in the imageproc library.
Thank you for your feedback. I created a PR in imageproc as well (https://github.com/image-rs/imageproc/pull/683).
I just wanted to give a short status report (I also cross post this comment to https://github.com/image-rs/imageproc/pull/683)
- I added a functional test showing (for an example) that the effect of `fast_blur` is close to the Gaussian blur implementation in this library
- I added a benchmark that shows that fast blur is much faster than the current blur implementation
- I think I addressed all code review comments
What would be the next steps?
@HeroicKatora sorry for pinging you directly but you already started a review. Is there anything I can do to support the review of this PR? Thank you a lot!
I've written the following example code for blur - feel free to include it in the PR under examples/:
```rust
use std::env;
use std::error::Error;
use std::path::Path;
use std::process;

fn main() -> Result<(), Box<dyn Error>> {
    // Collect command-line arguments
    let args: Vec<String> = env::args().collect();

    // Ensure that we have 2 CLI arguments
    if args.len() != 3 {
        eprintln!("Usage: {} <path> <radius>", args[0]);
        process::exit(1);
    }
    let radius: f32 = args[2].parse()?;

    // Load the input image
    let path_str = &args[1];
    let path = Path::new(path_str);
    let input = image::open(path)?;

    // Correct but slow Gaussian blur
    let mut out_path_gauss = path.to_owned();
    out_path_gauss.set_extension("gaussian.png");
    let correct_but_slow = input.blur(radius);
    correct_but_slow.save(out_path_gauss)?;

    // Approximate but fast blur
    let mut out_path_fast = path.to_owned();
    out_path_fast.set_extension("fast.png");
    let approximate_but_fast = input.fast_blur(radius);
    approximate_but_fast.save(out_path_fast)?;

    Ok(())
}
```
With this test image, both blur implementations show artifacts:
It's just a white rectangle in the middle, surrounded by fully transparent pixels.
Both the Gaussian blur in `image` and the fast blur implementation bleed color from fully transparent pixels into the white rectangle, resulting in the following image:
The fully transparent pixels have R set to 255 while all the other channels are set to 0, and that red color bleeds into the white rectangle.
The expected blur as produced by GIMP is like this - blurred white rectangle without any red color:
To avoid such artifacts, the pixel's contribution to the color change should be weighted by its alpha channel value. Fully transparent pixels should get multiplied by 0 and not contribute to the changes in non-alpha channels at all.
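The standard way to get this weighting is to blur in premultiplied-alpha space: multiply each colour channel by alpha before blurring, blur all four channels, then divide the colour channels by the blurred alpha. A minimal sketch (hypothetical helpers, not part of the PR):

```rust
/// Convert straight-alpha RGBA to premultiplied alpha.
fn premultiply(rgba: [f32; 4]) -> [f32; 4] {
    let a = rgba[3];
    [rgba[0] * a, rgba[1] * a, rgba[2] * a, a]
}

/// Convert premultiplied alpha back to straight alpha.
fn unpremultiply(rgba: [f32; 4]) -> [f32; 4] {
    let a = rgba[3];
    if a == 0.0 {
        [0.0, 0.0, 0.0, 0.0]
    } else {
        [rgba[0] / a, rgba[1] / a, rgba[2] / a, a]
    }
}

fn main() {
    // Averaging a fully transparent red pixel with an opaque white pixel
    // stands in for one blur step over the rectangle's edge.
    let p1 = premultiply([1.0, 0.0, 0.0, 0.0]); // transparent red -> [0,0,0,0]
    let p2 = premultiply([1.0, 1.0, 1.0, 1.0]); // opaque white
    let avg = [
        (p1[0] + p2[0]) / 2.0,
        (p1[1] + p2[1]) / 2.0,
        (p1[2] + p2[2]) / 2.0,
        (p1[3] + p2[3]) / 2.0,
    ];
    // Unpremultiplying recovers white at half opacity: no red bleeds in.
    assert_eq!(unpremultiply(avg), [1.0, 1.0, 1.0, 0.5]);
}
```

With this round trip, the transparent pixel's colour channels are zeroed out before the blur and therefore cannot contribute to the visible result, matching the GIMP output described above.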
But since both blur implementations are incorrect in this respect, I'll leave deciding how to handle it to the maintainers.