marvinj icon indicating copy to clipboard operation
marvinj copied to clipboard

Gaussian Blur implementation is very slow

Open EhsanKia opened this issue 5 years ago • 2 comments

On a simple 1080x800px image, with kernel of size 10, it takes well over 15s. Not sure if there's more efficient ways to implement it, is it the same method used as the Java version?

I wonder if it's possible to use the browser's CSS filter for guassian blur, then transfer that back to a canvas.

EhsanKia avatar Apr 29 '19 01:04 EhsanKia

I looked through the code that does Gaussian blur, and the efficiency is at least O(n)^4 or larger. There is two for loops to iterate through the image, and there are two for loops to iterate through the kernel. It is possible to completely unroll the for loops, as in make them disappear, for iterating through the image kernel. That would make the efficiency at O(n)^2.

Example in Python:

gausKernel = [[1/16, 2/16, 1/16], [2/16, 4/16, 2/16], [1/16, 2/16, 1/16]]
accum = 0
data2 = copy.deepcopy(data)
for i in range(1, data.shape[0] - 1):
    for j in range(1, data.shape[1] - 1):
        accum += gausKernel[0][0] * data2[i - 1][j - 1][0]
        accum += gausKernel[0][1] * data2[i - 1][j][0]
        accum += gausKernel[0][2] * data2[i - 1][j + 1][0]
        accum += gausKernel[1][0] * data2[i][j - 1][0]
        accum += gausKernel[1][1] * data2[i][j][0]
        accum += gausKernel[1][2] * data2[i][j + 1][0]
        accum += gausKernel[2][0] * data2[i + 1][j - 1][0]
        accum += gausKernel[2][1] * data2[i + 1][j][0]
        accum += gausKernel[2][2] * data2[i + 1][j + 1][0]
        accum = round(accum)
        data2[i][j][0] = accum
        data2[i][j][1] = accum
        data2[i][j][2] = accum
        accum = 0
        
        
image3 = Image.fromarray(data2)
display(image3)

For variable kernel sizes, it would obviously make the code larger to maintain since you would need multiple functions.

M-Valentino avatar Sep 24 '23 20:09 M-Valentino

You're not magically reducing the time complexity by unrolling the inner loops. It wasn't really O(n^4) before, it was O(w*h*k^2) and it still is after your change. But yeah this is a very slow operation you'd probably want to do on the gpu in parallel.

EhsanKia avatar Sep 25 '23 00:09 EhsanKia