ImageFiltering.jl possible performance improvement to `mapwindow`

trafficstars

Didn't explore it in depth, but the following hand-written version in 5mins is faster than what mapwindow provides, so I believe there are still room for performance tweak:

# Julia Version 1.6.0-DEV.497
# Commit bd318e6662 (2020-07-20 22:11 UTC)

julia> img = float32.(testimage("camera"))

julia> function my_mapwindow(f, img, window)
           out = zeros(eltype(img), axes(img))
           R = CartesianIndices(img)
           I_first, I_last = first(R), last(R)
           Δ = CartesianIndex(ntuple(x->window ÷ 2, ndims(img)))
           @inbounds @simd for I in R
               patch = max(I_first, I-Δ):min(I_last, I+Δ)
               out[I] = f(view(img, patch))
           end
           return out
       end
my_mapwindow (generic function with 1 methods)

# LoopVectorization v0.8.6
julia> function my_mapwindow_avx(f, img, window=3)
           out = zeros(eltype(img), axes(img))
           R = CartesianIndices(img)
           I_first, I_last = first(R), last(R)
           Δ = CartesianIndex(ntuple(x->window ÷ 2, ndims(img)))
           @avx for I in R
               patch = max(I_first, I-Δ):min(I_last, I+Δ)
               out[I] = f(view(img, patch))
           end
           return out
       end

julia> mse(my_mapwindow(mean, img, 3), mapwindow(mean, img, (3, 3)))
1.2622413f-8

julia> @btime my_mapwindow(mean, $img, 3);
  7.183 ms (2 allocations: 1.00 MiB)

julia> @btime my_mapwindow_avx(mean, $img, 3);
  6.423 ms (2 allocations: 1.00 MiB)

julia> @btime mapwindow(mean, $img, (3, 3));
  8.069 ms (57271 allocations: 3.78 MiB)

Jul 23 '20 01:07 johnnychen94

Yes, now that wrappers don't allocate we should revisit the strategies here.

Jul 23 '20 21:07 timholy

For 2d windows at least, mapwindow can be an order of magnitude faster by instead of a view into the main array, moving a block of StaticArrays along the rows and updating it for each column using a @generated function that reads only one column at a time. The block can be of 2 or 4 etc StaticArray windows so we read an even number each time. You can read less than one cache line per window this way, and use one thread per block of rows.

I wrote this for DynamicGrids.jl, but it would be good to abstract this out into some kind of generalised stencil package some day so these algs can be more widely used: https://github.com/cesaraustralia/DynamicGrids.jl/blob/master/src/maprules.jl#L348-L370

Mar 03 '21 05:03 rafaqz

I noticed performance seemed to degrade a lot(let's say it didn't remain realtime) as I increased the filter size to something like (13,13) ( on a image of around (500,500) but then was able to find let's say a more suitable way for problem by first using mapwindow with filter (3,3) that worked. It's an excellent tool though, to remove noise and small blobs. would be great to have even slightly better version of it. ps: was using for realtime hand gesture recognition, worked great with our julian tools..though still looking for optimizations : )

Code for the project

using GLMakie, VideoIO
using Images, ImageDraw, BoundingSphere
using Printf
using LinearAlgebra
using Cairo, Luxor, LazySets, StaticArrays

function drawdots!(img, res, color )
    for i in res
        img[i[1]-5:i[1]+5, i[2]-5:i[2]+5] .= color
    end
end

function dist2p(p1, p2)
    sqrt((p1[1]-p2[1])^2 + (p1[2]-p2[2])^2)
end

function findmyangle(a1, a2; center)
    acos((dist2p(a1,center)^2 + dist2p(a2,center)^2 - dist2p(a1,a2)^2) / (2 * dist2p(center, a1) * dist2p(center, a2)))
end


"""
    findconvexitydefects(contour, convhull; dist = 1.1, absdiff = 10, mindist = 0, currsize = 50, d1 = 2, d2 = 2, anglemax = π/2)   
return the convexity defects using contour points and convexhull

###### Arguments
- contour -> contour points using suzuki and abe algorithm
- convexhull -> convexhull boundaries found from ImageMorphology.jl
- dist and absdiff -> helps in edge cases when trying to synchronize contour points and convexhull
- mindist -> to control min distance of a defect from a convex hull region line
- currsize -> to avoid small regions of contours
- d1 and d2 -> to control of defect from pair of convexhull points forming line individually
- anglemax -> helps to control max angle between convex hull line points and the defect
Idea           |  For one region
:-------------------------:|:-------------------------:
![](https://i1.wp.com/theailearner.com/wp-content/uploads/2020/11/hand_hull1.jpg?resize=624%2C438&ssl=1) |  ![](https://i0.wp.com/theailearner.com/wp-content/uploads/2020/11/conv_def1.jpg?w=489&ssl=1)


"""
function findconvexitydefects(
    contour, 
    convhull;
    dist=1.1,
    absdiff=10, 
    mindist=0,
    currsize= 50,
    d1 = 2,
    d2 = 2,
    anglemax = π/2
    )
    # first we need to match our contours and our convehull regions
    numindices = []
    previous = 0
    for i in convhull
        for (num,j) in enumerate(contour) 
            if norm(Tuple(i) .- Tuple(j)) < dist && abs(previous-num) > absdiff # to avoid small and very close regions
                    push!(numindices, num)
                    previous = num
                break
            end
        end
    end
    # we want the numindices same as our convhull points, 
    # to define regions of interest for each convhull line
    
    defects = Vector{CartesianIndex{2}}([]) # indexes with defects

    # incase numndices < convhull indexes, 
    # meaning we don't hv regions for all lines
    if size(numindices)[1] < size(convhull)[1] 
        throw(error("Raise the range dist, numindices points less than convexhull points, $(size(numindices)[1]) $(size(convhull)[1]) "))
    end

    # iterate over each consecutive pair of convhull points to form line
    for i in 1:size(convhull)[1]-1
        # to handle the case where numindices are like 1256, then 1
        if numindices[i] > numindices[i+1]
            curr = vcat(contour[numindices[i]: end], contour[1: numindices[i+1]])
        else
            # general case to define contours poins for each convhull region
            curr = contour[numindices[i]:numindices[i+1]]
        end

        # to remove minor regions of contours, we can set currsize
        if size(curr)[1] < currsize
            continue
        end

        # Defining the line     
        p1 = Float64.(Tuple(convhull[i]))  # point 1
        p2 = Float64.(Tuple(convhull[i+1])) # point 2
        line = Line(;from=[p1[1], p1[2]], to=[p2[1], p2[2]])

        maxdef = 0 # max distance from our convhull line
        defloc = CartesianIndex(0,0) # location of the that point

        # check for each contour point in a convhull region
        # their distance and find max distance point
        for j in curr
            p = SA[j[1], j[2]]
            lpdist = LazySets.distance(p, line) # find distance
            # update if we find new max
            if lpdist > maxdef
                maxdef = lpdist
                defloc = j
            end
        end
        def1 = norm(Tuple(defloc) .- Tuple(p1))
        def2 = norm(Tuple(defloc) .- Tuple(p2))
        # we can define what minimum distance from line should be
        angle = findmyangle(p1,p2; center=defloc)
        if maxdef > mindist && def1 > d1 && def2 > d2 && angle < anglemax
            push!(defects, defloc)
        end
    end
    defects
end



"""
function objecttracker(img)

Return image which tracks green colored objects 
from the input image

## Details
Steps

- Converts image to HSV space
- Set HSV value range to create binary mask
  which can identify green objects
- Find contours from that binary mask
- Pick the largest contour area and find its 
  bounding box
- Draw the bounding box on the input image
- return the resultant image

"""
function objecttracker(
    img,
    h = 20,
    s = 255,
    v = 255,
    lh = 0,
    ls = 20,
    lv = 70,
    boundingboxvar = 10
    )
    hsv_img = HSV.(img)
    channels = channelview(float.(hsv_img))
    hue_img = channels[1, :, :]
    val_img = channels[3, :, :]
    satur_img = channels[2, :, :]
    mask = zeros(size(hue_img))
    h, s, v = h, s, v
    h1, s1, v1 = lh, ls, lv
    ex = boundingboxvar
    for ind in eachindex(hue_img)
        if hue_img[ind] <= h && satur_img[ind] <= s / 255 && val_img[ind] <= v / 255
            if hue_img[ind] >= h1 && satur_img[ind] >= s1 / 255 && val_img[ind] >= v1 / 255
                mask[ind] = 1
            end
        end
    end
    mask = mapwindow(ImageFiltering.median, mask, (3, 3))
    img = dilate(dilate(dilate(mask)))
    contours = find_contours(img)

    try
        convhull = convexhull(img .> 0.5)
        push!(convhull, convhull[1])
        res = findconvexitydefects(contours[1], convhull; dist=3, absdiff = 2, currsize= 30, mindist =20)
        img_convex1 = RGB{N0f8}.(ones(size(img)))
        drawdots!(img_convex1, res, RGB(0,0,1))
        draw!(img_convex1, ImageDraw.Path(convhull), RGB(0))
        draw_contours(img_convex1, RGB(0), contours) 
        return img_convex1, size(res)[1]
    catch e
        img_convex1 = RGB{N0f8}.(ones(size(img)))
        draw_contours(img_convex1, RGB(0), contours) 
        return img_convex1 , -1

        # return Gray.(img_convex1), e
        # return Gray.(img) , 0
    end
end;

# rotate direction clocwise
function clockwise(dir)
    return (dir) % 8 + 1
end

# rotate direction counterclocwise
function counterclockwise(dir)
    return (dir + 6) % 8 + 1
end

# move from current pixel to next in given direction
function move(pixel, image, dir, dir_delta)
    newp = pixel + dir_delta[dir]
    height, width = size(image)
    if (0 < newp[1] <= height) && (0 < newp[2] <= width)
        if image[newp] != 0
            return newp
        end
    end
    return CartesianIndex(0, 0)
end

# finds direction between two given pixels
function from_to(from, to, dir_delta)
    delta = to - from
    return findall(x -> x == delta, dir_delta)[1]
end

function detect_move(image, p0, p2, nbd, border, done, dir_delta)
    dir = from_to(p0, p2, dir_delta)
    moved = clockwise(dir)
    p1 = CartesianIndex(0, 0)
    while moved != dir ## 3.1
        newp = move(p0, image, moved, dir_delta)
        if newp[1] != 0
            p1 = newp
            break
        end
        moved = clockwise(moved)
    end

    if p1 == CartesianIndex(0, 0)
        return
    end

    p2 = p1 ## 3.2
    p3 = p0 ## 3.2
    done .= false
    while true
        dir = from_to(p3, p2, dir_delta)
        moved = counterclockwise(dir)
        p4 = CartesianIndex(0, 0)
        done .= false
        while true ## 3.3
            p4 = move(p3, image, moved, dir_delta)
            if p4[1] != 0
                break
            end
            done[moved] = true
            moved = counterclockwise(moved)
        end
        push!(border, p3) ## 3.4
        if p3[1] == size(image, 1) || done[3]
            image[p3] = -nbd
        elseif image[p3] == 1
            image[p3] = nbd
        end

        if (p4 == p0 && p3 == p1) ## 3.5
            break
        end
        p2 = p3
        p3 = p4
    end
end


function find_contours(image)
    nbd = 1
    lnbd = 1
    image = Float64.(image)
    contour_list = Vector{typeof(CartesianIndex[])}()
    done = [false, false, false, false, false, false, false, false]

    # Clockwise Moore neighborhood.
    dir_delta = [
        CartesianIndex(-1, 0),
        CartesianIndex(-1, 1),
        CartesianIndex(0, 1),
        CartesianIndex(1, 1),
        CartesianIndex(1, 0),
        CartesianIndex(1, -1),
        CartesianIndex(0, -1),
        CartesianIndex(-1, -1),
    ]

    height, width = size(image)

    for i = 1:height
        lnbd = 1
        for j = 1:width
            fji = image[i, j]
            is_outer = (image[i, j] == 1 && (j == 1 || image[i, j-1] == 0)) ## 1 (a)
            is_hole = (image[i, j] >= 1 && (j == width || image[i, j+1] == 0))

            if is_outer || is_hole
                # 2
                border = CartesianIndex[]

                from = CartesianIndex(i, j)

                if is_outer
                    nbd += 1
                    from -= CartesianIndex(0, 1)

                else
                    nbd += 1
                    if fji > 1
                        lnbd = fji
                    end
                    from += CartesianIndex(0, 1)
                end

                p0 = CartesianIndex(i, j)
                detect_move(image, p0, from, nbd, border, done, dir_delta) ## 3
                if isempty(border) ##TODO
                    push!(border, p0)
                    image[p0] = -nbd
                end
                push!(contour_list, border)
            end
            if fji != 0 && fji != 1
                lnbd = abs(fji)
            end

        end
    end
    return contour_list
end

# a contour is a vector of 2 int arrays
function draw_contour(image, color, contour)
    for ind in contour
        image[ind] = color
    end
end
function draw_contours(image, color, contours)
    for cnt in contours
        draw_contour(image, color, cnt)
    end
end

println("Step 1: open camera")
f = VideoIO.opencamera()

println("Step 2: Set original image shower")

img = read(f)
fig = Figure(size =(200,200), title = "Hand Gesture Recognition")
ax2 = GLMakie.Axis(
    fig[1, 1],
    aspect = DataAspect(),
    xlabel = "x label",
    ylabel = "y label",
    title = "Resultant Image",
    backgroundcolor = :gray80,
)

hidedecorations!(ax2)
node2 = Node(rotr90(imresize(img[:,1:480], ratio=1/2)))
makieimg2 = image!(ax2, node2)
display(fig)
while isopen(f)
    img1 = read(f)
    data = [20, 255, 255, 0, 20, 70, 10]

    img1,num = objecttracker(imresize(img1[:,1:480], ratio=1/2))
    # println(num)
    z = convert(Array{RGB24},img1')
    img1 = CairoImageSurface(z)
    Drawing(img1.width, img1.height, :png)
    placeimage(img1, 0, 0)
    sethue("red")
    fontsize(10)
    if num != -1
        Luxor.text("$(num[1]+1)", Luxor.Point(10, 10), halign=:center)
    end
    img2 = image_as_matrix()
    node2[] = rotr90(img2)
    # node2[] = rotr90(img1)
    sleep(1 / VideoIO.framerate(f))
    # sleep(0.2)
end

Jan 15 '22 11:01 ashwanirathee

I've hit the performance degradation on larger window sizes of the above code as well. I've dabbled with @rafaqz approach, but found it hard to get right/performant. I would certainly be interested in a generic version.

For now, over at GeoArrayOps, I've found some nice performance for large window sizes because my maximum/minimum filters are separable (2d = multiple 1d operations) or by repeating a smaller windows (for diamond shaped windows).

Feb 01 '22 10:02 evetion

Im refactoring the neighborhoods in DynamicGrids.jl to be a Neighborhoods.jl package for similar purposes to yours (e.g. slope filters).

It should be a lot faster than mapwindow here, I would hope at least 10x. For really large window sizes it will slow down too, at some point you need an FFT.

Feb 01 '22 11:02 rafaqz

ImageFiltering.jl ImageFiltering.jl copied to clipboard

possible performance improvement to `mapwindow`

ImageFiltering.jl
ImageFiltering.jl copied to clipboard