Images.jl
Performance issues when trying to implement `regionprops`
I am seeing poor performance from label_components compared to Matlab's bwconncomp or to OpenCV in C++.
My input data is a 1600x1200 UInt8 array with a few round blobs of 100-200 pixels each.
In C++, OpenCV does this in around 1 ms on my computer and Matlab in around 3 ms,
while Images.jl takes anywhere between 11 ms and 30 ms, which is not fast enough for 60 fps rendering
and thus does not solve the "two-language problem", since I need to go back and forth to C++.
Are you using label_components! or label_components?
label_components allocates memory on every call and will not be very good for real-time work.
@time zeros(Int, 1600, 1200)
0.002046 seconds (12 allocations: 14.649 MB)
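For real-time work, the usual pattern would be to allocate the label buffer once and label in-place every frame, so the hot loop does no allocation. A rough sketch of the idea (the exact in-place signature here is an assumption; check methods(label_components!) for your Images.jl version):

using Images
B = zeros(Bool, 1200, 1600)   # binary mask, refreshed from the camera every frame
L = zeros(Int, 1200, 1600)    # preallocated label buffer, reused every frame
for frame = 1:60
    # ... update B for the current frame ...
    label_components!(L, B, 1:ndims(B), 0)   # label in-place, no per-call allocation
end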
If it were possible to supply a test file, it would be really helpful.
@davidbp label_components!(L, B) gives a MethodError for me, but I did manage to call it.
It indeed gave around a 2x speedup.
So I guess the title should be changed; there is probably nothing wrong with label_components! performance, just that it is undocumented.
However, my final goal is to extract some properties from the labeled image, like centroid and size.
The equivalent function in Matlab is regionprops; in OpenCV it would be findContours. And here the numbers are not so good.
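For the record, this is roughly how I pull those properties out with the component_* helpers in Images.jl (assuming component_lengths and component_centroids are available in the installed version; entry 1 of each result corresponds to the background label 0):

using Images
L = label_components(B)              # B is the thresholded binary mask
areas = component_lengths(L)         # pixel count per label
centroids = component_centroids(L)   # centroid per label, same ordering
blobs = collect(zip(areas[2:end], centroids[2:end]))   # (size, centroid) per foreground blob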
@timholy
using Images, BenchmarkTools
r = 10
n = 40
A = zeros(UInt8,1200,1600);
B = zeros(Bool,1200,1600) # preallocated binary mask
L = zeros(Int64,1200,1600) #for storing labels in-place
for i=1:n
ci = rand(r+1:1200-r)   # keep the blob's square inside the image bounds
cj = rand(r+1:1600-r)
A[ci-r:ci+r,cj-r:cj+r] .= 150
end
@benchmark B .= A .> 100 #around 170us on my computer
B .= A .> 100
@benchmark L = label_components(B) # 9.4ms median time
label_components!(L,B) #returns method error
@benchmark label_components!(L, B, 1:ndims(B), 0) #3.9ms median time which gets close to 2x from C
@benchmark component_centroids(L) #30ms median time ...way too much :-(
Btw, isn't it possible to do the task on an image that is at least 4x smaller? Usually that even helps with any recognition task due to a better signal-to-noise ratio. I remember doing quite finicky image recognition on a 128x128 image, and there weren't many important details lost...
Just saying, not that it wouldn't be nice to still have top performance for these algorithms ;) In any case, I guess with the work that @timholy puts into optimizing most algorithms, only multithreading and GPU acceleration would make things faster... But who knows, maybe there are some easy gains hidden somewhere :)
I know very little about the domain specifics, but the code at https://github.com/JuliaImages/Images.jl/blob/master/src/connected.jl#L275-L288 looks quite old; maybe we should refactor it a bit.
Here is one possible suggestion I coded up in the REPL:
julia> using Base.Cartesian
julia> @generated function _numer_denom(tup::Vararg{Int,N}) where {N}
quote
@ntuple $(N-1) d -> tup[d] ./ tup[$N]
end
end
_numer_denom (generic function with 1 method)
julia> @generated function component_centroids2(img::AbstractArray{Int,N}) where N
quote
len = length(0:maximum(img))
@nexprs $N d -> numer_d = zeros(Int, len)
denom = zeros(Int, len)
@inbounds for I in CartesianRange(size(img))
v = img[I] + 1
@nexprs $N d -> numer_d[v] += I[d]
denom[v] += 1
end
map(_numer_denom, @ntuple($N,numer)..., denom)
end
end
component_centroids2 (generic function with 1 method)
julia> @btime component_centroids2(L)
5.484 ms (11 allocations: 544 bytes)
1-element Array{Tuple{Float64,Float64},1}:
(600.5, 800.5)
Down to 5 ms from 25 ms on my machine (plus it gains type stability).
julia> @btime component_centroids(L)
25.387 ms (10 allocations: 608 bytes)
1-element Array{Tuple{Float64,Float64},1}:
(600.5, 800.5)
For the centroid finding itself I need full resolution, since I am doing tracking rather than recognition and need the highest accuracy. But indeed, as @SimonDanisch pointed out, maybe localising blobs on a downscaled image first and then measuring centroid centres with @Evizero's code on only the bounding boxes found in the previous step will bring me into the desired time frame without calling external shared objects.
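Something like this rough, untested sketch of the coarse-to-fine idea (all the helper names and parameters here are mine, not an Images.jl API):

function coarse_to_fine_centroids(B::AbstractMatrix{Bool}; scale = 4, pad = 2)
    Bsmall = B[1:scale:end, 1:scale:end]      # cheap strided downsample of the mask
    Lsmall = label_components(Bsmall)
    nlab = maximum(Lsmall)
    # bounding box (imin, imax, jmin, jmax) of each coarse label
    boxes = [(typemax(Int), 0, typemax(Int), 0) for _ in 1:nlab]
    for j in 1:size(Lsmall, 2), i in 1:size(Lsmall, 1)
        v = Lsmall[i, j]
        v == 0 && continue
        imin, imax, jmin, jmax = boxes[v]
        boxes[v] = (min(imin, i), max(imax, i), min(jmin, j), max(jmax, j))
    end
    centroids = Tuple{Float64,Float64}[]
    for (imin, imax, jmin, jmax) in boxes
        # map the coarse box back to full resolution, with a little padding
        i1 = max(1, (imin - 1) * scale + 1 - pad); i2 = min(size(B, 1), imax * scale + pad)
        j1 = max(1, (jmin - 1) * scale + 1 - pad); j2 = min(size(B, 2), jmax * scale + pad)
        si = 0.0; sj = 0.0; cnt = 0
        for j in j1:j2, i in i1:i2
            if B[i, j]
                si += i; sj += j; cnt += 1
            end
        end
        push!(centroids, (si / cnt, sj / cnt))
    end
    return centroids
end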
Wow, yes, that is ancient code...it clearly predated CartesianRange and @generated functions. Getting rid of the ind2sub would be a huge improvement.
This is also quite "charming":
julia> S = view(B, 2:1000, 2:1500);
julia> label_components(S)
ERROR: MethodError: no method matching label_components!(::Array{Int64,2}, ::SubArray{Bool,2,Array{Bool,2},Tuple{UnitRange{Int64},UnitRange{Int64}},false}, ::UnitRange{Int64}, ::Int64)
Closest candidates are:
label_components!(::AbstractArray{Int64,N} where N, ::BitArray, ::Union{AbstractArray{Int64,1}, Tuple{Vararg{Int64,N}} where N}, ::Any) at /home/tim/.julia/v0.6/Images/src/connected.jl:103
label_components!(::AbstractArray{Int64,2}, ::AbstractArray, ::Array{Bool,2}, ::Any) at /home/tim/.julia/v0.6/Images/src/connected.jl:109
label_components!(::AbstractArray{Int64,2}, ::AbstractArray, ::BitArray{2}, ::Any) at /home/tim/.julia/v0.6/Images/src/connected.jl:166
...
Stacktrace:
[1] label_components(::SubArray{Bool,2,Array{Bool,2},Tuple{UnitRange{Int64},UnitRange{Int64}},false}, ::UnitRange{Int64}, ::Int64) at /home/tim/.julia/v0.6/Images/src/connected.jl:29 (repeats 2 times)
This whole file is a living exhibit of how Julia has improved.
So, there are two ways we can do this. One is that I or one of the other main Images contributors can tackle it. The alternative is to coach @TsurHerman through making the necessary changes. The latter might be a fair amount of work for you, but unless you're already very familiar with all the high-performance infrastructure, I predict that you'll emerge out the other side with a different worldview and abilities. Which do you prefer, @TsurHerman?
Note: I personally doubt we need all the Base.Cartesian macros, I suspect that between tuples/StaticVectors and CartesianRange we can simplify this a lot.
I think so too. To be honest I just quickly hacked up this implementation, because I felt like I needed some data to back up my point.
It seems so quaint to believe that words should be connected to actual reality. :grimacing:
Would be great to have my 2-year-old code improved upon :) I'd be glad to lend a hand. 5x still wouldn't beat the OpenCV benchmarks in the OP, though; maybe they're vectorizing. In general, we should avoid @generated functions, no?
I think some of this is older than that. For example, the "compile a function and cache in a Dict" is what Julia's internal array code looked like in the 0.1 days :smile:.
I'm not categorically opposed to @generated functions but would first like to see whether lispy-tuple implementations are good enough. They almost always are, and usually simpler and more compiler-friendly (esp. for static compilation of julia code, which I expect to have growing importance over the coming years).
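For concreteness, an untested sketch of what such a macro-free, tuple-based version might look like (purely illustrative; on 0.6, CartesianIndices(img) would be CartesianRange(size(img)) and Tuple(I) would be I.I):

function component_centroids_tuples(img::AbstractArray{Int,N}) where N
    len = maximum(img) + 1                      # labels 0:maximum(img), background included
    sums = fill(ntuple(_ -> 0, N), len)         # per-label coordinate sums as plain tuples
    counts = zeros(Int, len)
    @inbounds for I in CartesianIndices(img)
        v = img[I] + 1
        sums[v] = sums[v] .+ Tuple(I)           # tuple arithmetic instead of @nexprs
        counts[v] += 1
    end
    return [sums[v] ./ counts[v] for v in 1:len]
end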
Thanks @timholy for all the work you are doing. I would love to contribute, but am currently swamped.
I think that in order to come close to OpenCV performance the design should change. Connected components should be represented using a dedicated data structure, like a list of all pixel indices per component as in Matlab, or contours (fastest but hardest to implement) as in OpenCV.
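For illustration, a rough sketch of what that index-list representation could look like (the names are mine, not an existing Images.jl API; on 0.6, CartesianIndices(L) would be CartesianRange(size(L))):

function component_pixel_lists(L::AbstractMatrix{Int})
    lists = [CartesianIndex{2}[] for _ in 1:maximum(L)]   # one pixel list per label
    @inbounds for I in CartesianIndices(L)
        v = L[I]
        v != 0 && push!(lists[v], I)
    end
    return lists
end

# area and centroid then become trivial reductions over each list
component_area(pix) = length(pix)
component_centroid(pix) = (sum(I -> I[1], pix) / length(pix), sum(I -> I[2], pix) / length(pix))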
We started exploring image processing using Julia at my workplace; I will make an effort to contribute back common operators that are still missing in the ecosystem.
function component_centroids(img::AbstractArray{Int,N}) where N
    len = length(0:maximum(img))   # number of labels, including background 0
    n = zeros(Int, N+1, len)       # rows 1:N accumulate coordinate sums, row N+1 the pixel count
    @inbounds for I in CartesianRange(size(img))
        v = img[I] + 1
        for d in 1:N
            n[d, v] += I[d]
        end
        n[end, v] += 1
    end
    map(v -> ntuple(d -> (n[d, v] / n[end, v]), Val{N}), 1:len)
end
This is the lowest I get while still looking pretty. Each access to n in the inner loop takes away about 1 ms in the timing below, so I think that in order to get lower than 2-3 ms one would probably need to change the algorithm and/or data structure. (I simply tweaked the implementation.)
julia> @btime component_centroids($L)
5.843 ms (4 allocations: 256 bytes)
1-element Array{Tuple{Float64,Float64},1}:
(600.5, 800.5)
Thanks @Evizero. I'll put tackling some of the cruftier code (the core label_components! algorithm) on my TODO.