Images.jl
Images.jl copied to clipboard
move `var`, `std` to ColorVectorSpace
The right place for var and std should be ColorVectorSpace
https://github.com/JuliaImages/Images.jl/blob/2176612df7e863e534af66d146227523d0685715/src/algorithms.jl#L79-L97
Steps to achieve it:
- copy codes to
ColorVectorSpaces, merge and ask for a new release - remove codes here: since ColorVectorSpace is loaded in this package, we don't need to deprecate them. But we need some compatibility-checking codes so that if an older version of
ColorVectorSpaceis installed, this package still functions normally.
https://github.com/JuliaGraphics/ColorVectorSpace.jl/blob/master/src/ColorVectorSpace.jl#L266 @johnnychen94 is it good to add these two function here?
It's just a matter of taste, I would choose https://github.com/JuliaGraphics/ColorVectorSpace.jl/blob/f2baf1d3e3c9769020692abe747289899d08f35e/src/ColorVectorSpace.jl#L210-L211 since they all belong to Statistics 😄
I think this makes more sense than having them here. Isn't defining them here technically type piracy? Whereas ColorVectorSpace is the agreed place to add these operations.
If you do this, why not move complement (#698) within the new version?
Since the current implementation of var uses the channelview, there is a discussion on where and how var should be implemented.
https://github.com/JuliaGraphics/ColorVectorSpace.jl/pull/125#discussion_r389336991
I'm not a dedicated developer of JuliaImages packages:sweat_smile:, so I'm waiting for the advice of core developers or users.
I thought handling the keyword argument dims was annoying (cf. https://github.com/JuliaGraphics/ColorVectorSpace.jl/pull/125#discussion_r389491458).
However, in the first place, I noticed that dims has not been supported.:sweat:
julia> using Images, FixedPointNumbers, Statistics
julia> var(rand(RGB{N0f8},2,3))
RGB{Float64}(0.06165884916057927,0.09422017172882223,0.014523644752018458)
julia> var(rand(RGB{N0f8},2,3), dims=1)
ERROR: MethodError: no method matching RGB(::Array{Float32,2}, ::Array{Float32,2}, ::Array{Float32,2})
If we don't have to support dims now, var can be implemented as follows:
using ColorVectorSpace: sumtype
Statistics.var(A::AbstractArray{C}; corrected::Bool=true, mean=nothing, dims=:) where C <: Colorant =
_var(A, corrected, mean, dims)
Statistics.varm(A::AbstractArray{C}, mean; dims=:, corrected::Bool=true) where C <: Colorant =
_varm(A, mean, corrected, dims)
_var(A::AbstractArray{C}, corrected::Bool, mean, dims::Colon) where C <: Colorant =
_varm(A, something(mean, Statistics.mean(A)), corrected, dims)
function _varm(A::AbstractArray{C}, mean, corrected, dims::Colon) where C <: Colorant
n = length(A)
T = sumtype(C, typeof(mean)) # ?
n == 0 && return nan(base_colorant_type(C){T})
mapreduce(c->mapc(abs2, c - mean), +, A) / (n - Int(corrected))
end
Current implementation in Images.jl:
julia> img = rand(RGB{N0f8},100,100);
julia> @benchmark var(view($img,:,:))
BenchmarkTools.Trial:
memory estimate: 560 bytes
allocs estimate: 14
--------------
minimum time: 281.900 μs (0.00% GC)
median time: 283.299 μs (0.00% GC)
mean time: 290.398 μs (0.00% GC)
maximum time: 644.901 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 1
Yet another implementation:
julia> @benchmark var(view($img,:,:))
BenchmarkTools.Trial:
memory estimate: 48 bytes
allocs estimate: 1
--------------
minimum time: 21.499 μs (0.00% GC)
median time: 21.600 μs (0.00% GC)
mean time: 21.684 μs (0.00% GC)
maximum time: 78.099 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 1
The dims support has not been fully tested and optimized.
_var(A::AbstractArray{C}, corrected::Bool, mean, dims) where C <: Colorant =
_varm(A, something(mean, Statistics.mean(A, dims=dims)), corrected, dims)
function _varm(A::AbstractArray{C}, mean, corrected, dims) where C <: Colorant
n = length(A)
T = sumtype(C, eltype(mean)) # ?
if n == 0
nans = similar(mean, base_colorant_type(C){T})
return fill!(nans, nan(base_colorant_type(C){T}))
end
y = iterate(mean)
function s(slice)
s = mapreduce(c->mapc(abs2, c - y[1]), +, slice)
y = iterate(mean, y[2])
s / (length(slice)- Int(corrected))
end
mapslices(s, A, dims=dims)
end
BTW, I also noticed inconsistencies with the element types.
julia> mean(img)
RGB{Float64}(0.4960086274509804,0.5011768627450981,0.5030243137254904)
julia> mean(img, dims=1)
1×100 Array{RGB{Float32},2} with eltype RGB{Float32}:
RGB{Float32}(0.510353,0.535608,0.455804) … RGB{Float32}(0.523176,0.478235,0.511608)
I think Float32 is more appropriate in this case.
From my first glance, this looks great, thanks for putting these together. I'll check it more carefully this weekend if nobody does.
Actually, I'm more interested in improving the performance of colorview and channelview. For example, the WIP benchmark tells me that getindex of channelview is 20x slower than a plain array. 😕
This is off-topic, but @timholy and @chriselrod are trying to improve the vectorization of RGB image processing. The memory layout, i.e. Struct of Arrays vs. Array of Structs, is a worrisome problem in image processing.
IMO, modern CPUs (with AVX2) have reduced alignment issues and improved shuffling/permuting performance, so the current AoS layout is more versatile. So, I think improving channelview has great merits.
cf. https://github.com/JuliaGraphics/ColorVectorSpace.jl/issues/126
Agreed that it's off-topic, but your points are important and I'll briefly address them here. (If we want to continue the discussion, to avoid derailing this PR too much let's open another issue.) EDIT: oops, I thought this was https://github.com/JuliaGraphics/ColorVectorSpace.jl/pull/125, hence my somewhat confusing comment here!
I don't have immediate plans to change the default layout, because I too see many advantages to the current system. For ImageFiltering, I think "we can have our cake and eat it too" (weird English expression :smile:): because each array element will be used many times (as many as there are elements in the kernel), it's totally worth it to change the representation used in the TileBuffer from the TiledIteration.jl package. But that would be for a temporary intermediate used to improve cache efficiency in a small block of the image, it's not something that will escape to the user.
There are other circumstances, like resizing/rotating/etc, where the number of uses per array element is much smaller, and for which the appropriate vectorization strategy is less clear. I don't yet have any concrete plans about how to handle that.
For improving channelview, my current favorite strategy is to introduce the possibility of declaring some array dimensions as having a fixed size known to the compiler. If I'm thinking about it clearly, that should make ReinterpretArray perform much better. This would essentially introduce a statically-sized array type to Base, though different from StaticArrays.jl in that one could declare a subset of the dimensions to be statically-sized. (For channelview the one I'm interested in is the first dimension, of size 3.) Of course, it's far from clear that this PR, when I get around to it, will be accepted into Base.