Images.jl icon indicating copy to clipboard operation
Images.jl copied to clipboard

move `var`, `std` to ColorVectorSpace

Open johnnychen94 opened this issue 5 years ago • 12 comments
trafficstars

The right place for var and std should be ColorVectorSpace

https://github.com/JuliaImages/Images.jl/blob/2176612df7e863e534af66d146227523d0685715/src/algorithms.jl#L79-L97

Steps to achieve it:

  1. copy codes to ColorVectorSpaces, merge and ask for a new release
  2. remove codes here: since ColorVectorSpace is loaded in this package, we don't need to deprecate them. But we need some compatibility-checking codes so that if an older version of ColorVectorSpace is installed, this package still functions normally.

johnnychen94 avatar Feb 11 '20 18:02 johnnychen94

https://github.com/JuliaGraphics/ColorVectorSpace.jl/blob/master/src/ColorVectorSpace.jl#L266 @johnnychen94 is it good to add these two function here?

dipak101505 avatar Feb 12 '20 17:02 dipak101505

It's just a matter of taste, I would choose https://github.com/JuliaGraphics/ColorVectorSpace.jl/blob/f2baf1d3e3c9769020692abe747289899d08f35e/src/ColorVectorSpace.jl#L210-L211 since they all belong to Statistics 😄

johnnychen94 avatar Feb 12 '20 17:02 johnnychen94

I think this makes more sense than having them here. Isn't defining them here technically type piracy? Whereas ColorVectorSpace is the agreed place to add these operations.

Tokazama avatar Feb 12 '20 20:02 Tokazama

If you do this, why not move complement (#698) within the new version?

kimikage avatar Feb 15 '20 12:02 kimikage

Since the current implementation of var uses the channelview, there is a discussion on where and how var should be implemented. https://github.com/JuliaGraphics/ColorVectorSpace.jl/pull/125#discussion_r389336991

I'm not a dedicated developer of JuliaImages packages:sweat_smile:, so I'm waiting for the advice of core developers or users.

kimikage avatar Mar 09 '20 15:03 kimikage

I thought handling the keyword argument dims was annoying (cf. https://github.com/JuliaGraphics/ColorVectorSpace.jl/pull/125#discussion_r389491458). However, in the first place, I noticed that dims has not been supported.:sweat:

julia> using Images, FixedPointNumbers, Statistics

julia> var(rand(RGB{N0f8},2,3))
RGB{Float64}(0.06165884916057927,0.09422017172882223,0.014523644752018458)

julia> var(rand(RGB{N0f8},2,3), dims=1)
ERROR: MethodError: no method matching RGB(::Array{Float32,2}, ::Array{Float32,2}, ::Array{Float32,2})

If we don't have to support dims now, var can be implemented as follows:

using ColorVectorSpace: sumtype

Statistics.var(A::AbstractArray{C}; corrected::Bool=true, mean=nothing, dims=:) where C <: Colorant = 
    _var(A, corrected, mean, dims)

Statistics.varm(A::AbstractArray{C}, mean; dims=:, corrected::Bool=true) where C <: Colorant =
    _varm(A, mean, corrected, dims)

_var(A::AbstractArray{C}, corrected::Bool, mean, dims::Colon) where C <: Colorant =
    _varm(A, something(mean, Statistics.mean(A)), corrected, dims)

function _varm(A::AbstractArray{C}, mean, corrected, dims::Colon) where C <: Colorant
    n = length(A)
    T = sumtype(C, typeof(mean)) # ?
    n == 0 && return nan(base_colorant_type(C){T})
    mapreduce(c->mapc(abs2, c - mean), +, A) / (n - Int(corrected))
end

Current implementation in Images.jl:

julia> img = rand(RGB{N0f8},100,100);

julia> @benchmark var(view($img,:,:))
BenchmarkTools.Trial:
  memory estimate:  560 bytes
  allocs estimate:  14
  --------------
  minimum time:     281.900 μs (0.00% GC)
  median time:      283.299 μs (0.00% GC)
  mean time:        290.398 μs (0.00% GC)
  maximum time:     644.901 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

Yet another implementation:

julia> @benchmark var(view($img,:,:))
BenchmarkTools.Trial:
  memory estimate:  48 bytes
  allocs estimate:  1
  --------------
  minimum time:     21.499 μs (0.00% GC)
  median time:      21.600 μs (0.00% GC)
  mean time:        21.684 μs (0.00% GC)
  maximum time:     78.099 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

kimikage avatar Mar 10 '20 12:03 kimikage

The dims support has not been fully tested and optimized.

_var(A::AbstractArray{C}, corrected::Bool, mean, dims) where C <: Colorant =
    _varm(A, something(mean, Statistics.mean(A, dims=dims)), corrected, dims)

function _varm(A::AbstractArray{C}, mean, corrected, dims) where C <: Colorant
    n = length(A)
    T = sumtype(C, eltype(mean)) # ?
    if n == 0
        nans = similar(mean, base_colorant_type(C){T})
        return fill!(nans, nan(base_colorant_type(C){T}))
    end
    y = iterate(mean)
    function s(slice)
        s = mapreduce(c->mapc(abs2, c - y[1]), +, slice)
        y = iterate(mean, y[2])
        s / (length(slice)- Int(corrected))
    end
    mapslices(s, A, dims=dims)
end

kimikage avatar Mar 10 '20 12:03 kimikage

BTW, I also noticed inconsistencies with the element types.

julia> mean(img)
RGB{Float64}(0.4960086274509804,0.5011768627450981,0.5030243137254904)

julia> mean(img, dims=1)
1×100 Array{RGB{Float32},2} with eltype RGB{Float32}:
 RGB{Float32}(0.510353,0.535608,0.455804)  …  RGB{Float32}(0.523176,0.478235,0.511608)

I think Float32 is more appropriate in this case.

kimikage avatar Mar 10 '20 12:03 kimikage

From my first glance, this looks great, thanks for putting these together. I'll check it more carefully this weekend if nobody does.

Actually, I'm more interested in improving the performance of colorview and channelview. For example, the WIP benchmark tells me that getindex of channelview is 20x slower than a plain array. 😕

johnnychen94 avatar Mar 10 '20 13:03 johnnychen94

This is off-topic, but @timholy and @chriselrod are trying to improve the vectorization of RGB image processing. The memory layout, i.e. Struct of Arrays vs. Array of Structs, is a worrisome problem in image processing. IMO, modern CPUs (with AVX2) have reduced alignment issues and improved shuffling/permuting performance, so the current AoS layout is more versatile. So, I think improving channelview has great merits.

kimikage avatar Mar 10 '20 16:03 kimikage

cf. https://github.com/JuliaGraphics/ColorVectorSpace.jl/issues/126

kimikage avatar Mar 12 '20 15:03 kimikage

Agreed that it's off-topic, but your points are important and I'll briefly address them here. (If we want to continue the discussion, to avoid derailing this PR too much let's open another issue.) EDIT: oops, I thought this was https://github.com/JuliaGraphics/ColorVectorSpace.jl/pull/125, hence my somewhat confusing comment here!

I don't have immediate plans to change the default layout, because I too see many advantages to the current system. For ImageFiltering, I think "we can have our cake and eat it too" (weird English expression :smile:): because each array element will be used many times (as many as there are elements in the kernel), it's totally worth it to change the representation used in the TileBuffer from the TiledIteration.jl package. But that would be for a temporary intermediate used to improve cache efficiency in a small block of the image, it's not something that will escape to the user.

There are other circumstances, like resizing/rotating/etc, where the number of uses per array element is much smaller, and for which the appropriate vectorization strategy is less clear. I don't yet have any concrete plans about how to handle that.

For improving channelview, my current favorite strategy is to introduce the possibility of declaring some array dimensions as having a fixed size known to the compiler. If I'm thinking about it clearly, that should make ReinterpretArray perform much better. This would essentially introduce a statically-sized array type to Base, though different from StaticArrays.jl in that one could declare a subset of the dimensions to be statically-sized. (For channelview the one I'm interested in is the first dimension, of size 3.) Of course, it's far from clear that this PR, when I get around to it, will be accepted into Base.

timholy avatar Mar 12 '20 16:03 timholy