KernelFunctions.jl icon indicating copy to clipboard operation
KernelFunctions.jl copied to clipboard

getindex with ColVecs

Open willtebbutt opened this issue 2 years ago • 3 comments

We currently return a view to the underlying data when getindex is called on a ColVecs. See https://github.com/JuliaGaussianProcesses/KernelFunctions.jl/blob/2d172125e72ca932133266c3b23d0865bec539df/src/utils.jl#L76

@theogf was the last person to touch this bit of code, but I'm not sure who wrote it initially (I don't think it was me). Perhaps @devmotion ?

My concern is that if you do getindex, rather thanview, my intuition is that I wouldn't get a view of the underlying data back, but a copy in a new location in memory. Any thoughts on whether my intuition is off, or should we change this?

willtebbutt avatar Dec 19 '21 13:12 willtebbutt

I think it wasn't me (actually I thought you implemented ColVecs and RowVecs initially @willtebbutt :smile: ) but in my opinion it is correct to return views here. I view ColVecs and RowVecs as an optimized implementation of eachcol and eachrow with the AbstractArray interface that allows for more performant operations by using the underlying matrix. This is also how (hopefully) eachcol and eachrow will be implemented at some point: https://github.com/JuliaLang/julia/pull/32310

devmotion avatar Dec 19 '21 17:12 devmotion

Ah, interesting. How would you propose that view(::ColVecs, idx...) and getindex(::ColVecs, idx...) differ in their output then (assuming that they should)?

willtebbutt avatar Dec 23 '21 15:12 willtebbutt

I think getindex should satisfy the same properties as for Vectors of views:

julia> A = rand(5, 4);

julia> x = [view(A, :, i) for i in axes(A, 2)];

julia> getindex(x, 1)
5-element view(::Matrix{Float64}, :, 1) with eltype Float64:
 0.832858822923345
 0.1486458121118549
 0.20187209965755426
 0.17211775415001118
 0.8813222986233952

julia> getindex(x, 2:3)
2-element Vector{SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}}:
 [0.75958434210484, 0.28127270898481205, 0.11477999962460317, 0.6093522955819292, 0.30000872694236713]
 [0.0028572719840013194, 0.6307387952125958, 0.25339090743930304, 0.2795101302467987, 0.5211441981223575]

Mainly, it should satisfy

julia> getindex(x, 1) isa eltype(x)
true

julia> getindex(x, 1:2) isa typeof(x)
true

While the first property is satisfied for ColVecs and RowVecs (https://github.com/JuliaGaussianProcesses/KernelFunctions.jl/blob/992b665d0391e60c3bad713e8ebeaef3c4e297ba/src/utils.jl#L74-L75 and https://github.com/JuliaGaussianProcesses/KernelFunctions.jl/blob/992b665d0391e60c3bad713e8ebeaef3c4e297ba/src/utils.jl#L144-L145), the second property is violated by https://github.com/JuliaGaussianProcesses/KernelFunctions.jl/blob/992b665d0391e60c3bad713e8ebeaef3c4e297ba/src/utils.jl#L76 and https://github.com/JuliaGaussianProcesses/KernelFunctions.jl/blob/992b665d0391e60c3bad713e8ebeaef3c4e297ba/src/utils.jl#L146 (the type of the wrapped matrix is changed).

On the other hand, view may return more optimized representations without additional allocations:

julia> view(x, 1)
0-dimensional view(::Vector{SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}}, 1) with eltype SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{
Base.OneTo{Int64}}, Int64}, true}:
[0.832858822923345, 0.1486458121118549, 0.20187209965755426, 0.17211775415001118, 0.8813222986233952]

julia> view(x, 2:3)
2-element view(::Vector{SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}}, 2:3) with eltype SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}:
 [0.75958434210484, 0.28127270898481205, 0.11477999962460317, 0.6093522955819292, 0.30000872694236713]
 [0.0028572719840013194, 0.6307387952125958, 0.25339090743930304, 0.2795101302467987, 0.5211441981223575]

I think it would be reasonable to expect

julia> all(y isa typeof(getindex(x, 1)) for y in view(x, 2:3))
true

However, the exact return type of view is not clearly defined as far as I know. I think a natural definition for non-integer index/indices i would be

Base.view(x::ColVecs, i) = ColVecs(view(x.X, :, i))
Base.view(x::RowVecs, i) = RowVecs(view(x.X, :, i))

I'm not completely sure what to expect from view(::ColVecs, ::Int) though, maybe just the same as getindex(::ColVecs, ::Int) - but then it would not be consistent with the example above in which a 0-dimensional view is returned.

devmotion avatar Dec 28 '21 23:12 devmotion