YAXArrays.jl icon indicating copy to clipboard operation
YAXArrays.jl copied to clipboard

subsetting cube by Lon, lat and other variables fails

Open lazarusA opened this issue 3 years ago • 5 comments

The following fails at several levels:

If I do the following

axlist = [
    RangeAxis("other", 1:4),
    RangeAxis("time", range(1, 20, length=20)),
    RangeAxis("lon", range(-66.4598,-66.4598, length=1)),
    RangeAxis("lat", range(-33.4648, -33.4648, length=1)),
    ]
data = rand(4, 20, 1, 1)
ds = YAXArray(axlist, data)

note that

ds.lat
1-element RangeAxis{Float64, :lat, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}:
 -33.4648

and doing the following will fail:

subsetcube(ds, lon = -66.4598)
ERROR: InexactError: trunc(Int64, NaN)
Stacktrace:
 [1] trunc
   @ ./float.jl:805 [inlined]
 [2] round
   @ ./float.jl:369 [inlined]
 [3] #axVal2Index#1
   @ ~/.julia/packages/YAXArrays/swXW2/src/Cubes/Axes.jl:203 [inlined]
 [4] interpretsubset(subexpr::Float64, ax::RangeAxis{Float64, :lon, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}})
   @ YAXArrays.Cubes ~/.julia/packages/YAXArrays/swXW2/src/Cubes/Cubes.jl:269
 [5] (::YAXArrays.Cubes.var"#14#18"{YAXArray{Float64, 4, Array{Float64, 4}, Vector{RangeAxis}}, Vector{Any}})(kw::Pair{Any, Any})
   @ YAXArrays.Cubes ~/.julia/packages/YAXArrays/swXW2/src/Cubes/Cubes.jl:297
 [6] foreach(f::YAXArrays.Cubes.var"#14#18"{YAXArray{Float64, 4, Array{Float64, 4}, Vector{RangeAxis}}, Vector{Any}}, itr::Dict{Any, Any})
   @ Base ./abstractarray.jl:2694
 [7] _subsetcube(z::YAXArray{Float64, 4, Array{Float64, 4}, Vector{RangeAxis}}, subs::Vector{Any}; kwargs::Base.Pairs{Symbol, Float64, Tuple{Symbol}, NamedTuple{(:lon,), Tuple{Float64}}})
   @ YAXArrays.Cubes ~/.julia/packages/YAXArrays/swXW2/src/Cubes/Cubes.jl:289
 [8] subsetcube(z::YAXArray{Float64, 4, Array{Float64, 4}, Vector{RangeAxis}}; kwargs::Base.Pairs{Symbol, Float64, Tuple{Symbol}, NamedTuple{(:lon,), Tuple{Float64}}})
   @ YAXArrays.Cubes ~/.julia/packages/YAXArrays/swXW2/src/Cubes/Cubes.jl:258
 [9] top-level scope

the other issue is that doing

subsetcube(ds, other = (1,2))
YAXArray with the following dimensions
other               Axis with 1 Elements from 1 to 1
time                Axis with 20 Elements from 1.0 to 20.0
lon                 Axis with 1 Elements from -66.4598 to -66.4598
lat                 Axis with 1 Elements from -33.4648 to -33.4648
Total size: 160.0 bytes

will subset only the first element, but doing

subsetcube(ds, other = (1,3))
YAXArray with the following dimensions
other               Axis with 3 Elements from 1 to 3
time                Axis with 20 Elements from 1.0 to 20.0
lon                 Axis with 1 Elements from -66.4598 to -66.4598
lat                 Axis with 1 Elements from -33.4648 to -33.4648
Total size: 480.0 bytes

will subset 3 elements, and I cannot do the first 2 elements.

and doing

subsetcube(ds, other = (1,4))

will still give back still only 3 elements.

YAXArray with the following dimensions
other               Axis with 3 Elements from 1 to 3
time                Axis with 20 Elements from 1.0 to 20.0
lon                 Axis with 1 Elements from -66.4598 to -66.4598
lat                 Axis with 1 Elements from -33.4648 to -33.4648
Total size: 480.0 bytes

As for the first issue, of course doing:

axlist = [
    RangeAxis("other", 1:4),
    RangeAxis("time", range(1, 20, length=20)),
    RangeAxis("lon", [-66.4598]),
    RangeAxis("lat", [-33.4648]),
    ]
data = rand(4, 20, 1, 1)
ds = YAXArray(axlist, data)
ds.lon
subsetcube(ds, lon = -66.4598, lat = -33.4648)

works, however sometimes we don't have control over the YAX generation and having the type StepRangeLen will be unavoidable.

YAXArray with the following dimensions
other               Axis with 4 Elements from 1 to 4
time                Axis with 20 Elements from 1.0 to 20.0
Total size: 640.0 bytes

lazarusA avatar Sep 21 '22 15:09 lazarusA

This is the error message for subsetting a Timeaxis:

ERROR: MethodError: no method matching convert_time(::Type{DateTime}, ::Vector{DateTime})
Closest candidates are:
  convert_time(::Type{<:TimeType}, ::Date) at ~/.julia/packages/YAXArrays/Ly8sc/src/Cubes/Axes.jl:281
  convert_time(::Type{<:TimeType}, ::TimeType) at ~/.julia/packages/YAXArrays/Ly8sc/src/Cubes/Axes.jl:279
Stacktrace:
 [1] axVal2Index(a::RangeAxis{DateTime, :time, Vector{DateTime}}, v::Vector{DateTime}; fuzzy::Bool)
   @ YAXArrays.Cubes.Axes ~/.julia/packages/YAXArrays/Ly8sc/src/Cubes/Axes.jl:197
 [2] interpretsubset(subexpr::Vector{DateTime}, ax::RangeAxis{DateTime, :time, Vector{DateTime}})
   @ YAXArrays.Cubes ~/.julia/packages/YAXArrays/Ly8sc/src/Cubes/Cubes.jl:234
 [3] (::YAXArrays.Cubes.var"#14#18"{YAXArray{Float32, 4, DiskArrayTools.DiskArrayStack{Float32, 4, DiskArrays.SubDiskArray{Float32, 3}, 1}, Vector{CubeAxis}}, Vector{Any}})(kw::Pair{Any, Any})                                                                             
   @ YAXArrays.Cubes ~/.julia/packages/YAXArrays/Ly8sc/src/Cubes/Cubes.jl:262
 [4] foreach(f::YAXArrays.Cubes.var"#14#18"{YAXArray{Float32, 4, DiskArrayTools.DiskArrayStack{Float32, 4, DiskArrays.SubDiskArray{Float32, 3}, 1}, Vector{CubeAxis}}, Vector{Any}}, itr::Dict{Any, Any})                                                                    
   @ Base ./abstractarray.jl:2694
 [5] _subsetcube(z::YAXArray{Float32, 4, DiskArrayTools.DiskArrayStack{Float32, 4, DiskArrays.SubDiskArray{Float32, 3}, 1}, Vector{CubeAxis}}, subs::Vector{Any}; kwargs::Base.Pairs{Symbol, Vector{DateTime}, Tuple{Symbol}, NamedTuple{(:time,), Tuple{Vector{DateTime}}}})
   @ YAXArrays.Cubes ~/.julia/packages/YAXArrays/Ly8sc/src/Cubes/Cubes.jl:254
 [6] subsetcube(z::YAXArray{Float32, 4, DiskArrayTools.DiskArrayStack{Float32, 4, DiskArrays.SubDiskArray{Float32, 3}, 1}, Vector{CubeAxis}}; kwargs::Base.Pairs{Symbol, Vector{DateTime}, Tuple{Symbol}, NamedTuple{(:time,), Tuple{Vector{DateTime}}}})                    
   @ YAXArrays.Cubes ~/.julia/packages/YAXArrays/Ly8sc/src/Cubes/Cubes.jl:223
 [7] #getindex#22
   @ ~/.julia/packages/YAXArrays/Ly8sc/src/Cubes/Cubes.jl:281 [inlined]
 [8] top-level scope
   @ REPL[154]:1

jcortesr avatar Sep 22 '22 07:09 jcortesr

As discussed, using DimensionalData.jl solves the issue of indexing.

using DimensionalData

axlist = [
    RangeAxis("other", 1:4),
    RangeAxis("time", range(1, 20, length=20)),
    RangeAxis("lon", range(-66.4598,-66.4598, length=1)),
    RangeAxis("lat", range(-33.4648, -33.4648, length=1)),
    ]
data = rand(4, 20, 1, 1)
ds = YAXArray(axlist, data)

dim = yaxconvert(DimArray, ds)

subset = dim[lon=At(-66.4598), lat = At(-33.4648)]
yax = yaxconvert(YAXArray, subset)
YAXArray with the following dimensions
other               Axis with 4 Elements from 1 to 4
time                Axis with 20 Elements from 1.0 to 20.0
Total size: 640.0 bytes

lazarusA avatar Sep 23 '22 08:09 lazarusA

Thanks a lot @lazarusA. Your solution works like a charm!

dpabon avatar Feb 13 '23 10:02 dpabon

is also in the docs hopefully soon will be part of the package, so we don't need to do those extra steps.

lazarusA avatar Feb 13 '23 12:02 lazarusA

Ok, I just found out that subset = dim[lon=At(-66.4598), lat = At(-33.4648)] is not lazy. This is a problem when working with a huge dataset as the Earth System Data Cube.

dpabon avatar Feb 13 '23 14:02 dpabon