DiskArrays.jl

ArgumentError: Unable to determine chunksize of non-range views.

Open Balinus opened this issue 1 year ago • 20 comments

Hello,

I am trying to see whether I am doing something wrong here. This piece of code worked a couple of months ago and no longer does. I tried multiple datasets/cubes and get the same error.

Thanks!

using YAXArrays
using Zarr
using DimensionalData
using Dates
path="gs://cmip6/CMIP6/ScenarioMIP/DKRZ/MPI-ESM1-2-HR/ssp585/r1i1p1f1/3hr/tas/gn/v20190710/"
store = zopen(path, consolidated=true)
ds = open_dataset(store)

ds.tas[Ti=Where(x-> Dates.monthday(x) != (2,29))]
ERROR: ArgumentError: Unable to determine chunksize of non-range views.
Stacktrace:
  [1] eachchunk_view(::DiskArrays.Chunked{DiskArrays.ChunkRead{…}}, vv::SubArray{Float32, 3, ZArray{…}, Tuple{…}, false})
    @ DiskArrays ~/.julia/packages/DiskArrays/6JA8Z/src/subarray.jl:29
  [2] eachchunk(a::DiskArrays.SubDiskArray{Float32, 3, ZArray{…}, Tuple{…}, false})
    @ DiskArrays ~/.julia/packages/DiskArrays/6JA8Z/src/subarray.jl:25
  [3] rebuild(A::YAXArray{…}, data::DiskArrays.SubDiskArray{…}, dims::Tuple{…}, refdims::Tuple{}, name::DimensionalData.NoName, metadata::Dict{…})
    @ YAXArrays.Cubes ~/.julia/packages/YAXArrays/b5XBB/src/Cubes/Cubes.jl:201
  [4] rebuild
    @ ~/.julia/packages/DimensionalData/GaADx/src/array/array.jl:85 [inlined]
  [5] rebuildsliced
    @ ~/.julia/packages/DimensionalData/GaADx/src/array/array.jl:100 [inlined]
  [6] rebuildsliced
    @ ~/.julia/packages/DimensionalData/GaADx/src/array/array.jl:99 [inlined]
  [7] view
    @ ~/.julia/packages/DimensionalData/GaADx/src/array/indexing.jl:125 [inlined]
  [8] _dim_view
    @ ~/.julia/packages/DimensionalData/GaADx/src/array/indexing.jl:110 [inlined]
  [9] #view#110
    @ ~/.julia/packages/DimensionalData/GaADx/src/array/indexing.jl:81 [inlined]
 [10] getindex(::YAXArray{Float32, 3, ZArray{…}, Tuple{…}, Dict{…}}; kwargs::@Kwargs{Ti::Where{…}})
    @ YAXArrays.Cubes ~/.julia/packages/YAXArrays/b5XBB/src/Cubes/Cubes.jl:488
 [11] top-level scope
    @ REPL[33]:1
Some type information was truncated. Use `show(err)` to see complete types.



Balinus, Aug 07 '24 15:08

What versions of the packages are you on?

felixcremer, Aug 07 '24 16:08

edit: see the messages below this one for the relevant information.

I am on the latest versions of all packages (DimensionalData, YAXArrays, DiskArrays). I also tried older versions from around spring 2024, but I still haven't found where it breaks...

I will look closely at the date when the code last worked (see https://github.com/JuliaDataCubes/YAXArrays.jl/issues/357).

Here are some configurations I tried so far:

# latest
  [0703355e] DimensionalData v0.27.6
  [3c3547ce] DiskArrays v0.4.4
  [c21b50f5] YAXArrays v0.5.10

⌅ [0703355e] DimensionalData v0.26.8
⌃ [3c3547ce] DiskArrays v0.4.3
⌃ [c21b50f5] YAXArrays v0.5.5

⌅ [0703355e] DimensionalData v0.26.8
  [3c3547ce] DiskArrays v0.4.4
⌃ [c21b50f5] YAXArrays v0.5.5

  [0703355e] DimensionalData v0.27.6
⌃ [3c3547ce] DiskArrays v0.4.2
⌃ [c21b50f5] YAXArrays v0.5.6

  [0703355e] DimensionalData v0.27.6
⌃ [3c3547ce] DiskArrays v0.4.2
⌃ [c21b50f5] YAXArrays v0.5.7

  [0703355e] DimensionalData v0.27.6
⌃ [3c3547ce] DiskArrays v0.4.2
  [c21b50f5] YAXArrays v0.5.10

  [0703355e] DimensionalData v0.27.6
⌃ [3c3547ce] DiskArrays v0.4.3
  [c21b50f5] YAXArrays v0.5.10

Balinus, Aug 07 '24 17:08

I found versions where it works!

ds.tas[Ti=Where(x-> Dates.monthday(x) != (2,29))]
384×192×251120 YAXArray{Float32,3} with dimensions:
  Dim{:lon} Sampled{Float64} 0.0:0.9375:359.0625 ForwardOrdered Regular Points,
  Dim{:lat} Sampled{Float64} Float64[-89.28422753251364, -88.35700351866494, …, 88.35700351866494, 89.28422753251364] ForwardOrdered Irregular Points,
  Ti Sampled{DateTime} DateTime[2015-01-01T03:00:00, …, 2101-01-01T00:00:00] ForwardOrdered Irregular Points
units: K
name: tas
Total size: 68.97 GB

Versions are:

⌅ [0703355e] DimensionalData v0.25.8
⌅ [3c3547ce] DiskArrays v0.3.23
⌃ [c21b50f5] YAXArrays v0.5.3

Balinus, Aug 07 '24 17:08

Due to Zarr and DimensionalData requirements in the MWE, I am unable to install [email protected] to see whether that is the version that breaks the code or whether this is due to YAXArrays going to v0.5.4.

Balinus, Aug 07 '24 17:08

Zarr was causing the requirement incompatibilities. I tried a NetCDF file with the following versions and I get the error. So the regression comes either from DiskArrays going to v0.4.0 or from YAXArrays going to v0.5.4.

Env with error:

⌅ [0703355e] DimensionalData v0.26.8
⌃ [3c3547ce] DiskArrays v0.4.0
  [30363a11] NetCDF v0.12.0
⌃ [c21b50f5] YAXArrays v0.5.4

Balinus, Aug 07 '24 17:08

Most likely DiskArrays 0.4; it's a major reworking of some indexing internals. @meggart will know.

rafaqz, Aug 07 '24 22:08

This comes from changes to eachchunk. I managed to reduce this to a DiskArrays problem on DiskArrays master:

using DiskArrays
using DiskArrays.TestTypes
julia> a = TestTypes.ChunkedDiskArray(rand(100,100),(10,10))
100×100 ChunkedDiskArray{Float64, 2, Matrix{Float64}}

Chunked: (
    [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
    [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
)


julia> v = view(a, [1,2,4],:);

julia> eachchunk(v)
ERROR: ArgumentError: Unable to determine chunksize of non-range views.
Stacktrace:
 [1] eachchunk_view(::DiskArrays.Chunked{…}, vv::SubArray{…})
   @ DiskArrays ~/.julia/dev/DiskArrays/src/subarray.jl:29
 [2] eachchunk(a::DiskArrays.SubDiskArray{Float64, 2, ChunkedDiskArray{…}, Tuple{…}, false})
   @ DiskArrays ~/.julia/dev/DiskArrays/src/subarray.jl:25
 [3] top-level scope
   @ REPL[17]:1
Some type information was truncated. Use `show(err)` to see complete types.
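The reduced example indexes with [1,2,4], an integer vector rather than a range. The Where selector from the original report presumably hands DiskArrays the same kind of index. The Base-Julia sketch below shows that the Feb 29 predicate, applied to a small synthetic 3-hourly time axis (an assumption standing in for the CMIP6 axis), yields a plain Vector{Int} with a gap. How DimensionalData turns Where into indices internally is not shown here.

```julia
using Dates

# Synthetic 3-hourly time axis around the 2016 leap day
# (a small stand-in for the real CMIP6 time axis).
times = collect(DateTime(2016, 2, 28):Hour(3):DateTime(2016, 3, 1, 21))

# Same predicate as in the issue: drop every time step that falls on Feb 29.
keep = findall(t -> Dates.monthday(t) != (2, 29), times)

println(length(times))           # 24 time steps in total
println(length(keep))            # 16 remain after dropping the 8 Feb 29 steps
println(keep isa AbstractRange)  # false: an integer vector with a gap
```

Because keep has a hole where Feb 29 used to be, no single start:step:stop range describes it, which is the situation the "non-range views" error refers to.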

felixcremer, Aug 08 '24 13:08

I am not sure why we have this check there, and whether we could instead forward this case to unchunked access of the data.
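One way to picture the question of handling non-range views is a small Base-Julia sketch. The helpers contiguous_runs and touched_chunks are hypothetical illustrations, not DiskArrays functions and not necessarily how #181 resolves the issue. The first splits a sorted, duplicate-free index vector into range pieces that a range-based chunk computation could handle; the second lists which parent chunks an index vector touches on a regular chunk grid.

```julia
"""
    contiguous_runs(idx)

Hypothetical helper: split a sorted, duplicate-free vector of integer
indices into maximal runs of consecutive values, returned as UnitRanges.
"""
function contiguous_runs(idx::AbstractVector{<:Integer})
    runs = UnitRange{Int}[]
    isempty(idx) && return runs
    start = prev = Int(idx[1])
    for i in Iterators.drop(idx, 1)
        if i == prev + 1
            prev = Int(i)               # still inside the current run
        else
            push!(runs, start:prev)     # close the run, start a new one
            start = prev = Int(i)
        end
    end
    push!(runs, start:prev)
    return runs
end

"""
    touched_chunks(idx, chunksize)

Hypothetical helper: 1-based indices of the parent chunks, on a regular
grid with `chunksize` elements per chunk, that the indices in `idx` fall into.
"""
touched_chunks(idx, chunksize::Integer) = unique(fld.(idx .- 1, chunksize) .+ 1)

# The reduced example: view(a, [1,2,4], :) on 10-element chunks.
runs_small = contiguous_runs([1, 2, 4])        # 1:2 and 4:4
chunks_small = touched_chunks([1, 2, 4], 10)   # only chunk 1

# The Feb 29 filter on a 24-step axis keeps 1:8 and 17:24.
runs_feb = contiguous_runs(vcat(1:8, 17:24))
chunks_feb = touched_chunks(vcat(1:8, 17:24), 10)
```

Each run is an ordinary range, so in principle each piece could be viewed and chunked separately; whether that is worth the bookkeeping compared to a plain unchunked read is exactly the open design question.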

felixcremer, Aug 08 '24 13:08

This might already be solved in #181

felixcremer, Aug 08 '24 13:08

Great, will keep an eye on latest updates from DiskArrays.jl and report back when it is updated. Thanks!

Balinus, Aug 08 '24 13:08

Is there a way I can test the #181 commit? I tried adding the commit (add DiskArrays#5ef1432d4e925cf550c1cfdd3e083eca80db1fe9 and add DiskArrays#5ef1432), but it didn't work. Perhaps it is because the commit is from another repo? (I have never pinned a package to a specific commit before.)

I'd like to test the commit to see if it is resolved. Thanks!

Balinus, Aug 09 '24 13:08

It's in https://github.com/ConnectedSystems/DiskArrays.jl, on the fix-index-issue branch.

You probably have to manually clone it?

rafaqz, Aug 09 '24 13:08

Ah! And then do a dev? I'll try to see how hard it is (I guess I would need to dev YAXArrays too and change the [deps] section).

Balinus, Aug 09 '24 14:08

No, you can just add it and it will work: ] add https://github.com/ConnectedSystems/DiskArrays.jl#fix-index-issue

Or, of course, git clone and then dev.
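For scripts or CI, the same install can presumably be done through Pkg's functional API instead of the REPL mode. This is a sketch assuming a Julia version whose Pkg supports PackageSpec(url=..., rev=...); the branch name comes from this thread and may no longer exist. The INSTALL_DISKARRAYS_BRANCH environment variable is a hypothetical guard added only so the snippet can run without modifying the active environment.

```julia
using Pkg

# Functional-API equivalent of the REPL command
#   ] add https://github.com/ConnectedSystems/DiskArrays.jl#fix-index-issue
spec = PackageSpec(url = "https://github.com/ConnectedSystems/DiskArrays.jl",
                   rev = "fix-index-issue")

# Hypothetical guard: only touch the environment when explicitly requested.
if get(ENV, "INSTALL_DISKARRAYS_BRANCH", "false") == "true"
    Pkg.add(spec)
end
```

Setting INSTALL_DISKARRAYS_BRANCH=true before running the snippet performs the actual add.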

rafaqz, Aug 09 '24 14:08

It works! 😄

ds.tas[Ti=Where(x-> Dates.monthday(x) != (2,29))]
╭────────────────────────────────────╮
│ 384×192×251120 YAXArray{Float32,3} │
├────────────────────────────────────┴─────────────────────────────────────────────────────── dims ┐
  ↓ lon Sampled{Float64} 0.0:0.9375:359.0625 ForwardOrdered Regular Points,
  → lat Sampled{Float64} [-89.28422753251364, -88.35700351866494, …, 88.35700351866494, 89.28422753251364] ForwardOrdered Irregular Points,
  ↗ Ti  Sampled{DateTime} [2015-01-01T03:00:00, …, 2101-01-01T00:00:00] ForwardOrdered Irregular Points
├──────────────────────────────────────────────────────────────────────────────────────── metadata ┤
  Dict{String, Any} with 10 entries:
  "units"         => "K"
  "history"       => "2019-07-21T06:26:13Z altered by CMOR: Treated scalar dimension: 'height'. 201…
  "name"          => "tas"
  "cell_methods"  => "area: mean time: point"
  "cell_measures" => "area: areacella"
  "long_name"     => "Near-Surface Air Temperature"
  "coordinates"   => "height"
  "standard_name" => "air_temperature"
  "_FillValue"    => 1.0f20
  "comment"       => "near-surface (usually, 2 meter) air temperature"
├─────────────────────────────────────────────────────────────────────────────────────── file size ┤
  file size: 68.97 GB
  [0703355e] DimensionalData v0.27.6
  [3c3547ce] DiskArrays v0.4.5 `https://github.com/ConnectedSystems/DiskArrays.jl#fix-index-issue`
  [30363a11] NetCDF v0.12.0
  [c21b50f5] YAXArrays v0.5.10
  [0a941bbe] Zarr v0.9.4

@rafaqz Note that if I ran dev DiskArrays after add ...ConnectedSystems..., the dev version reverted to 0.4.4. Adding the ConnectedSystems version alone was sufficient to test the commit. Thanks again!

Balinus, Aug 09 '24 16:08

@rafaqz I have a side question about this working example (the MWE provided in https://github.com/JuliaDataCubes/YAXArrays.jl/issues/357, reproduced below). I am trying to rebuild the array (i.e., update the dimension values), but I have trouble understanding how to do it. Any pointer would be helpful!


using YAXArrays
using Zarr
using DimensionalData
using Dates
using CFTime  # provides DateTimeNoLeap and CFTime.reinterpret
path="gs://cmip6/CMIP6/ScenarioMIP/DKRZ/MPI-ESM1-2-HR/ssp585/r1i1p1f1/3hr/tas/gn/v20190710/"
store = zopen(path, consolidated=true)
ds = open_dataset(store)

# Select only the time steps that are not on Feb 29.
ds_subset = ds.tas[Ti=Where(x-> Dates.monthday(x) != (2,29))]

# Extract the time lookup values
date_vec = lookup(ds_subset, Ti)
# Reinterpret them as no-leap calendar dates
datevec_noleap = CFTime.reinterpret.(DateTimeNoLeap, date_vec)

Then I'd like to rebuild the :Ti dimension with datevec_noleap. For example, I can build a new dimension with

rebuild(:Ti, datevec_noleap)
newdim = Ti [DateTimeNoLeap(2001-01-01T00:00:00), …, DateTimeNoLeap(2010-12-31T00:00:00)]

However, I still haven't been able to assign this rebuilt dimension to ds_subset.

Balinus, Aug 09 '24 20:08

DD is mostly straight functional code... You have to rebuild constructed objects and assign them to a new variable.

In this case, set is a rebuild helper. So you can do new_array = set(some_dd_array, Ti => datevec_noleap) and set will figure out how to rebuild it correctly.

You can put pretty much any dimension or lookup property after the =>, as there is no ambiguity about what you mean to do.
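A minimal sketch of the set approach on a small in-memory DimArray, assuming DimensionalData (around the v0.27 used in this thread) and CFTime are installed. The toy daily time axis and values are invented for illustration; with the real data, A would be ds.tas.

```julia
using DimensionalData, Dates
using CFTime   # provides DateTimeNoLeap and CFTime.reinterpret

# Small in-memory stand-in for ds.tas: daily values around the 2016 leap day.
times = collect(DateTime(2016, 2, 28):Day(1):DateTime(2016, 3, 2))
A = DimArray(collect(1.0:length(times)), Ti(times))

# Drop Feb 29, as in the thread.
A_sub = A[Ti = Where(t -> Dates.monthday(t) != (2, 29))]

# Reinterpret the remaining DateTimes in the no-leap calendar.
noleap = CFTime.reinterpret.(DateTimeNoLeap, collect(lookup(A_sub, Ti)))

# set returns a NEW array; A_sub itself is left unchanged.
B = set(A_sub, Ti => noleap)
```

Because DimensionalData is functional, the result of set has to be bound to a new name (here B); calling set without using its return value has no effect on A_sub.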

rafaqz, Aug 09 '24 20:08

Nice, I will give it a try Monday morning 😄

Thanks a lot!

Balinus, Aug 09 '24 20:08

It works nicely, thanks!

Balinus, Aug 12 '24 18:08

You can also check out the PR branch locally from a normally dev'ed DiskArrays by following this tutorial: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

felixcremer, Aug 12 '24 20:08

Fixed by #181

Balinus, Oct 22 '24 12:10