YAXArrays.jl
Feature request: Save YAXArray or Dataset into a Zarr group
In the Common Data Model V4, multiple Datasets can be stored in the same file. They are organized in (nested) groups, analogous to files in directories and subdirectories.
For example, xarray.Dataset.to_zarr has the option group to specify the path inside the zarr storage in which the dataset should be stored.
Similarly, zarr.hierarchy.group has the option path to specify the (group) path. The prototype xarray-datatree (which is part of the xarray roadmap) uses this to represent a tree of Datasets as its own type. I think Zarr.jl already supports this through the name option of the function Zarr.zcreate.
This is particularly important when storing data cubes of different spatio-temporal resolutions in the same store. It'd be great to have an additional group option for the functions savedataset and savecube.
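To make the requested layout concrete, here is a minimal sketch of a single Zarr store holding two groups with arrays on different grids, written with Zarr.jl directly. It only illustrates the on-disk hierarchy and is not an existing savedataset or savecube feature. The store path, the group names high_res and low_res, and the array name layer are made up for illustration, and it uses the group-based zcreate method rather than the name option mentioned above.

```julia
using Zarr

# Hypothetical store location; a temporary directory keeps the sketch self-contained.
store_path = joinpath(mktempdir(), "multires.zarr")

# Create the root group of the store.
root = zgroup(store_path)

# One subgroup per resolution (group names are illustrative).
high = zgroup(root, "high_res")
low = zgroup(root, "low_res")

# The same variable name in each group, but with different grid sizes.
a_high = zcreate(Float64, high, "layer", 10, 10, 3)
a_low = zcreate(Float64, low, "layer", 5, 5, 3)

# Fill both arrays with random data.
a_high[:, :, :] = rand(10, 10, 3)
a_low[:, :, :] = rand(5, 5, 3)

# Reopen the store and walk the hierarchy like directories.
reopened = zopen(store_path)
println(size(reopened["high_res"]["layer"]))
println(size(reopened["low_res"]["layer"]))
```

A group option on savedataset could then map each Dataset to one such subgroup, so that every group keeps its own consistent set of axes.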
data cubes of different spatio-temporal resolutions
https://juliadatacubes.github.io/YAXArrays.jl/dev/examples/generated/UserGuide/creating/#creating-a-dataset
Isn't this already the case? You can always pass a bunch of YAXArrays of different dimensions into a dataset that can be saved as a .zarr file, right?
Datasets are meant to store multiple variables sampled over the same grid, which is defined by their shared axes. However, axes of different resolutions, e.g. the spatial ones, are not the same. Trying this:
using YAXArrays
using Zarr
high_res_cube = YAXArray(rand(10, 10, 3))
low_res_cube = YAXArray(rand(5, 5, 3))
ds = Dataset(high_res = high_res_cube, low_res = low_res_cube)
savedataset(ds; path = "foo.zarr", driver=:zarr)
also returns an error when saving the dataset to disk:
ERROR: ArgumentError: Can not construct YAXArray, supplied data size is (10, 10, 3) while axis lenghts are (5, 5, 3)
Stacktrace:
[1] YAXArray(axes::Vector{RangeAxis{Int64, _A, Base.OneTo{Int64}} where _A}, data::ZArray{Float64, 3, Zarr.BloscCompressor, DirectoryStore}, properties::Dict{String, Any}, chunks::DiskArrays.GridChunks{3}, cleaner::Vector{YAXArrays.Cubes.CleanMe})
@ YAXArrays.Cubes ~/.julia/packages/YAXArrays/R6KY3/src/Cubes/Cubes.jl:110
[2] #YAXArray#5
@ ~/.julia/packages/YAXArrays/R6KY3/src/Cubes/Cubes.jl:129 [inlined]
[3] collectfromhandle(e::NamedTuple{(:name, :t, :chunks, :axes, :attr, :subs, :require_CF, :offs), Tuple{String, DataType, Tuple{Int64, Int64, Int64}, Vector{RangeAxis{Int64, _A, Base.OneTo{Int64}} where _A}, Dict{String, Any}, Nothing, Bool, Dict{Symbol, Int64}}}, dshandle::YAXArrayBase.ZarrDataset, cleaner::Vector{YAXArrays.Cubes.CleanMe})
@ YAXArrays.Datasets ~/.julia/packages/YAXArrays/R6KY3/src/DatasetAPI/Datasets.jl:403
[4] #102
@ ~/.julia/packages/YAXArrays/R6KY3/src/DatasetAPI/Datasets.jl:564 [inlined]
[5] iterate
@ ./generator.jl:47 [inlined]
[6] collect_to!(dest::Vector{YAXArray{Float64, 3, ZArray{Float64, 3, Zarr.BloscCompressor, DirectoryStore}, Vector{RangeAxis{Int64, _A, Base.OneTo{Int64}} where _A}}}, itr::Base.Generator{Vector{NamedTuple{(:name, :t, :chunks, :axes, :attr, :subs, :require_CF, :offs), Tuple{String, DataType, Tuple{Int64, Int64, Int64}, Vector{RangeAxis{Int64, _A, Base.OneTo{Int64}} where _A}, Dict{String, Any}, Nothing, Bool, Dict{Symbol, Int64}}}}, YAXArrays.Datasets.var"#102#108"{YAXArrayBase.ZarrDataset, Vector{YAXArrays.Cubes.CleanMe}}}, offs::Int64, st::Int64)
@ Base ./array.jl:840
[7] collect_to_with_first!(dest::Vector{YAXArray{Float64, 3, ZArray{Float64, 3, Zarr.BloscCompressor, DirectoryStore}, Vector{RangeAxis{Int64, _A, Base.OneTo{Int64}} where _A}}}, v1::YAXArray{Float64, 3, ZArray{Float64, 3, Zarr.BloscCompressor, DirectoryStore}, Vector{RangeAxis{Int64, _A, Base.OneTo{Int64}} where _A}}, itr::Base.Generator{Vector{NamedTuple{(:name, :t, :chunks, :axes, :attr, :subs, :require_CF, :offs), Tuple{String, DataType, Tuple{Int64, Int64, Int64}, Vector{RangeAxis{Int64, _A, Base.OneTo{Int64}} where _A}, Dict{String, Any}, Nothing, Bool, Dict{Symbol, Int64}}}}, YAXArrays.Datasets.var"#102#108"{YAXArrayBase.ZarrDataset, Vector{YAXArrays.Cubes.CleanMe}}}, st::Int64)
@ Base ./array.jl:818
[8] _collect(c::Vector{NamedTuple{(:name, :t, :chunks, :axes, :attr, :subs, :require_CF, :offs), Tuple{String, DataType, Tuple{Int64, Int64, Int64}, Vector{RangeAxis{Int64, _A, Base.OneTo{Int64}} where _A}, Dict{String, Any}, Nothing, Bool, Dict{Symbol, Int64}}}}, itr::Base.Generator{Vector{NamedTuple{(:name, :t, :chunks, :axes, :attr, :subs, :require_CF, :offs), Tuple{String, DataType, Tuple{Int64, Int64, Int64}, Vector{RangeAxis{Int64, _A, Base.OneTo{Int64}} where _A}, Dict{String, Any}, Nothing, Bool, Dict{Symbol, Int64}}}}, YAXArrays.Datasets.var"#102#108"{YAXArrayBase.ZarrDataset, Vector{YAXArrays.Cubes.CleanMe}}}, #unused#::Base.EltypeUnknown, isz::Base.HasShape{1})
@ Base ./array.jl:812
[9] collect_similar(cont::Vector{NamedTuple{(:name, :t, :chunks, :axes, :attr, :subs, :require_CF, :offs), Tuple{String, DataType, Tuple{Int64, Int64, Int64}, Vector{RangeAxis{Int64, _A, Base.OneTo{Int64}} where _A}, Dict{String, Any}, Nothing, Bool, Dict{Symbol, Int64}}}}, itr::Base.Generator{Vector{NamedTuple{(:name, :t, :chunks, :axes, :attr, :subs, :require_CF, :offs), Tuple{String, DataType, Tuple{Int64, Int64, Int64}, Vector{RangeAxis{Int64, _A, Base.OneTo{Int64}} where _A}, Dict{String, Any}, Nothing, Bool, Dict{Symbol, Int64}}}}, YAXArrays.Datasets.var"#102#108"{YAXArrayBase.ZarrDataset, Vector{YAXArrays.Cubes.CleanMe}}})
@ Base ./array.jl:711
[10] map(f::Function, A::Vector{NamedTuple{(:name, :t, :chunks, :axes, :attr, :subs, :require_CF, :offs), Tuple{String, DataType, Tuple{Int64, Int64, Int64}, Vector{RangeAxis{Int64, _A, Base.OneTo{Int64}} where _A}, Dict{String, Any}, Nothing, Bool, Dict{Symbol, Int64}}}})
@ Base ./abstractarray.jl:3261
[11] savedataset(ds::Dataset; path::String, persist::Nothing, overwrite::Bool, append::Bool, skeleton::Bool, backend::Symbol, driver::Symbol, max_cache::Float64, writefac::Float64, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ YAXArrays.Datasets ~/.julia/packages/YAXArrays/R6KY3/src/DatasetAPI/Datasets.jl:564
[12] top-level scope
@ REPL[20]:1