
Error in inferring input dimensions during mapCube of a Dataset

Open danlooo opened this issue 4 months ago • 0 comments

I want to call mapCube on all variables of a Dataset within the same Zarr store at once, e.g. to convert the bands red, green, and blue in parallel. One can apply mapCube to each array separately. However, the arrays share some input and output dimensions, so I want to keep them in the same Zarr Dataset store and write the results directly via outdims, skipping the additional copy made by savedataset.
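For reference, a minimal sketch of the per-variable approach mentioned above. It assumes that the Dataset exposes its variables through the `cubes` field and that an in-memory output (no `path`) is acceptable for illustration; the `results` dictionary is a hypothetical helper, not part of YAXArrays. It does not place the outputs into one shared Zarr store, which is the actual goal of this issue.

```julia
using YAXArrays
using DimensionalData

# Two small in-memory arrays standing in for the bands of a Dataset
a = YAXArray(rand(10, 5))
b = YAXArray(rand(10, 5))
ds = Dataset(a=a, b=b)

# Hypothetical container collecting one output cube per variable
results = Dict{Symbol,Any}()

# Assumption: `ds.cubes` maps variable names to their YAXArrays
for (name, cube) in ds.cubes
    # mapCube passes the output first, followed by the inputs
    results[name] = mapCube(cube; indims=InDims(), outdims=OutDims(Ti(1:10))) do xout, xin
        xout .= 42
    end
end
```

Each call here handles a single YAXArray, so the Dataset method of mapCube is never reached.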

Unfortunately,

using YAXArrays
using DimensionalData
a = rand(X(1:10), Y(1:5)) |> x -> YAXArray(x.data)
b = rand(X(1:10), Y(1:5)) |> x -> YAXArray(x.data)
ds = Dataset(a=a, b=b)
res = mapCube(
    ds;
    indims=(InDims(), InDims()),
    outdims=OutDims(Ti(1:10); path=tempname(), backend=:zarr),
) do xout, xin_a, xin_b  # mapCube passes the output first, then one argument per input
    xout .= 42
end

results in the following error:

ERROR: type Tuple has no field axisdesc
Stacktrace:
 [1] getproperty
   @ ./Base.jl:49 [inlined]
 [2] mapCube(::Function, ::Dataset; indims::Tuple{…}, outdims::OutDims, inplace::Bool, kwargs::@Kwargs{})
   @ YAXArrays.DAT ~/prj/YAXArrays.jl/src/DAT/DAT.jl:339
 [3] top-level scope
   @ REPL[12]:1

Notably, we get the same error after converting the Dataset into a tuple of YAXArrays:

using YAXArrays, Zarr
using YAXArrays: YAXArrays as YAX
using Dates

f(lo, la, t) = (lo + la + Dates.dayofyear(t))

function g(xout, lo, la, t)
    xout .= f.(lo, la, t)
end

lat_yax = YAXArray(lat(range(1, 10)))
lon_yax = YAXArray(lon(range(1, 15)))
tspan = Date("2022-01-01"):Day(1):Date("2022-01-30")
time_yax = YAXArray(YAX.time(tspan))

gen_cube = mapCube(g, (lon_yax, lat_yax, time_yax);
           indims = (InDims(), InDims(), InDims("time")),
           outdims = OutDims("time", overwrite=true, path="my_gen_cube.zarr", backend=:zarr,
           outtype = Float32)
       )
ds_t = Dataset(; r = lat_yax, g = lon_yax, t = time_yax )
gen_cube_ds = mapCube(g, ds_t;
    indims = (InDims(), InDims(), InDims("time")),
    outdims = OutDims("time", overwrite=true, path="my_gen_cube.zarr", backend=:zarr,
    outtype = Float32)
)

The corresponding method does not have unit tests.

Workaround

Create and save a skeleton of the dataset, then fill it later in parallel via indexed assignment (setindex!); see the YAXArrays and xarray documentation.

danlooo, Jul 17 '25 14:07