YAXArrays.jl
KeyError: key :Ti not found
Hello, I get the following error when using YAXArrays.Datasets.open_mfdataset. The files contain daily data (the 1st file is day 1, the 2nd file is day 2, etc.). It is ERA5-Land data downloaded from Copernicus (sadly, I do not have the download script).
using NetCDF
using YAXArrays
using Glob
repbrut = "/path/to/files"
patterns = "*copernicus_era5_land_surface.nc"
files = glob(patterns, repbrut)
obs = YAXArrays.Datasets.open_mfdataset(files[1:10]) # loading only a subset of the 3000 files
KeyError: key :Ti not found
Stacktrace:
[1] getindex
@ ./dict.jl:498 [inlined]
[2] _broadcast_getindex_evalf
@ ./broadcast.jl:709 [inlined]
[3] _broadcast_getindex
@ ./broadcast.jl:682 [inlined]
[4] #31
@ ./broadcast.jl:1118 [inlined]
[5] ntuple
@ ./ntuple.jl:50 [inlined]
[6] copy
@ ./broadcast.jl:1118 [inlined]
[7] materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.Style{Tuple}, Nothing, typeof(getindex), Tuple{Base.RefValue{Dict{Symbol, Any}}, Tuple{Symbol, Symbol, Symbol}}})
@ Base.Broadcast ./broadcast.jl:903
[8] merge_datasets(dslist::Vector{YAXArrays.Datasets.Dataset})
@ YAXArrays.Datasets ~/.julia/packages/YAXArrays/jdA1f/src/DatasetAPI/Datasets.jl:903
[9] open_mfdataset(g::Vector{String})
@ YAXArrays.Datasets ~/.julia/packages/YAXArrays/jdA1f/src/DatasetAPI/Datasets.jl:280
[10] top-level scope
@ In[4]:5
I can open the files individually, for example:
ds1 = open_dataset(files[1])
YAXArray Dataset
Shared Axes:
↓ longitude Sampled{Float32} -82.0f0:0.1f0:-50.0f0 ForwardOrdered Regular Points,
→ latitude Sampled{Float32} 64.0f0:-0.1f0:42.0f0 ReverseOrdered Regular Points,
↗ Ti Sampled{DateTime} [1950-01-01T00:00:00, …, 1950-01-01T23:00:00] ForwardOrdered Irregular Points
Variables:
snowc, e, skt, asn, d2m, stl1, t2m, lai_lv, u10, sro, ssrd, src, v10, lai_hv, sp, sd, rsn, evaow, sde, sf, tp, ro,
Properties: Dict{String, Any}("history" => "2024-05-10 22:57:09 GMT by grib_to_netcdf-2.28.1: /opt/ecmwf/mars-client/bin/grib_to_netcdf -S param -o /cache/data2/adaptor.mars.internal-1715381827.014326-19889-5-81cba0ea-74b0-4995-b5c3-8458c0c8abd5.nc /cache/tmp/81cba0ea-74b0-4995-b5c3-8458c0c8abd5-adaptor.mars.internal-1715381789.8065639-19889-3-tmp.grib", "Conventions" => "CF-1.6")
ds2 = open_dataset(files[2])
YAXArray Dataset
Shared Axes:
↓ longitude Sampled{Float32} -82.0f0:0.1f0:-50.0f0 ForwardOrdered Regular Points,
→ latitude Sampled{Float32} 64.0f0:-0.1f0:42.0f0 ReverseOrdered Regular Points,
↗ Ti Sampled{DateTime} [1950-01-02T00:00:00, …, 1950-01-02T23:00:00] ForwardOrdered Irregular Points
Variables:
snowc, e, skt, asn, d2m, stl1, lai_lv, t2m, u10, sro, ssrd, src, v10, lai_hv, sp, sd, rsn, evaow, sde, sf, tp, ro,
Properties: Dict{String, Any}("history" => "2024-05-10 22:54:16 GMT by grib_to_netcdf-2.28.1: /opt/ecmwf/mars-client/bin/grib_to_netcdf -S param -o /cache/data3/adaptor.mars.internal-1715381653.851235-8099-3-1961ccb9-cd31-4fe2-b913-5973053f1ab1.nc /cache/tmp/1961ccb9-cd31-4fe2-b913-5973053f1ab1-adaptor.mars.internal-1715381615.6087704-8099-3-tmp.grib", "Conventions" => "CF-1.6")
but I am unable to merge the datasets:
newds = YAXArrays.Datasets.merge_datasets([ds1, ds2])
KeyError: key :Ti not found
Stacktrace:
[1] getindex
@ ./dict.jl:498 [inlined]
[2] _broadcast_getindex_evalf
@ ./broadcast.jl:709 [inlined]
[3] _broadcast_getindex
@ ./broadcast.jl:682 [inlined]
[4] #31
@ ./broadcast.jl:1118 [inlined]
[5] ntuple
@ ./ntuple.jl:50 [inlined]
[6] copy
@ ./broadcast.jl:1118 [inlined]
[7] materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.Style{Tuple}, Nothing, typeof(getindex), Tuple{Base.RefValue{Dict{Symbol, Any}}, Tuple{Symbol, Symbol, Symbol}}})
@ Base.Broadcast ./broadcast.jl:903
[8] merge_datasets(dslist::Vector{YAXArrays.Datasets.Dataset})
@ YAXArrays.Datasets ~/.julia/packages/YAXArrays/jdA1f/src/DatasetAPI/Datasets.jl:903
[9] top-level scope
@ In[15]:1
As far as I can tell, :Ti is present in both files here (and in all 3000 files I have), but somehow merge_datasets does not seem to pick it up.
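For example, a quick sanity check (a minimal sketch; hasdim and Ti come from DimensionalData, Cube loads each file as a single multi-variable array as further below, and the expected result is my assumption) suggests every file carries a time axis:

using NetCDF, YAXArrays
using DimensionalData: Ti, hasdim

# Check that each per-file cube has a Ti dimension
all(hasdim(Cube(f), Ti) for f in files[1:10])  # expected: true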
(Climat) pkg> st
[179af706] CFTime v0.1.3
[a93c6f00] DataFrames v1.6.1
[0703355e] DimensionalData v0.27.2
[31c24e10] Distributions v0.25.108
[85f8d34a] NCDatasets v0.14.4
[30363a11] NetCDF v0.11.8
[90b8fcef] YAXArrayBase v0.6.1
[c21b50f5] YAXArrays v0.5.6
⌃ [0a941bbe] Zarr v0.9.3
[ade2ca70] Dates
[10745b16] Statistics v1.10.0
From Manifest
[fcd2136c] DiskArrayTools v0.1.10
⌅ [3c3547ce] DiskArrays v0.3.23
That is something that we should fix. As a stopgap, you could extract all cubes from the dataset, use cat(cubes..., dims=Ti) to merge them, and wrap the concatenated cubes in a Dataset.
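For instance, a minimal sketch of that workaround (assuming all files expose the same variables; the .cubes field and the keyword Dataset constructor are what I rely on here):

using NetCDF, YAXArrays
using DimensionalData: Ti

# Open each file as its own Dataset, then concatenate variable by variable
dslist = open_dataset.(files[1:10])
varnames = collect(keys(dslist[1].cubes))
merged = Dict(
    name => cat((ds.cubes[name] for ds in dslist)...; dims=Ti)
    for name in varnames
)
# Wrap the concatenated cubes back into a Dataset
newds = Dataset(; merged...)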
Ok, thanks, I'll see what I can do.
I didn't count the files correctly... it is actually 27_000 files.
I am doing the following, but I get a warning about lookup values not matching (perhaps the variable names are not in the same order in every file, which creates the problem? -> see "t2m" and "lai_lv" in both lists):
cubes = Cube.(files[1:4])
ds2 = cat(cubes..., dims=:Ti);
Warning: Lookup values for Dim{:Variable} of
["snowc", "e", "skt", "asn", "d2m", "stl1", "t2m", "lai_lv", "u10", "sro", "ssrd", "src", "v10", "lai_hv", "sp", "sd", "rsn", "evaow", "sde", "sf", "tp", "ro"]
and
["snowc", "e", "skt", "asn", "d2m", "stl1", "lai_lv", "t2m", "u10", "sro", "ssrd", "src", "v10", "lai_hv", "sp", "sd", "rsn", "evaow", "sde", "sf", "tp", "ro"] do not match. Can't `cat` AbstractDimArray, applying to `parent` object.
└ @ DimensionalData.Dimensions ~/.julia/packages/DimensionalData/yZgLJ/src/Dimensions/primitives.jl:774
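In case it helps, one way to work around the mismatched lookup order might be to reindex every cube's Variable axis to a common order before concatenating (a rough sketch, not verified on these files; the keyword indexing with the At selector is how I would attempt it):

using NetCDF, YAXArrays
using DimensionalData: Ti, Dim, At, lookup

cubes = Cube.(files[1:4])
# Take the variable order of the first file as the reference order
varorder = collect(lookup(cubes[1], Dim{:Variable}))
# Reindex every cube so its Variable axis follows the same order
aligned = [c[Variable=At(varorder)] for c in cubes]
ds2 = cat(aligned...; dims=Ti)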
Closed by #470 and #481. If there are other edge cases, please open a new issue with an MWE.