MeshArrays.jl icon indicating copy to clipboard operation
MeshArrays.jl copied to clipboard

how to save regularly tiled MeshArrays data to files as a collection of compressed chunks "a la zarr"?

Open gaelforget opened this issue 10 months ago • 5 comments

  • see https://github.com/gaelforget/MITgcm.jl/issues/143 in relation to this
  • a lazy read would be step 2
  • a partial read based on lat-non ranges would be step 3

gaelforget avatar Jan 07 '25 17:01 gaelforget

using NCDatasets, Glob, Climatology
lst_E=glob("ETAN*nc",ScratchSpaces.ECCO*"/ETAN")
lst_M=glob("MXLDEPTH*nc",ScratchSpaces.ECCO*"/MXLDEPTH")
ds_E=NCDataset(lst_E,aggdim="tile",isnewdim=true)
ds_M=NCDataset(lst_M,aggdim="tile",isnewdim=true)
ds=merge(ds_E,ds_M)

gaelforget avatar Jan 07 '25 17:01 gaelforget

Not sure how to convert https://github.com/JuliaClimate/MeshArrays.jl/issues/166#issuecomment-2575897457 to Rasters.jl

The following returns an error, but it's unclear how to resolve it based on the error message or the docs of Rasters.jl

This works :

using Rasters, Glob, Climatology, NCDatasets
get_ecco_variable_if_needed("ETAN")
lst_E=glob("ETAN*nc",ScratchSpaces.ECCO*"/ETAN")
RasterStack(lst_E[1])

and returns :

┌ 90×90×12 RasterStack ┐
├──────────────────────┴────────────────────────────────────────────────────────────────────── dims ┐
  ↓ i3 Sampled{Float64} [1.0, 2.0, …, 89.0, 90.0] ForwardOrdered Regular Points,
  → i2 Sampled{Float64} [1.0, 2.0, …, 89.0, 90.0] ForwardOrdered Regular Points,
  ↗ i1 Sampled{Float64} [1.0, 2.0, …, 11.0, 12.0] ForwardOrdered Regular Points
├─────────────────────────────────────────────────────────────────────────────────────────── layers ┤
  :area    eltype: Float64 dims: i3, i2 size: 90×90
  :land    eltype: Float64 dims: i3, i2 size: 90×90
  :lat     eltype: Float64 dims: i3, i2 size: 90×90
  :lon     eltype: Float64 dims: i3, i2 size: 90×90
  :tim     eltype: Dates.DateTime dims: i1 size: 12
  :timstep eltype: Float64 dims: i1 size: 12
  :ETAN    eltype: Float32 dims: i3, i2, i1 size: 90×90×12
├───────────────────────────────────────────────────────────────────────────────────────── metadata ┤
  Metadata{Rasters.NCDsource} of Dict{String, Any} with 11 entries:
  "history"       => " files revision history :\n    2015/04/29 : convert main variable type to sing…
  "NCO"           => "4.3.7"
  "Format"        => "native grid (nctiles w. 13 tiles)"
  "references"    => "Forget, G., J.-M. Campin, P. Heimbach, C. N. Hill, R. M. Ponte, \n and C. Wuns…
  "date"          => "01-Apr-2016"
  "_FillValue"    => NaN
  "Conventions"   => "CF-1.6"
  "missing_value" => NaN
  "source"        => "ECCO consortium (http://ecco-group.org/)"
  "title"         => "ETAN -- ECCO v4 ocean state estimate, release 2 -- climatology"
  "institution"   => "MIT"
├─────────────────────────────────────────────────────────────────────────────────────────── raster ┤
  extent: Extent(i3 = (1.0, 90.0), i2 = (1.0, 90.0), i1 = (1.0, 12.0))
  missingval: missing
└───────────────────────────────────────────────────────────────────────────────────────────────────┘

but none of these work :


RasterStack(dirname(lst_E[1]))

returns the error :

ERROR: NetCDF error: Variable 'ETAN.0001' not found in file /Users/gaelforget/.julia/scratchspaces/9e9a4d37-2d2e-41e3-8b85-f7978328d9c7/ECCO/ETAN/ETAN.0001.nc (NetCDF error code: -49)
Stacktrace:
  [1] nc_inq_varid(ncid::Int32, name::Symbol)
    @ NCDatasets ~/.julia/packages/NCDatasets/LjXBn/src/netcdf_c.jl:1558
  [2] _variable
    @ ~/.julia/packages/NCDatasets/LjXBn/src/variable.jl:72 [inlined]
  [3] variable
    @ ~/.julia/packages/NCDatasets/LjXBn/src/variable.jl:84 [inlined]
  [4] cfvariable(ds::NCDataset{Nothing, Missing}, varname::Symbol)
    @ CommonDataModel ~/.julia/packages/CommonDataModel/vTYnt/src/cfvariable.jl:86
  [5] getindex
    @ ~/.julia/packages/CommonDataModel/vTYnt/src/dataset.jl:170 [inlined]
  [6] _open(f::Rasters.var"#61#62"{…}, ::Rasters.NCDsource, ds::NCDataset{…}; name::Symbol, group::Rasters.NoKW, kw::@Kwargs{})
    @ Rasters ~/.julia/packages/Rasters/C1Hvd/src/sources/commondatamodel.jl:128
  [7] _open
    @ ~/.julia/packages/Rasters/C1Hvd/src/sources/commondatamodel.jl:126 [inlined]
  [8] Raster(ds::NCDataset{…}, filename::String; dims::Rasters.NoKW, refdims::Tuple{}, name::Symbol, group::Rasters.NoKW, metadata::Rasters.NoKW, missingval::Rasters.NoKW, crs::Rasters.NoKW, mappedcrs::Rasters.NoKW, source::Rasters.NCDsource, replace_missing::Bool, write::Bool, lazy::Bool, dropband::Bool, checkmem::Bool)
    @ Rasters ~/.julia/packages/Rasters/C1Hvd/src/array.jl:324
  [9] Raster
    @ ~/.julia/packages/Rasters/C1Hvd/src/array.jl:306 [inlined]
 [10] #58
    @ ~/.julia/packages/Rasters/C1Hvd/src/array.jl:303 [inlined]
 [11] _open(f::Rasters.var"#58#59"{…}, ::Rasters.NCDsource, ds::NCDataset{…}; name::Rasters.NoKW, group::Nothing, kw::@Kwargs{})
    @ Rasters ~/.julia/packages/Rasters/C1Hvd/src/sources/commondatamodel.jl:129
 [12] _open
    @ ~/.julia/packages/Rasters/C1Hvd/src/sources/commondatamodel.jl:126 [inlined]
 [13] #8
    @ ~/.julia/packages/Rasters/C1Hvd/ext/RastersNCDatasetsExt/ncdatasets_source.jl:67 [inlined]
 [14] NCDataset(::RastersNCDatasetsExt.var"#8#9"{…}, ::String, ::Vararg{…}; kwargs::@Kwargs{})
    @ NCDatasets ~/.julia/packages/NCDatasets/LjXBn/src/dataset.jl:252
 [15] NCDataset(::Function, ::String, ::Vararg{String})
    @ NCDatasets ~/.julia/packages/NCDatasets/LjXBn/src/dataset.jl:249
 [16] _open(f::Function, ::Rasters.NCDsource, filename::String; write::Bool, kw::@Kwargs{})
    @ RastersNCDatasetsExt ~/.julia/packages/Rasters/C1Hvd/ext/RastersNCDatasetsExt/ncdatasets_source.jl:66
 [17] _open
    @ ~/.julia/packages/Rasters/C1Hvd/ext/RastersNCDatasetsExt/ncdatasets_source.jl:63 [inlined]
 [18] #_open#21
    @ ~/.julia/packages/Rasters/C1Hvd/src/sources/sources.jl:86 [inlined]
 [19] _open
    @ ~/.julia/packages/Rasters/C1Hvd/src/sources/sources.jl:85 [inlined]
 [20] #Raster#57
    @ ~/.julia/packages/Rasters/C1Hvd/src/array.jl:302 [inlined]
 [21] (::Rasters.var"#96#99"{Rasters.NoKW, @Kwargs{…}})(name::Symbol, fn::String, mv::Rasters.NoKW, md::Rasters.NoKW)
    @ Rasters ~/.julia/packages/Rasters/C1Hvd/src/stack.jl:365
 [22] map(::Function, ::NTuple{13, Symbol}, ::NTuple{13, String}, ::NTuple{13, Rasters.NoKW}, ::Vararg{NTuple{…}})
    @ Base ./tuple.jl:406
 [23] RasterStack(filenames::@NamedTuple{…}; source::Rasters.NoKW, missingval::Rasters.NoKW, metadata::Rasters.NoKW, resize::Rasters.NoKW, layermetadata::Rasters.NoKW, layerdims::Rasters.NoKW, kw::@Kwargs{…})
    @ Rasters ~/.julia/packages/Rasters/C1Hvd/src/stack.jl:364
 [24] RasterStack(filenames::Vector{…}; name::Vector{…}, kw::@Kwargs{…})
    @ Rasters ~/.julia/packages/Rasters/C1Hvd/src/stack.jl:351
 [25] RasterStack
    @ ~/.julia/packages/Rasters/C1Hvd/src/stack.jl:346 [inlined]
 [26] RasterStack(filename::String; lazy::Bool, dropband::Bool, replace_missing::Bool, source::Rasters.NoKW, name::Rasters.NoKW, group::Rasters.NoKW, kw::@Kwargs{})
    @ Rasters ~/.julia/packages/Rasters/C1Hvd/src/stack.jl:392
 [27] RasterStack(filename::String)
    @ Rasters ~/.julia/packages/Rasters/C1Hvd/src/stack.jl:370
 [28] top-level scope
    @ REPL[12]:1
Some type information was truncated. Use `show(err)` to see complete types.

And I get the same error with RasterStack(lst_E).


Then I tried the following

aggregate([Raster(f) for f in lst_E])
ERROR: MethodError: no method matching aggregate(::Vector{Raster{…}})
The function `aggregate` exists, but no method is defined for this combination of argument types.

Closest candidates are:
  aggregate(::Any, ::DimensionalData.Dimensions.Lookups.Regular, ::Any)
   @ Rasters ~/.julia/packages/Rasters/C1Hvd/src/methods/aggregate.jl:100
  aggregate(::Any, ::DimensionalData.Dimensions.Lookups.Span, ::Any)
   @ Rasters ~/.julia/packages/Rasters/C1Hvd/src/methods/aggregate.jl:99
  aggregate(::Any, ::DimensionalData.Dimensions.Lookups.Lookup, ::Any)
   @ Rasters ~/.julia/packages/Rasters/C1Hvd/src/methods/aggregate.jl:87

and

RasterSeries(lst_E)
ERROR: MethodError: no method matching RasterSeries(::Vector{String})
The type `RasterSeries` exists, but no method is defined for this combination of argument types when trying to construct it.

Closest candidates are:
  RasterSeries(::A, ::D, ::R) where {T, N, A<:AbstractArray{T, N}, D<:Tuple, R<:Tuple}
   @ Rasters ~/.julia/packages/Rasters/C1Hvd/src/series.jl:127
  RasterSeries(::AbstractArray{<:Union{AbstractString, NamedTuple}}, ::Any; refdims, lazy, duplicate_first, child, resize, kw...)
   @ Rasters ~/.julia/packages/Rasters/C1Hvd/src/series.jl:142
  RasterSeries(::DimensionalData.AbstractBasicDimArray; kw...)
   @ Rasters ~/.julia/packages/Rasters/C1Hvd/src/series.jl:132
  ...

ping @rafaqz

gaelforget avatar Jan 08 '25 15:01 gaelforget

Hard to say not knowing what the errors are or what is in the vector

rafaqz avatar Jan 08 '25 15:01 rafaqz

Hard to say not knowing what the errors are or what is in the vector

I have added the error messages above. The ERROR: NetCDF error: Variable 'ETAN.0001' not found in file is the one I want to fix really. The relevant variable isETANin all files notETAN.0001etc. The goal is to concatenate the various 90x90 tiles along a new dimension (which we calledtile` in https://github.com/JuliaClimate/MeshArrays.jl/issues/166#issuecomment-2575897457).

gaelforget avatar Jan 08 '25 16:01 gaelforget

For RasterSeries you need to pass some dimensions that those files are over as the second argument

rafaqz avatar Jan 08 '25 18:01 rafaqz