Rasters.jl icon indicating copy to clipboard operation
Rasters.jl copied to clipboard

Add multi file support for Grib and NetCDF files

Open aasdelat opened this issue 8 months ago • 19 comments

I am aware that Rasters.jl uses GRIBDatasets.jl and NCDatasets.jl in order to deal with Gribs and NetCDF files, respectively.

Recently, the ability to make a single dataset from a list of files, has been added to these two libraries (see this thread).

Is it possible that Rasters.jl takes advantage of this new feature in order to make a Raster or RasterStack (but not a RasterSeries) from a bunch of files?

aasdelat avatar Mar 29 '25 19:03 aasdelat

In yax we have: https://juliadatacubes.github.io/YAXArrays.jl/stable/UserGuide/read.html#open-mfdataset , but I haven't tested it with Grib files.

lazarusA avatar Mar 29 '25 20:03 lazarusA

What does it do exactly?

We have always been able to do merge(RasterStack(filea), RasterStack(fileb)) to get a single RasterStack. How would this be different?

Ok maybe I get it a bit more: its specifically concatenation? In Rasters you can just do cat(stack1, stack2; dims=Ti) and it will lazily concatenate if the stacks are lazy. Is that what you want?

rafaqz avatar Mar 29 '25 20:03 rafaqz

Also how is it different from combine(rasterseries) ? I clearly don't get it...

rafaqz avatar Mar 29 '25 20:03 rafaqz

I will use an example: I have two grib files with the same structure. They have two dimensions: "values", that are integer indices of spatial points, and Ti, that are timestamps. The timestamps of the second file are continuation of the timestamps of the first file. And both files have three variables: longitude, latitude and msl (mean sea level pressure). I can read both files as a RasterSeries: dataset = RasterSeries(input_files, :valid_time, child=RasterStack, lazy=true) Here are structures of both files:

julia> dataset[1]
╭────────────────────────╮
│ 338880×744 RasterStack │
├────────────────────────┴─────────────────────────────────────────────────────────────────────────── dims ┐
  ↓ values,
  → Ti     Sampled{DateTime} [1980-12-01T00:00:00, …, 1980-12-31T23:00:00] ForwardOrdered Irregular Points
├────────────────────────────────────────────────────────────────────────────────────────────────── layers ┤
  :longitude eltype: Union{Missing, Float64} dims: values size: 338880
  :latitude  eltype: Union{Missing, Float64} dims: values size: 338880
  :msl       eltype: Union{Missing, Float64} dims: values, Ti size: 338880×744
├──────────────────────────────────────────────────────────────────────────────────────────────────────────┴─────────────────────────── metadata ┐
  Metadata{Rasters.GRIBsource} of Dict{String, Any} with 6 entries:
  "edition"           => "1"
  "source"            => "/home/antonio/Documentos/datos/ERA5/ERA5_N320_1h_sl_sfc_NH15S_1978_2023_msl/ERA5_N320_1h_sl_sfc_NH15S_1980-12-01_1980-1…
  "centreDescription" => "European Centre for Medium-Range Weather Forecasts"
  "centre"            => "ecmf"
  "subCentre"         => "0"
  "Conventions"       => "CF-1.7"
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── raster ┤
  extent: Extent(values = (1, 338880), Ti = (DateTime("1980-12-01T00:00:00"), DateTime("1980-12-31T23:00:00")))
  missingval: missing
  filename: 
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

julia> dataset[2]
╭────────────────────────╮
│ 338880×744 RasterStack │
├────────────────────────┴─────────────────────────────────────────────────────────────────────────── dims ┐
  ↓ values,
  → Ti     Sampled{DateTime} [1981-01-01T00:00:00, …, 1981-01-31T23:00:00] ForwardOrdered Irregular Points
├────────────────────────────────────────────────────────────────────────────────────────────────── layers ┤
  :longitude eltype: Union{Missing, Float64} dims: values size: 338880
  :latitude  eltype: Union{Missing, Float64} dims: values size: 338880
  :msl       eltype: Union{Missing, Float64} dims: values, Ti size: 338880×744
├──────────────────────────────────────────────────────────────────────────────────────────────────────────┴─────────────────────────── metadata ┐
  Metadata{Rasters.GRIBsource} of Dict{String, Any} with 6 entries:
  "edition"           => "1"
  "source"            => "/home/antonio/Documentos/datos/ERA5/ERA5_N320_1h_sl_sfc_NH15S_1978_2023_msl/ERA5_N320_1h_sl_sfc_NH15S_1981-01-01_1981-0…
  "centreDescription" => "European Centre for Medium-Range Weather Forecasts"
  "centre"            => "ecmf"
  "subCentre"         => "0"
  "Conventions"       => "CF-1.7"
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── raster ┤
  extent: Extent(values = (1, 338880), Ti = (DateTime("1981-01-01T00:00:00"), DateTime("1981-01-31T23:00:00")))
  missingval: missing
  filename: 
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Note that the time dimension of the second file is the continuation of the one of the first file. So, I want to have only one RasterStack object that results from the concatenation of both files. If I use combine, I get:

julia> combine(dataset[1:2], dims=:Ti)
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ 2-element DimArray{RasterStack{(:longitude, :latitude, :msl), @NamedTuple{longitude::Union{Missing, Float64}, latitude::Union{Missing, Float64}, msl::Union{Missing, Float64}}, 2, @NamedTuple{longitude::DiskArrays.ConcatDiskArray{Union{Missing, Float64}, 1, Vector{Raster{Union{Missing, Float64}, 1, Tuple{Dim{:values, DimensionalData.Dimensions.Lookups.NoLookup{Base.OneTo{Int64}}}}, Tuple{}, Rasters.FileArray{Rasters.GRIBsource, Union{Missing, Float64}, 1, Symbol, Nothing, DiskArrays.GridChunks{1, Tuple{DiskArrays.RegularChunks}}, DiskArrays.Unchunked}, Symbol, DimensionalData.Dimensions.Lookups.Metadata{Rasters.GRIBsource, Dict{String, Any}}, Missing}}}, latitude::DiskArrays.ConcatDiskArray{Union{Missing, Float64}, 1, Vector{Raster{Union{Missing, Float64}, 1, Tuple{Dim{:values, DimensionalData.Dimensions.Lookups.NoLookup{Base.OneTo{Int64}}}}, Tuple{}, Rasters.FileArray{Rasters.GRIBsource, Union{Missing, Float64}, 1, Symbol, Nothing, DiskArrays.GridChunks{1, Tuple{DiskArrays.RegularChunks}}, DiskArrays.Unchunked}, Symbol, DimensionalData.Dimensions.Lookups.Metadata{Rasters.GRIBsource, Dict{String, Any}}, Missing}}}, msl::DiskArrays.ConcatDiskArray{Union{Missing, Float64}, 2, Matrix{Raster{Union{Missing, Float64}, 2, Tuple{Dim{:values, DimensionalData.Dimensions.Lookups.NoLookup{Base.OneTo{Int64}}}, Ti{DimensionalData.Dimensions.Lookups.Sampled{DateTime, Vector{DateTime}, DimensionalData.Dimensions.Lookups.ForwardOrdered, DimensionalData.Dimensions.Lookups.Irregular{Tuple{Nothing, Nothing}}, DimensionalData.Dimensions.Lookups.Points, DimensionalData.Dimensions.Lookups.Metadata{Rasters.GRIBsource, Dict{String, Any}}}}}, Tuple{}, Rasters.FileArray{Rasters.GRIBsource, Union{Missing, Float64}, 2, Symbol, Nothing, DiskArrays.GridChunks{2, Tuple{DiskArrays.RegularChunks, DiskArrays.RegularChunks}}, DiskArrays.Unchunked}, Symbol, DimensionalData.Dimensions.Lookups.Metadata{Rasters.GRIBsource, Dict{String, Any}}, Missing}}}}, Tuple{Dim{:values, DimensionalData.Dimensions.Lookups.NoLookup{Base.OneTo{Int64}}}, Ti{DimensionalData.Dimensions.Lookups.Sampled{DateTime, Vector{DateTime}, DimensionalData.Dimensions.Lookups.ForwardOrdered, DimensionalData.Dimensions.Lookups.Irregular{Tuple{Nothing, Nothing}}, DimensionalData.Dimensions.Lookups.Points, DimensionalData.Dimensions.Lookups.Metadata{Rasters.GRIBsource, Dict{String, Any}}}}}, Tuple{}, @NamedTuple{longitude::Tuple{Dim{:values, Colon}}, latitude::Tuple{Dim{:values, Colon}}, msl::Tuple{Dim{:values, Colon}, Ti{Colon}}}, DimensionalData.Dimensions.Lookups.Metadata{Rasters.GRIBsource, Dict{String, Any}}, @NamedTuple{longitude::DimensionalData.Dimensions.Lookups.Metadata{Rasters.GRIBsource, Dict{String, Any}}, latitude::DimensionalData.Dimensions.Lookups.Metadata{Rasters.GRIBsource, Dict{String, Any}}, msl::DimensionalData.Dimensions.Lookups.Metadata{Rasters.GRIBsource, Dict{String, Any}}}, Missing},1} │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── dims ┤
  ↓ valid_time
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
 RasterStack{(:longitude, :latitude, :msl), @NamedTuple{longitude::Union{Missing, Float64}, latitude::Union{Missing, Float64}, msl::Union{Missing, Float64}}, 2, @NamedTuple{longitude::ConcatDiskArray{Union{Missing, Float64}, 1, Vector{Raster{Union{Missing, Float64}, 1, Tuple{Dim{:values, NoLookup{OneTo{Int64}}}}, Tuple{}, FileArray{GRIBsource, Union{Missing, Float64}, 1, Symbol, Nothing, GridChunks{1, Tuple{RegularChunks}}, Unchunked}, Symbol, Metadata{GRIBsource, Dict{String, Any}}, Missing}}}, latitude::ConcatDiskArray{Union{Missing, Float64}, 1, Vector{Raster{Union{Missing, Float64}, 1, Tuple{Dim{:values, NoLookup{OneTo{Int64}}}}, Tuple{}, FileArray{GRIBsource, Union{Missing, Float64}, 1, Symbol, Nothing, GridChunks{1, Tuple{RegularChunks}}, Unchunked}, Symbol, Metadata{GRIBsource, Dict{String, Any}}, Missing}}}, msl::ConcatDiskArray{Union{Missing, Float64}, 2, Matrix{Raster{Union{Missing, Float64}, 2, Tuple{Dim{:values, NoLookup{OneTo{Int64}}}, Ti{Sampled{DateTime, Vector{DateTime}, ForwardOrdered, Irregular{Tuple{Nothing, Nothing}}, Points, Metadata{GRIBsource, Dict{String, Any}}}}}, Tuple{}, FileArray{GRIBsource, Union{Missing, Float64}, 2, Symbol, Nothing, GridChunks{2, Tuple{RegularChunks, RegularChunks}}, Unchunked}, Symbol, Metadata{GRIBsource, Dict{String, Any}}, Missing}}}}, Tuple{Dim{:values, NoLookup{OneTo{Int64}}}, Ti{Sampled{DateTime, Vector{DateTime}, ForwardOrdered, Irregular{Tuple{Nothing, Nothing}}, Points, Metadata{GRIBsource, Dict{String, Any}}}}}, Tuple{}, @NamedTuple{longitude::Tuple{Dim{:values, Colon}}, latitude::Tuple{Dim{:values, Colon}}, msl::Tuple{Dim{:values, Colon}, Ti{Colon}}}, Metadata{GRIBsource, Dict{String, Any}}, @NamedTuple{longitude::Metadata{GRIBsource, Dict{String, Any}}, latitude::Metadata{GRIBsource, Dict{String, Any}}, msl::Metadata{GRIBsource, Dict{String, Any}}}, Missing}((longitude = 338880-element ConcatDiskArray{Union{Missing, Float64}, 1, Vector{Raster{Union{Missing, Float64}, 1, Tuple{Dim{:values, NoLookup{OneTo{Int64}}}}, Tuple{}, FileArray{GRIBsource, Union{Missing, Float64}, 1, Symbol, Nothing, GridChunks{1, Tuple{RegularChunks}}, Unchunked}, Symbol, Metadata{GRIBsource, Dict{String, Any}}, Missing}}}, latitude = 338880-element ConcatDiskArray{Union{Missing, Float64}, 1, Vector{Raster{Union{Missing, Float64}, 1, Tuple{Dim{:values, NoLookup{OneTo{Int64}}}}, Tuple{}, FileArray{GRIBsource, Union{Missing, Float64}, 1, Symbol, Nothing, GridChunks{1, Tuple{RegularChunks}}, Unchunked}, Symbol, Metadata{GRIBsource, Dict{String, Any}}, Missing}}}, msl = 338880×744 ConcatDiskArray{Union{Missing, Float64}, 2, Matrix{Raster{Union{Missing, Float64}, 2, Tuple{Dim{:values, NoLookup{OneTo{Int64}}}, Ti{Sampled{DateTime, Vector{DateTime}, ForwardOrdered, Irregular{Tuple{Nothing, Nothing}}, Points, Metadata{GRIBsource, Dict{String, Any}}}}}, Tuple{}, FileArray{GRIBsource, Union{Missing, Float64}, 2, Symbol, Nothing, GridChunks{2, Tuple{RegularChunks, RegularChunks}}, Unchunked}, Symbol, Metadata{GRIBsource, Dict{String, Any}}, Missing}}}), ↓ values,
→ Ti     Sampled{DateTime} [1980-12-01T00:00:00, …, 1980-12-31T23:00:00] ForwardOrdered Irregular Points, (), (longitude = ↓ values, latitude = ↓ values, msl = ↓ values, → Ti), Metadata{GRIBsource, Dict{String, Any}}(Dict{String, Any}("edition"=>"1", "source"=>"/home/antonio/Documentos/datos/ERA5/ERA5_N320_1h_sl_sfc_NH15S_1978_2023_msl/ERA5_N320_1h_sl_sfc_NH15S_1980-12-01_1980-12-31_msl.grib", "centreDescription"=>"European Centre for Medium-Range Weather Forecasts", "centre"=>"ecmf", "subCentre"=>"0", "Conventions"=>"CF-1.7")), (longitude = Metadata{GRIBsource, Dict{String, Any}}(Dict{String, Any}("units"=>"degrees_east", "long_name"=>"longitude", "standard_name"=>"longitude")), latitude = Metadata{GRIBsource, Dict{String, Any}}(Dict{String, Any}("units"=>"degrees_north", "long_name"=>"latitude", "standard_name"=>"latitude")), msl = Metadata{GRIBsource, Dict{String, Any}}(Dict{String, Any}("units"=>"Pa", "long_name"=>"Mean sea level pressure", "standard_name"=>"air_pressure_at_mean_sea_level"))), missing)
 RasterStack{(:longitude, :latitude, :msl), @NamedTuple{longitude::Union{Missing, Float64}, latitude::Union{Missing, Float64}, msl::Union{Missing, Float64}}, 2, @NamedTuple{longitude::ConcatDiskArray{Union{Missing, Float64}, 1, Vector{Raster{Union{Missing, Float64}, 1, Tuple{Dim{:values, NoLookup{OneTo{Int64}}}}, Tuple{}, FileArray{GRIBsource, Union{Missing, Float64}, 1, Symbol, Nothing, GridChunks{1, Tuple{RegularChunks}}, Unchunked}, Symbol, Metadata{GRIBsource, Dict{String, Any}}, Missing}}}, latitude::ConcatDiskArray{Union{Missing, Float64}, 1, Vector{Raster{Union{Missing, Float64}, 1, Tuple{Dim{:values, NoLookup{OneTo{Int64}}}}, Tuple{}, FileArray{GRIBsource, Union{Missing, Float64}, 1, Symbol, Nothing, GridChunks{1, Tuple{RegularChunks}}, Unchunked}, Symbol, Metadata{GRIBsource, Dict{String, Any}}, Missing}}}, msl::ConcatDiskArray{Union{Missing, Float64}, 2, Matrix{Raster{Union{Missing, Float64}, 2, Tuple{Dim{:values, NoLookup{OneTo{Int64}}}, Ti{Sampled{DateTime, Vector{DateTime}, ForwardOrdered, Irregular{Tuple{Nothing, Nothing}}, Points, Metadata{GRIBsource, Dict{String, Any}}}}}, Tuple{}, FileArray{GRIBsource, Union{Missing, Float64}, 2, Symbol, Nothing, GridChunks{2, Tuple{RegularChunks, RegularChunks}}, Unchunked}, Symbol, Metadata{GRIBsource, Dict{String, Any}}, Missing}}}}, Tuple{Dim{:values, NoLookup{OneTo{Int64}}}, Ti{Sampled{DateTime, Vector{DateTime}, ForwardOrdered, Irregular{Tuple{Nothing, Nothing}}, Points, Metadata{GRIBsource, Dict{String, Any}}}}}, Tuple{}, @NamedTuple{longitude::Tuple{Dim{:values, Colon}}, latitude::Tuple{Dim{:values, Colon}}, msl::Tuple{Dim{:values, Colon}, Ti{Colon}}}, Metadata{GRIBsource, Dict{String, Any}}, @NamedTuple{longitude::Metadata{GRIBsource, Dict{String, Any}}, latitude::Metadata{GRIBsource, Dict{String, Any}}, msl::Metadata{GRIBsource, Dict{String, Any}}}, Missing}((longitude = 338880-element ConcatDiskArray{Union{Missing, Float64}, 1, Vector{Raster{Union{Missing, Float64}, 1, Tuple{Dim{:values, NoLookup{OneTo{Int64}}}}, Tuple{}, FileArray{GRIBsource, Union{Missing, Float64}, 1, Symbol, Nothing, GridChunks{1, Tuple{RegularChunks}}, Unchunked}, Symbol, Metadata{GRIBsource, Dict{String, Any}}, Missing}}}, latitude = 338880-element ConcatDiskArray{Union{Missing, Float64}, 1, Vector{Raster{Union{Missing, Float64}, 1, Tuple{Dim{:values, NoLookup{OneTo{Int64}}}}, Tuple{}, FileArray{GRIBsource, Union{Missing, Float64}, 1, Symbol, Nothing, GridChunks{1, Tuple{RegularChunks}}, Unchunked}, Symbol, Metadata{GRIBsource, Dict{String, Any}}, Missing}}}, msl = 338880×744 ConcatDiskArray{Union{Missing, Float64}, 2, Matrix{Raster{Union{Missing, Float64}, 2, Tuple{Dim{:values, NoLookup{OneTo{Int64}}}, Ti{Sampled{DateTime, Vector{DateTime}, ForwardOrdered, Irregular{Tuple{Nothing, Nothing}}, Points, Metadata{GRIBsource, Dict{String, Any}}}}}, Tuple{}, FileArray{GRIBsource, Union{Missing, Float64}, 2, Symbol, Nothing, GridChunks{2, Tuple{RegularChunks, RegularChunks}}, Unchunked}, Symbol, Metadata{GRIBsource, Dict{String, Any}}, Missing}}}), ↓ values,
→ Ti     Sampled{DateTime} [1981-01-01T00:00:00, …, 1981-01-31T23:00:00] ForwardOrdered Irregular Points, (), (longitude = ↓ values, latitude = ↓ values, msl = ↓ values, → Ti), Metadata{GRIBsource, Dict{String, Any}}(Dict{String, Any}("edition"=>"1", "source"=>"/home/antonio/Documentos/datos/ERA5/ERA5_N320_1h_sl_sfc_NH15S_1978_2023_msl/ERA5_N320_1h_sl_sfc_NH15S_1981-01-01_1981-01-31_msl.grib", "centreDescription"=>"European Centre for Medium-Range Weather Forecasts", "centre"=>"ecmf", "subCentre"=>"0", "Conventions"=>"CF-1.7")), (longitude = Metadata{GRIBsource, Dict{String, Any}}(Dict{String, Any}("units"=>"degrees_east", "long_name"=>"longitude", "standard_name"=>"longitude")), latitude = Metadata{GRIBsource, Dict{String, Any}}(Dict{String, Any}("units"=>"degrees_north", "long_name"=>"latitude", "standard_name"=>"latitude")), msl = Metadata{GRIBsource, Dict{String, Any}}(Dict{String, Any}("units"=>"Pa", "long_name"=>"Mean sea level pressure", "standard_name"=>"air_pressure_at_mean_sea_level"))), missing)

And I do not understand what this is.

On the other hand, I can use cat. But, logically, cat will concatenate only the variable with a time dimension: msl, but neither longitude nor latitude. So, first, I have to select the variable that will be concatenated along the time asis (msl). Second, I have to pass to cat, the new (time)axis values, id est, I have to concatenate, first, the time axes of the two files:

time_name = :Ti
var_name = :msl
lon_name = :longitude
lat_name = :latitude

new_time_dim = cat(
    [ lookup(dataset[i], time_name) for i in 1:length(dataset)]...,
    dims=1
    )#cat
new_ds_msl = cat(
    [ dataset[i][var_name] for i in 1:length(dataset)]..., # We have selected the :msl variable
    dims=Ti(new_time_dim)
    )#cat
# We have now the msl variable (or Raster), in new_ds_msl (a Raster object), but I have to
# reincorporate longitude and latitude, resulting a RasterStack object:
dataset = RasterStack(
    ( dataset[1][lon_name], dataset[1][lat_name], new_ds_var ),
    name = (lon_name, lat_name, var_name)
    )#RasterStack

And I have what I need:

julia> dataset
╭─────────────────────────╮
│ 338880×1488 RasterStack │
├─────────────────────────┴────────────────────────────────────────────────────────────────────────── dims ┐
  ↓ values,
  → Ti     Sampled{DateTime} [1980-12-01T00:00:00, …, 1981-01-31T23:00:00] ForwardOrdered Irregular Points
├────────────────────────────────────────────────────────────────────────────────────────────────── layers ┤
  :longitude eltype: Union{Missing, Float64} dims: values size: 338880
  :latitude  eltype: Union{Missing, Float64} dims: values size: 338880
  :msl       eltype: Union{Missing, Float64} dims: values, Ti size: 338880×1488
├────────────────────────────────────────────────────────────────────────────────────────────────── raster ┤
  extent: Extent(values = (1, 338880), Ti = (DateTime("1980-12-01T00:00:00"), DateTime("1981-01-31T23:00:00")))
  missingval: missing
  filename: 
└──────────────────────────────────────────────────────────────────────────────────────────────────────────┘

And I think that you will tell me: "You have what you want, so What's the problem then?" The problem is that this all staff is a bit cumbersome and should be simplified in a simple function like xarray.open_mfdataset. This makes the code more readable and easier to be maintained.

So, although I can do what I want with the current implementation, I think it would be very advisable to have a xarray.open_mfdataset like function to make things simpler and more intuitive.

aasdelat avatar Mar 29 '25 22:03 aasdelat

Huh.

It seems the reason this doesn't work easily is because your data (lat, lon) is really a dimension, not data. This needs to load as a Vector data cube with a point lookup. (We can do this but it's not yet being detected on loading the data)

Then the points would be the same for both stacks, and cat would just work on the time dimension as I suggested.

You could also use spatial selectors, which you currently can't do.

(@asinghvi17 here's a simple vector cube example in the wild)

Do you have a link to the data?

(And there is no way I would say that much code is ok! This should all be one or two lines. The question is which lines... I don't want to encourage dimensions as data like you have, youre losing plotting and half the tools in the package with that setup)

rafaqz avatar Mar 29 '25 23:03 rafaqz

@aasdelat a link to that file would really help identify how we can automatically load it as a dimension.

I'm not sure if its following CF point standards, or if it is custom data that is just organised like that.

rafaqz avatar Apr 01 '25 17:04 rafaqz

Also of note is :

https://github.com/rafaqz/Rasters.jl/issues/917

We can just accept already loaded datasets from CDM.jl and turn them into RasterStacks

rafaqz avatar Apr 02 '25 16:04 rafaqz

FYI you can probably just do Raster(multifile) and RasterStack(multifile) now that those constructors on AbstractDataset are merged.

rafaqz avatar Apr 22 '25 20:04 rafaqz

Sorry form my (so long) delay. I will answer in order:

The files are 0,5 Gb each and are retrieved from the Copernicus Data Store. You have to be registered in order to download data from here (and also you have to know what data you want and how to retrieve them), but I share you the two files by my Mega account: https://mega.nz/file/4gZE3A4K#xWFXWlHil2lAM5NpNAC-WTSEIa6PhYK8BTokOoW7iwE https://mega.nz/file/k9Zl0bCA#RFMs-F7rgjc1EG8MWnEL2yjmJoqR6o7ITNzk_hL_T7M

Sorry, but the link you provide in

(@asinghvi17 here's a simple vector cube example in the wild)

is to the profile of Anshul Singhvi , not to an example of a vector cube.

The organization of the spatial structure information in these files is very common in the atmospheric sciences, although some people prefer to interpolate them into regular lat/lon grids, but I prefer to work with the "pristine" data. This spatial organization is not very friendly for the plotting tools out there. Panply manages them very well, but lacks of configuring options. Metview seems to do calculations with them, but cannot plot them properly. It would be great to plot them with GeoMakie (I have not tested this, due to my lack of knowledge about it).

RasterStack(mfds) or Raster(mfds) does not work:

files = ["ERA5_N320_1h_sl_sfc_NH15S_1980-12-01_1980-12-31_msl.grib", "ERA5_N320_1h_sl_sfc_NH15S_1981-01-01_1981-01-31_msl.grib"]
ds = GRIBDataset.(files)
mfds = MFDataset(ds,aggdim = "valid_time")
rmfds = Raster(mfds)
ERROR: MethodError: no method matching sourcetrait(::MFDataset{GRIBDataset{Float64, 2, Missing}, 1, String})

Closest candidates are:
  sourcetrait(::Symbol)
   @ Rasters ~/.julia/packages/Rasters/mJsGi/src/sources/sources.jl:100
  sourcetrait(::Type{<:Rasters.Source})
   @ Rasters ~/.julia/packages/Rasters/mJsGi/src/sources/sources.jl:99
  sourcetrait(::CommonDataModel.CFVariable)
   @ Rasters ~/.julia/packages/Rasters/mJsGi/src/sources/commondatamodel.jl:39
  ...

Stacktrace:
 [1] _raster(ds::MFDataset{GRIBDataset{Float64, 2, Missing}, 1, String})
   @ Rasters ~/.julia/packages/Rasters/mJsGi/src/array.jl:334
 [2] Raster(ds::MFDataset{GRIBDataset{Float64, 2, Missing}, 1, String}; kw::@Kwargs{})
   @ Rasters ~/.julia/packages/Rasters/mJsGi/src/array.jl:333
 [3] top-level scope
   @ REPL[39]:1

OR

rmfds = RasterStack(mfds)
ERROR: MethodError: no method matching sourcetrait(::MFDataset{GRIBDataset{Float64, 2, Missing}, 1, String})

Closest candidates are:
  sourcetrait(::Symbol)
   @ Rasters ~/.julia/packages/Rasters/mJsGi/src/sources/sources.jl:100
  sourcetrait(::Type{<:Rasters.Source})
   @ Rasters ~/.julia/packages/Rasters/mJsGi/src/sources/sources.jl:99
  sourcetrait(::CommonDataModel.CFVariable)
   @ Rasters ~/.julia/packages/Rasters/mJsGi/src/sources/commondatamodel.jl:39
  ...

Stacktrace:
 [1] _cdmlookup(ds::MFDataset{…}, dimname::String, D::Type, crs::Rasters.NoKW, mappedcrs::Rasters.NoKW)
   @ Rasters ~/.julia/packages/Rasters/mJsGi/src/sources/commondatamodel.jl:277
 [2] _cdmdim(ds::MFDataset{GRIBDataset{…}, 1, String}, dimname::String, crs::Rasters.NoKW, mappedcrs::Rasters.NoKW)
   @ Rasters ~/.julia/packages/Rasters/mJsGi/src/sources/commondatamodel.jl:226
 [3] _dimdict(ds::MFDataset{GRIBDataset{Float64, 2, Missing}, 1, String}, crs::Rasters.NoKW, mappedcrs::Rasters.NoKW)
   @ Rasters ~/.julia/packages/Rasters/mJsGi/src/sources/commondatamodel.jl:173
 [4] RasterStack(ds::MFDataset{…}; filename::String, source::Rasters.NoKW, dims::Rasters.NoKW, refdims::Tuple{}, name::Rasters.NoKW, group::Rasters.NoKW, metadata::Rasters.NoKW, layermetadata::Rasters.NoKW, layerdims::Rasters.NoKW, missingval::Rasters.NoKW, crs::Rasters.NoKW, mappedcrs::Rasters.NoKW, coerce::typeof(convert), checkmem::Bool, scaled::Rasters.NoKW, lazy::Bool, verbose::Bool, raw::Bool, kw::@Kwargs{})
   @ Rasters ~/.julia/packages/Rasters/mJsGi/src/stack.jl:476
 [5] RasterStack(ds::MFDataset{GRIBDataset{Float64, 2, Missing}, 1, String})
   @ Rasters ~/.julia/packages/Rasters/mJsGi/src/stack.jl:451
 [6] top-level scope
   @ REPL[38]:1
Some type information was truncated. Use `show(err)` to see complete types.

aasdelat avatar May 03 '25 17:05 aasdelat

Thats actually not the worst error to see, we simply need to get the trait for GRIB from the inner dataset. I'm guessing it cant hold mixed dataset types?

rafaqz avatar May 03 '25 18:05 rafaqz

Looking at the struct definition it seems like there's nothing explicitly against it but we can always error, if they are not the same sourcetrait... can get e.g. only(unique(sourcetrait, inner_datasets)).

@aasdelat do you have an image of what you want to be plotted from this dataset? Plotting as points is simple but there are many ways to plot these kinds of datasets on a map, maybe you want to interpolate?

Also, what exactly do you mean by performing math on these cubes? If it's the exact same cube, with the exact same lat and long values, then it makes sense - but otherwise, what does it mean to perform math on long/lat vars that don't match up?

asinghvi17 avatar May 03 '25 19:05 asinghvi17

also, @rafaqz this is what the attributes of the dataset look like:



Dataset: /Users/anshul/Downloads/ERA5_N320_1h_sl_sfc_NH15S_1980-12-01_1980-12-31_msl.grib
Group: /

Dimensions
   values = 338880
   valid_time = 744

Variables
  valid_time   (744)
    Datatype:    Dates.DateTime (Int64)
    Dimensions:  valid_time
    Attributes:
     units                = seconds since 1970-01-01T00:00:00
     calendar             = proleptic_gregorian
     long_name            = time
     standard_name        = time

  longitude   (338880)
    Datatype:    Float64 (Float64)
    Dimensions:  values
    Attributes:
     units                = degrees_east
     long_name            = longitude
     standard_name        = longitude

  latitude   (338880)
    Datatype:    Float64 (Float64)
    Dimensions:  values
    Attributes:
     units                = degrees_north
     long_name            = latitude
     standard_name        = latitude

  msl   (338880 × 744)
    Datatype:    Union{Missing, Float64} (Float64)
    Dimensions:  values × valid_time
    Attributes:
     units                = Pa
     long_name            = Mean sea level pressure
     standard_name        = air_pressure_at_mean_sea_level

Global attributes
  edition              = 1
  source               = /Users/anshul/Downloads/ERA5_N320_1h_sl_sfc_NH15S_1980-12-01_1980-12-31_msl.grib
  centreDescription    = European Centre for Medium-Range Weather Forecasts
  centre               = ecmf
  subCentre            = 0
  Conventions          = CF-1.7

so there's no way for us to know what the dimensions are there, this doesn't seem to conform to CF standards either as there is no "coordinates" attribute to msl.

asinghvi17 avatar May 03 '25 19:05 asinghvi17

Yeah it can still work without coordinates, we know Lat and Lon are connected to values and are standard Lattitude and longitude, while there is no values variable

rafaqz avatar May 03 '25 19:05 rafaqz

So it's a Point Geometry lookup

(And I guess a 3d xy scatter over time)

rafaqz avatar May 03 '25 19:05 rafaqz

yeah, each slice in time can be interpreted as 3d points (or just 2d)

asinghvi17 avatar May 03 '25 19:05 asinghvi17

whoa, this is pretty slick!


using GRIBDatasets # io
using GLMakie, GeoMakie # visualization
using Delaunator # fast meshing of 2d point clouds

ds = GRIBDataset("path_to_ds.grib") # or replace with your MFDataset

# Create a triangulation of the points with Delaunator.jl
# (this only triangulates on the plane - there are ways to triangulate on the sphere, but all quite complex)
t1 = Delaunator.triangulate(tuple.(ds["longitude"], ds["latitude"]))

# Plot it using Makie.jl's `mesh` plot function,
# in a GeoMakie.GlobeAxis (3d globe).
# You can also use a GeoMakie.GeoAxis (2d projected map) or a Makie.Axis (pure 2d, no geospatial awareness).
fig, ax, plt = mesh(
    t1.points, GLMakie.TriangleFace.(t1.triangles); 
    color = replace(ds["msl"][:, 1], missing => NaN), # Makie cannot handle `missing` so we convert to NaN, Rasters.jl can handle this for you.
    shading = NoShading, # flat lighting on the mesh 
    axis = (; type = GeoMakie.GlobeAxis, show_axis = false),
    figure = (; backgroundcolor = :black)
)

# Plot coastlines, color black with 10% opacity, 20km above sea level
coastlines_plt = lines!(ax, GeoMakie.coastlines(); color = (:black, 0.1), zlevel = 20_000)

# Display the figure so you can see it and interact with it!
# GlobeAxis user controls are not the greatest at the moment but that
# will improve this summer.
fig

# Animate the whole thing
# first, add a title to the top
title = Label(fig[0, 1]; text = string(ds["valid_time"][1]), tellwidth = false, tellheight = true, color = :white)
msl = ds["msl"]
# then, simply loop over the frames and update the color of the mesh plot, and the title!
# Animation in Makie is really simple using Observables.
record(fig, "over_time.mp4", 1:length(ds["valid_time"]), framerate = 60, px_per_unit = 2, update = false) do i
    plt.color[] = replace(msl[:, i], missing => NaN)
    title.text[] = string(ds["valid_time"][i])
end

https://github.com/user-attachments/assets/3f6198d3-7493-4185-9c9d-d476980a638c

I made it look nice, but you can get away without that for much of this.

asinghvi17 avatar May 03 '25 20:05 asinghvi17

With some magic (https://github.com/HolyLab/FlyThroughPaths.jl/pull/14) you can even do this:

https://github.com/user-attachments/assets/435f85e5-d956-4596-9de1-2a9b02af49fc

asinghvi17 avatar May 03 '25 22:05 asinghvi17

Wow, this is awesome!

Also, what exactly do you mean by performing math on these cubes?

Do not worry about computations. Once I have a Raster or YAXArray, Julia makes them very well.

About the graphic, it looks great! I have copied and run the code for the graphic and a window opens with the data plotted on the world, and I can rotate it with the mouse. But the file "over_time.mp4" that I produce, is a black image where only the time stamp appears. I do not think it is an issue on my video player application (VLC), because the time stamp appears on the top of the "image".

aasdelat avatar May 06 '25 11:05 aasdelat

I'm working on improving our CF compatability, this file should be able to load with a MergedLookup (basically points) for the values dimension that you can select directly with X and Y. Its actually enough information without the explicit coordinates - there is no values variable so we automatically look for other variables that are only on the values dimension, and as there are 2 we combine them in MergedLookup. Its looking like we can represent nearly the whole CF spec now, besides the weird interpolation stuff. Its just taken a long time to have all the components written.

Its working on my branch, so maybe in a month it will be on main 😅

rafaqz avatar May 06 '25 12:05 rafaqz