Rasters.jl icon indicating copy to clipboard operation
Rasters.jl copied to clipboard

cf: if `crs` key not present, check keys like `proj4`, `wkt`, `epsg`, `esri`

Open asinghvi17 opened this issue 1 year ago • 6 comments

This is on the cf branch.

I tried to load a raster where grid_mapping => "crs", but there's no crs field in the dataset, only a proj4 field. It also looks like the current code tries to load a variable in the dataset? somehow? Maybe that's a typo.

Ended up hacking around it by simply replacing https://github.com/rafaqz/Rasters.jl/blob/c2258400942ae680e4706153f4189d1bc9ff419b/src/sources/commondatamodel.jl#L165-L173 with a set of checks on possible crs strings, and it doesn't error if it can't find it, only warns.

function _layermetadata(ds::AbstractDataset; layers)
    map(layers.attrs) do attr
        md = _metadatadict(_sourcetrait(ds), attr)
        if haskey(attr, "grid_mapping")
            gm_str = attr["grid_mapping"]
            md["grid_mapping"] = if haskey(ds, gm_str)
                Dict(CDM.attribs(ds[gm_str]))
            elseif haskey(CDM.attribs(ds), gm_str)
                Dict(gm_str => CDM.attribs(ds)[gm_str])
            elseif gm_str == "crs"
                if haskey(ds, "crs")
                    Dict("crs" => CDM.attribs(ds)["crs"])
                elseif haskey(ds, "proj4")
                    Dict("crs" => CDM.attribs(ds)["proj4"])
                end
            else
                @warn "Could not find crs/."
            end
        end
        md
    end
end

Is this a solution that makes sense, or should we just not do it? It doesn't actually read the CRS still.

asinghvi17 avatar Sep 11 '24 23:09 asinghvi17

What's in the spec? I thought the grid_mapping variable was a key that linked to a shared crs string for the whole dataset.

rafaqz avatar Sep 12 '24 05:09 rafaqz

Edited after reading the spec

What happens if that crs variable doesn't exist?

The spec does say it's supposed to be a variable: https://cfconventions.org/cf-conventions/cf-conventions.html#grid-mappings-and-projections

but this particular dataset does not have that variable (though it does have a global proj4 field). It shouldn't error on load at least.

asinghvi17 avatar Sep 12 '24 05:09 asinghvi17

Here's an overview of the dataset, for reference:

julia> zg = zopen("reference://nwm_reanalysis.json", "r")
ZarrGroup at ReferenceStore with 2939545 references
 and path 
Variables: time elevation qBtmVertRunoff q_lateral qSfcLatRunoff streamflow feature_id velocity latitude qBucket order longitude 

julia> zg["velocity"].attrs
Dict{String, Any} with 10 entries:
  "missing_value"     => -999900
  "coordinates"       => "longitude latitude"
  "units"             => "m s-1"
  "add_offset"        => 0.0
  "_ARRAY_DIMENSIONS" => Any["time", "feature_id"]
  "long_name"         => "River Velocity"
  "grid_mapping"      => "crs"
  "scale_factor"      => 0.01
  "_Netcdf4Dimid"     => 0
  "valid_range"       => Any[0, 5000000]

julia> zg.attrs
Dict{String, Any} with 19 entries:
  "TITLE"                     => "OUTPUT FROM WRF-Hydro v5.2.0-beta2"
  "code_version"              => "v5.2.0-beta2"
  "stream_order_output"       => 1
  "proj4"                     => "+proj=lcc +units=m +a=6370000.0 +b=6370000.0 +lat_1=30.0 +lat_2=60.0 +lat_0=40.0 +lon_0=-97.0 +x_0=0 +y_0=0 +k_0=1.0 +…
  "cdm_datatype"              => "Station"
  "model_initialization_time" => "1979-02-01_00:00:00"
  "model_output_valid_time"   => "1979-02-01_01:00:00"
  "dev_channelBucket_only"    => 0
  "model_configuration"       => "retrospective"
  "featureType"               => "timeSeries"
  "dev_channel_only"          => 0
  "Conventions"               => "CF-1.6"
  ⋮                           => ⋮

julia> zg.attrs |> println
Dict{String, Any}("TITLE" => "OUTPUT FROM WRF-Hydro v5.2.0-beta2", "code_version" => "v5.2.0-beta2", "stream_order_output" => 1, "proj4" => "+proj=lcc +units=m +a=6370000.0 +b=6370000.0 +lat_1=30.0 +lat_2=60.0 +lat_0=40.0 +lon_0=-97.0 +x_0=0 +y_0=0 +k_0=1.0 +nadgrids=@", "cdm_datatype" => "Station", "model_initialization_time" => "1979-02-01_00:00:00", "model_output_valid_time" => "1979-02-01_01:00:00", "dev_channelBucket_only" => 0, "model_configuration" => "retrospective", "featureType" => "timeSeries", "dev_channel_only" => 0, "Conventions" => "CF-1.6", "model_total_valid_times" => 1416, "station_dimension" => "feature_id", "dev_NOAH_TIMESTEP" => 3600, "dev_OVRTSWCRT" => 1, "_NCProperties" => "version=2,netcdf=4.7.4,hdf5=1.10.7,", "model_output_type" => "channel_rt", "dev" => "dev_ prefix indicates development/internal meta data")

and here's an overview from ZarrDatasets:


julia> zd = ZarrDataset(zg)
Dataset: 
Group: root

Dimensions
   time = 367439
   feature_id = 2776738

Variables
  time   (367439)
    Datatype:    Dates.DateTime (Int32)
    Dimensions:  time
    Attributes:
     units                = minutes since 1970-01-01 00:00:00 UTC
     NAME                 = time
     long_name            = valid output time
     _Netcdf4Dimid        = 1
     standard_name        = time
     valid_min            = 4777980
     valid_max            = 4862880

  elevation   (2776738 × 367439)
    Datatype:    Union{Missing, Float32} (Float32)
    Dimensions:  feature_id × time
    Attributes:
     units                = meters
     coordinates          = longitude latitude
     long_name            = Feature Elevation
     _Netcdf4Dimid        = 0
     standard_name        = Elevation
     _FillValue           = 0.0

  qBtmVertRunoff   (2776738 × 367439)
    Datatype:    Union{Missing, Float64} (Int32)
    Dimensions:  feature_id × time
    Attributes:
     missing_value        = -9999000
     coordinates          = longitude latitude
     units                = m3
     add_offset           = 0.0
     long_name            = Runoff from bottom of soil to bucket
     grid_mapping         = crs
     scale_factor         = 0.0010000000474974513
     _Netcdf4Dimid        = 0
     _FillValue           = 0
     valid_range          = Any[0, 20000000]

  q_lateral   (2776738 × 367439)
    Datatype:    Union{Missing, Float64} (Int32)
    Dimensions:  feature_id × time
    Attributes:
     missing_value        = -99990
     coordinates          = longitude latitude
     units                = m3 s-1
     add_offset           = 0.0
     long_name            = Runoff into channel reach
     grid_mapping         = crs
     scale_factor         = 0.10000000149011612
     _Netcdf4Dimid        = 0
     _FillValue           = 0
     valid_range          = Any[0, 500000]

  qSfcLatRunoff   (2776738 × 367439)
    Datatype:    Union{Missing, Float64} (Int32)
    Dimensions:  feature_id × time
    Attributes:
     missing_value        = -999900000
     coordinates          = longitude latitude
     units                = m3 s-1
     add_offset           = 0.0
     long_name            = Runoff from terrain routing
     grid_mapping         = crs
     scale_factor         = 9.999999747378752e-6
     _Netcdf4Dimid        = 0
     _FillValue           = 0
     valid_range          = Any[0, 2000000000]

  streamflow   (2776738 × 367439)
    Datatype:    Union{Missing, Float64} (Int32)
    Dimensions:  feature_id × time
    Attributes:
     missing_value        = -999900
     coordinates          = longitude latitude
     units                = m3 s-1
     add_offset           = 0.0
     long_name            = River Flow
     grid_mapping         = crs
     scale_factor         = 0.009999999776482582
     _Netcdf4Dimid        = 0
     _FillValue           = 0
     valid_range          = Any[0, 5000000]

  feature_id   (2776738)
    Datatype:    Int32 (Int32)
    Dimensions:  feature_id
    Attributes:
     cf_role              = timeseries_id
     long_name            = Reach ID
     _Netcdf4Dimid        = 0
     comment              = NHDPlusv2 ComIDs within CONUS, arbitrary Reach IDs outside of CONUS
     NAME                 = feature_id

  velocity   (2776738 × 367439)
    Datatype:    Union{Missing, Float64} (Int32)
    Dimensions:  feature_id × time
    Attributes:
     missing_value        = -999900
     coordinates          = longitude latitude
     units                = m s-1
     add_offset           = 0.0
     long_name            = River Velocity
     grid_mapping         = crs
     scale_factor         = 0.009999999776482582
     _Netcdf4Dimid        = 0
     _FillValue           = 0
     valid_range          = Any[0, 5000000]

  latitude   (2776738)
    Datatype:    Union{Missing, Float32} (Float32)
    Dimensions:  feature_id
    Attributes:
     units                = degrees_north
     long_name            = Feature latitude
     _Netcdf4Dimid        = 0
     standard_name        = latitude
     _FillValue           = 0.0

  qBucket   (2776738 × 367439)
    Datatype:    Union{Missing, Float64} (Int32)
    Dimensions:  feature_id × time
    Attributes:
     missing_value        = -999900000
     coordinates          = longitude latitude
     units                = m3 s-1
     add_offset           = 0.0
     long_name            = Flux from gw bucket
     grid_mapping         = crs
     scale_factor         = 9.999999747378752e-6
     _Netcdf4Dimid        = 0
     _FillValue           = 0
     valid_range          = Any[0, 2000000000]

  order   (2776738 × 367439)
    Datatype:    Union{Missing, Int32} (Int32)
    Dimensions:  feature_id × time
    Attributes:
     coordinates          = longitude latitude
     long_name            = Streamflow Order
     _Netcdf4Dimid        = 0
     standard_name        = order
     _FillValue           = 0

  longitude   (2776738)
    Datatype:    Union{Missing, Float32} (Float32)
    Dimensions:  feature_id
    Attributes:
     units                = degrees_east
     long_name            = Feature longitude
     _Netcdf4Dimid        = 0
     standard_name        = longitude
     _FillValue           = 0.0

Global attributes
  TITLE                = OUTPUT FROM WRF-Hydro v5.2.0-beta2
  code_version         = v5.2.0-beta2
  stream_order_output  = 1
  proj4                = +proj=lcc +units=m +a=6370000.0 +b=6370000.0 +lat_1=30.0 +lat_2=60.0 +lat_0=40.0 +lon_0=-97.0 +x_0=0 +y_0=0 +k_0=1.0 +nadgrids=@
  cdm_datatype         = Station
  model_initialization_time = 1979-02-01_00:00:00
  model_output_valid_time = 1979-02-01_01:00:00
  dev_channelBucket_only = 0
  model_configuration  = retrospective
  featureType          = timeSeries
  dev_channel_only     = 0
  Conventions          = CF-1.6
  model_total_valid_times = 1416
  station_dimension    = feature_id
  dev_NOAH_TIMESTEP    = 3600
  dev_OVRTSWCRT        = 1
  _NCProperties        = version=2,netcdf=4.7.4,hdf5=1.10.7,
  model_output_type    = channel_rt
  dev                  = dev_ prefix indicates development/internal meta data

asinghvi17 avatar Sep 12 '24 05:09 asinghvi17

Hm I thought grid_mapping = crs meant there was a field called "crs" that held the crs.

Maybe there's a convention to name them e.g. proj4 etc to identify the type? I'm not sure this stuff is fully official. We need somewhere in writing what this should be.

rafaqz avatar Sep 12 '24 06:09 rafaqz

So I just found out that the data structure I encountered is very much not official.

That being said it still works in Python so we should probably look at that.

asinghvi17 avatar Oct 09 '24 23:10 asinghvi17

Yeah, there are lots of hacks to get around the netcdf CF.

I'm happy to check for those keys, but would be good to see the python code that's generating them to get it right

rafaqz avatar Oct 10 '24 06:10 rafaqz