ClimateTools.jl

difficulty reading CF compliant files

Open gaelforget opened this issue 4 years ago • 14 comments

After loading one of my files via Panoply to verify that there was nothing wrong with it (see below), I tried the model = load(gcm_files, "tasmax", poly=poly_reg) example and got ERROR: Manually verify x/lat dimension name.

Looking at the code, I see that getdim_lat relies on a list of hard-coded names. I thought that the more general approach was to rely on long_name + units. Not sure what to suggest -- adding to the hard-coded list would be a short-term fix just for me...

  lon_c   (720)
    Datatype:    Float64
    Dimensions:  lon_c
    Attributes:
     units                = degrees_east
     long_name            = longitude
[Screenshot: Screen Shot 2020-02-28 at 4 03 53 PM]

gaelforget, Feb 28 '20 21:02

Also, the next file I am planning to present to ClimateTools is also CF-compliant but not on a regular lat-lon grid (see below). But I am going to wait a bit before I try that.

[Screenshot: Screen Shot 2020-02-28 at 4 18 42 PM]

gaelforget, Feb 28 '20 21:02

Thanks for the input! Indeed, this is certainly not an elegant function. From memory, this was coded for a project that involved regional climate models (your second case).

Not sure if the extraction of lon_c based on long_name is robust though. Seems more robust to go with the detected dimensions. For instance, for a regional climate model, the dimensions will not be named longitude. The file will have a longitude grid though, with the long_name being longitude. If I rely on detecting, say, longitude, we will extract the longitude grid and not the native dimension, which could be in meters, degrees on a stereographic grid, etc.

Open to suggestions though as hardcoding this is not a robust solution either.

Balinus, Feb 28 '20 23:02

> Open to suggestions though as hardcoding this is not a robust solution either.

Cool. Will take a deeper look and might send a PR later if I find a way to improve the code.

> regional climate models (your second case)

Just to clarify, I use sets of these files that collectively add up to global model variables

gaelforget, Mar 04 '20 14:03

> Just to clarify, I use sets of these files that collectively add up to global model variables

You mean like "tiles"?

Balinus, Mar 09 '20 20:03

Just for reference: http://cfconventions.org/Data/cf-conventions/cf-conventions-1.6/build/cf-conventions.html#latitude-coordinate

From what I've seen with other tools, they detect dimensions using the units, which is what the CF Conventions seem to imply as well.
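
As a rough illustration (not ClimateTools code), units-based detection could look like the sketch below. The helper name find_by_units and the attribute dictionary are hypothetical stand-ins for whatever a NetCDF reader returns; the unit strings are the ones the CF conventions list for latitude and longitude. The ETAN entry and its units are made up for the example.

```julia
# Unit strings the CF conventions accept for latitude and longitude.
const LAT_UNITS = ("degrees_north", "degree_north", "degree_N",
                   "degrees_N", "degreeN", "degreesN")
const LON_UNITS = ("degrees_east", "degree_east", "degree_E",
                   "degrees_E", "degreeE", "degreesE")

# Hypothetical helper: return the sorted names of all variables whose
# `units` attribute is one of the accepted strings.
function find_by_units(vars, accepted)
    names = String[]
    for (name, att) in vars
        get(att, "units", "") in accepted && push!(names, name)
    end
    return sort(names)
end

# Stand-in for attributes read from a file.
vars = Dict(
    "lon_c" => Dict("units" => "degrees_east", "long_name" => "longitude"),
    "ETAN"  => Dict("units" => "m"),  # illustrative entry, not from the thread
)

println(find_by_units(vars, LON_UNITS))  # variables with longitude units
println(find_by_units(vars, LAT_UNITS))  # variables with latitude units
```

With this approach the variable name (lon_c here) never has to appear in a hard-coded list.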

lmilechin, Mar 10 '20 18:03

Thanks! I've seen that in RCMs, the latitude and longitude grids also have an official standard_name. Hence, it should be possible to discern dimensions and coordinates adequately.

I'm gonna rework this extraction part asap.

Balinus, Mar 10 '20 19:03

> Thanks! I've seen that in RCMs, the latitude and longitude grids also have an official standard_name. Hence, it should be possible to discern dimensions and coordinates adequately.

As highlighted by @lmilechin, it is the units attribute that should be used to identify coordinates per the CF guidelines -- as opposed to standard_name, which is optional and, for example, does not distinguish between different longitude conventions.

> I'm gonna rework this extraction part asap.

Great! Thanks

gaelforget, Mar 10 '20 19:03

To effectively tackle this issue, having access to some problematic datasets would be welcome.

Balinus, Mar 10 '20 19:03

> To effectively tackle this issue, having access to some problematic datasets would be welcome.

How about using the files I mentioned at the top of this thread?

These get generated by running 04_netcdf.ipynb from GlobalOceanNotebooks:

outputs/nctiles-newfiles/interp/ETAN.nc
outputs/nctiles-newfiles/tiled/ETAN/ETAN.*.nc

ps. I just reran the notebook in binder & regenerated these without problem

gaelforget, Mar 10 '20 20:03

> > Just to clarify, I use sets of these files that collectively add up to global model variables
>
> You mean like "tiles"?

Yes -- one tile = 1 file in this example

gaelforget, Mar 10 '20 20:03

> > To effectively tackle this issue, having access to some problematic datasets would be welcome.
>
> How about using the files I mentioned at the top of this thread?
>
> These get generated by running 04_netcdf.ipynb from GlobalOceanNotebooks:
>
> outputs/nctiles-newfiles/interp/ETAN.nc
> outputs/nctiles-newfiles/tiled/ETAN/ETAN.*.nc
>
> ps. I just reran the notebook in binder & regenerated these without problem

Thanks, I was able to produce the files at home.

Balinus, Mar 11 '20 01:03

Also, I re-read the thread and wanted to clarify: when I spoke about "dimension" I was mostly referring to the dimensions of the datasets, not the units/measure of the variable itself. Hence the need, for projected grids, to distinguish between a rotated latitude "dimension" and the latitude grid (a variable in the dataset, not a dimension).

Anyway, I'll be forced to think about a more general solution to this!

edit - For example, in this dataset, the dimensions are rlat and rlon.

Dimensions
   rlat = 412
   rlon = 424
   time = 2920
   bnds = 2

Variables
  lat   (424 × 412)
    Datatype:    Float64
    Dimensions:  rlon × rlat
    Attributes:
     standard_name        = latitude
     long_name            = latitude
     units                = degrees_north

  lon   (424 × 412)
    Datatype:    Float64
    Dimensions:  rlon × rlat
    Attributes:
     standard_name        = longitude
     long_name            = longitude
     units                = degrees_east

  pr   (424 × 412 × 2920)
    Datatype:    Float32
    Dimensions:  rlon × rlat × time
    Attributes:
     grid_mapping         = rotated_pole
     _FillValue           = 1.0e20
     missing_value        = 1.0e20
     standard_name        = precipitation_flux
     long_name            = Precipitation
     units                = kg m-2 s-1
     coordinates          = lon lat
     cell_methods         = time: mean

  rlat   (412)
    Datatype:    Float64
    Dimensions:  rlat
    Attributes:
     standard_name        = grid_latitude
     long_name            = latitude in rotated pole grid
     units                = degrees
     axis                 = Y

  rlon   (424)
    Datatype:    Float64
    Dimensions:  rlon
    Attributes:
     standard_name        = grid_longitude
     long_name            = longitude in rotated pole grid
     units                = degrees
     axis                 = X
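
To make the distinction concrete, here is a minimal sketch (not what ClimateTools does) that separates the native dimension variables from the 2-D coordinate grids using only the structure of the listing above. The helper names is_dim_variable and aux_coordinates are hypothetical, and the Dict is a hand-copied stand-in for the file metadata with only the relevant attributes kept.

```julia
# Variables from the listing above: name => (dims, attributes).
vars = Dict(
    "lat"  => (dims = ["rlon", "rlat"],
               att = Dict("standard_name" => "latitude", "units" => "degrees_north")),
    "lon"  => (dims = ["rlon", "rlat"],
               att = Dict("standard_name" => "longitude", "units" => "degrees_east")),
    "pr"   => (dims = ["rlon", "rlat", "time"],
               att = Dict("grid_mapping" => "rotated_pole", "coordinates" => "lon lat")),
    "rlat" => (dims = ["rlat"],
               att = Dict("standard_name" => "grid_latitude", "units" => "degrees", "axis" => "Y")),
    "rlon" => (dims = ["rlon"],
               att = Dict("standard_name" => "grid_longitude", "units" => "degrees", "axis" => "X")),
)

# A dimension variable is 1-D and named after its own dimension.
is_dim_variable(name, v) = length(v.dims) == 1 && v.dims[1] == name

# Auxiliary coordinates are listed in the data variable's `coordinates` attribute.
aux_coordinates(v) = String.(split(get(v.att, "coordinates", "")))

native_axes = sort([name for (name, v) in vars if is_dim_variable(name, v)])
grids = aux_coordinates(vars["pr"])

println("native dimensions: ", native_axes)
println("coordinate grids:  ", grids)
```

Here rlat and rlon come out as the native dimensions, while lat and lon are picked up as the grids attached to pr, even though all four carry latitude/longitude-like names.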

Balinus, Mar 11 '20 01:03

I've sketched some code in #137

It's pretty rough right now but so far it works. Just not sure about the robustness though. Haven't had the time to test your files @gaelforget but I'm pretty sure it does not work. I'm currently testing for axis (optional attribute in CF files) and standard_name attributes of the dimensions. Will add long_name later.
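
Roughly, the idea is along these lines. This is a simplified sketch for illustration, not the code in the PR: the helper find_y_dimension, the tuple of accepted standard names, and the sample attribute lists are all assumptions.

```julia
# CF standard names that can mark a Y (latitude-like) dimension.
const Y_STANDARD_NAMES = ("latitude", "grid_latitude", "projection_y_coordinate")

# Hypothetical helper: `dimvars` is a list of name => attributes pairs for the
# 1-D dimension variables of a file.
function find_y_dimension(dimvars)
    # First pass: the optional `axis` attribute is the most direct signal.
    for (name, att) in dimvars
        get(att, "axis", "") == "Y" && return name
    end
    # Second pass: fall back to `standard_name`.
    for (name, att) in dimvars
        get(att, "standard_name", "") in Y_STANDARD_NAMES && return name
    end
    return nothing  # caller decides how to report the failure
end

# Sample inputs (illustrative attribute sets).
rotated = [
    "rlon" => Dict("standard_name" => "grid_longitude", "axis" => "X"),
    "rlat" => Dict("standard_name" => "grid_latitude", "axis" => "Y"),
]
no_axis = ["y" => Dict("standard_name" => "projection_y_coordinate")]
bare = ["lat_c" => Dict("long_name" => "longitude")]

println(find_y_dimension(rotated))
println(find_y_dimension(no_axis))
println(find_y_dimension(bare))
```

The last case returns nothing because the variable has neither axis nor standard_name, which is why a further fallback (long_name or units) is still needed.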

Balinus, Mar 13 '20 20:03

@gaelforget In the files produced by the notebook, both lat_c and lon_c have longitude as their long_name.
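
A quick way to flag this kind of inconsistency is sketched below. Note the assumption: only lon_c's units appear in this thread, so the degrees_north units on lat_c are a guess, and the helper names kind_from_units and long_name_conflicts are hypothetical. For brevity only the two most common CF unit spellings are handled.

```julia
# Map a units string to the coordinate kind it implies (two spellings only).
function kind_from_units(units)
    units == "degrees_north" && return "latitude"
    units == "degrees_east" && return "longitude"
    return nothing
end

# Return the names of variables whose long_name contradicts their units.
function long_name_conflicts(vars)
    conflicts = String[]
    for (name, att) in vars
        kind = kind_from_units(get(att, "units", ""))
        kind === nothing && continue           # units say nothing about lat/lon
        long_name = get(att, "long_name", kind)  # a missing long_name is not a conflict
        long_name == kind || push!(conflicts, name)
    end
    return conflicts
end

vars = [
    "lon_c" => Dict("units" => "degrees_east", "long_name" => "longitude"),
    # ASSUMPTION: lat_c's units are not shown in the thread.
    "lat_c" => Dict("units" => "degrees_north", "long_name" => "longitude"),
]

println(long_name_conflicts(vars))
```

If the units are as assumed, detection by units would still separate the two variables, whereas detection by long_name would treat both as longitude.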

Balinus, Mar 14 '20 17:03