xarray icon indicating copy to clipboard operation
xarray copied to clipboard

Y-axis is reversed when using to_zarr()

Open alistaireverett opened this issue 1 year ago • 2 comments

What happened?

When I export a dataset to NetCDF and Zarr, the y axis appears to have been reversed with gdalinfo. I also cannot build a vrt file with the Zarr file since it complains about positive NS axis, but this works fine with the NetCDF file.

Example NetCDF file as input: in.nc.zip

gdalinfo on output NetCDF file:

$ gdalinfo NETCDF:out.nc:air_temperature_2m
Driver: netCDF/Network Common Data Format
Files: out.nc
       out.nc.aux.xml
Size is 949, 1069
Coordinate System is:
PROJCRS["unnamed",
    BASEGEOGCRS["unknown",
        DATUM["unnamed",
            ELLIPSOID["Sphere",6371000,0,
                LENGTHUNIT["metre",1,
                    ID["EPSG",9001]]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433,
                ID["EPSG",9122]]]],
    CONVERSION["unnamed",
        METHOD["Lambert Conic Conformal (2SP)",
            ID["EPSG",9802]],
        PARAMETER["Latitude of false origin",63.3,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8821]],
        PARAMETER["Longitude of false origin",15,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8822]],
        PARAMETER["Latitude of 1st standard parallel",63.3,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8823]],
        PARAMETER["Latitude of 2nd standard parallel",63.3,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8824]],
        PARAMETER["Easting at false origin",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8826]],
        PARAMETER["Northing at false origin",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8827]]],
    CS[Cartesian,2],
        AXIS["easting",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["northing",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]
Data axis to CRS axis mapping: 1,2
Origin = (-1061334.000000000000000,1338732.125000000000000)
Pixel Size = (2500.000000000000000,-2500.000000000000000)
Metadata:
  air_temperature_2m#coordinates=longitude latitude
  air_temperature_2m#grid_mapping=projection_lambert
  air_temperature_2m#long_name=Screen level temperature (T2M)
  air_temperature_2m#standard_name=air_temperature
  air_temperature_2m#units=K
  air_temperature_2m#_FillValue=9.96921e+36
  height1#description=height above ground
  height1#long_name=height
  height1#positive=up
  height1#units=m
  height1#_FillValue=nan
  NC_GLOBAL#coordinates=projection_lambert time
  NETCDF_DIM_EXTRA={height1}
  NETCDF_DIM_height1_DEF={1,5}
  NETCDF_DIM_height1_VALUES=2
  projection_lambert#earth_radius=6371000
  projection_lambert#grid_mapping_name=lambert_conformal_conic
  projection_lambert#latitude_of_projection_origin=63.3
  projection_lambert#longitude_of_central_meridian=15
  projection_lambert#standard_parallel={63.3,63.3}
  x#long_name=x-coordinate in Cartesian system
  x#standard_name=projection_x_coordinate
  x#units=m
  x#_FillValue=nan
  y#long_name=y-coordinate in Cartesian system
  y#standard_name=projection_y_coordinate
  y#units=m
  y#_FillValue=nan
Geolocation:
  LINE_OFFSET=0
  LINE_STEP=1
  PIXEL_OFFSET=0
  PIXEL_STEP=1
  SRS=GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]]
  X_BAND=1
  X_DATASET=NETCDF:"out.nc":longitude
  Y_BAND=1
  Y_DATASET=NETCDF:"out.nc":latitude
Corner Coordinates:
Upper Left  (-1061334.000, 1338732.125) ( 18d10'24.02"W, 72d45'59.56"N)
Lower Left  (-1061334.000,-1333767.875) (  0d15'55.60"E, 50d18'23.10"N)
Upper Right ( 1311166.000, 1338732.125) ( 54d17'24.85"E, 71d34'43.38"N)
Lower Right ( 1311166.000,-1333767.875) ( 33d 2'20.10"E, 49d45' 6.51"N)
Center      (  124916.000,    2482.125) ( 17d30' 3.21"E, 63d18' 1.50"N)
Band 1 Block=949x1069 Type=Float32, ColorInterp=Undefined
  Min=236.480 Max=284.937 
  Minimum=236.480, Maximum=284.937, Mean=269.816, StdDev=9.033
  NoData Value=9.96920996838686905e+36
  Unit Type: K
  Metadata:
    coordinates=longitude latitude
    grid_mapping=projection_lambert
    long_name=Screen level temperature (T2M)
    NETCDF_DIM_height1=2
    NETCDF_VARNAME=air_temperature_2m
    standard_name=air_temperature
    STATISTICS_MAXIMUM=284.93682861328
    STATISTICS_MEAN=269.81614967971
    STATISTICS_MINIMUM=236.47978210449
    STATISTICS_STDDEV=9.0332172122638
    units=K
    _FillValue=9.96921e+36

gdalinfo on output Zarr file:

$ gdalinfo ZARR:out.zarr:/air_temperature_2m:0
Driver: Zarr/Zarr
Files: none associated
Size is 949, 1069
Origin = (-1061334.000000000000000,-1333767.875000000000000)
Pixel Size = (2500.000000000000000,2500.000000000000000)
Metadata:
  coordinates=longitude latitude
  grid_mapping=projection_lambert
  long_name=Screen level temperature (T2M)
  standard_name=air_temperature
Corner Coordinates:
Upper Left  (-1061334.000,-1333767.875) 
Lower Left  (-1061334.000, 1338732.125) 
Upper Right ( 1311166.000,-1333767.875) 
Lower Right ( 1311166.000, 1338732.125) 
Center      (  124916.000,    2482.125) 
Band 1 Block=475x268 Type=Float32, ColorInterp=Undefined
  NoData Value=9.96920996838686905e+36
  Unit Type: K

The main issue is that the origin and y-axis direction is reversed, as you can see from the origin and pixel size. I have tried taking the CRS from the netcdf and adding it to the Zarr file as a _CRS attribute manually, but this doesn't make any difference to the origin or pixel size.

What did you expect to happen?

Origin, pixel size and corner coords should match those in the netcdf file.

$ gdalinfo ZARR:out.zarr:/air_temperature_2m:0
Driver: Zarr/Zarr
Files: none associated
Size is 949, 1069
Origin = (-1061334.000000000000000,1338732.125000000000000)
Pixel Size = (2500.000000000000000,-2500.000000000000000)
Metadata:
  coordinates=longitude latitude
  grid_mapping=projection_lambert
  long_name=Screen level temperature (T2M)
  standard_name=air_temperature
Corner Coordinates:
Corner Coordinates:
Upper Left  (-1061334.000, 1338732.125) ( 18d10'24.02"W, 72d45'59.56"N)
Lower Left  (-1061334.000,-1333767.875) (  0d15'55.60"E, 50d18'23.10"N)
Upper Right ( 1311166.000, 1338732.125) ( 54d17'24.85"E, 71d34'43.38"N)
Lower Right ( 1311166.000,-1333767.875) ( 33d 2'20.10"E, 49d45' 6.51"N)
Center      (  124916.000,    2482.125) ( 17d30' 3.21"E, 63d18' 1.50"N)
Band 1 Block=475x268 Type=Float32, ColorInterp=Undefined
  NoData Value=9.96920996838686905e+36
  Unit Type: K

Minimal Complete Verifiable Example

import xarray as xr
from pyproj import CRS

ds = xr.open_dataset("in.nc")

# Optionally take copy CRS to Zarr (produces and error, but does work)
crs_wkt = CRS.from_cf(ds["projection_lambert"].attrs).to_wkt()
ds["air_temperature_2m"] = ds["air_temperature_2m"].assign_attrs(_CRS={"wkt": crs_wkt})

ds.to_zarr("out.zarr")

ds.to_netcdf("out.nc")

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None python: 3.11.7 (main, Dec 15 2023, 18:12:31) [GCC 11.2.0] python-bits: 64 OS: Linux OS-release: 6.5.0-15-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.9.3-development

xarray: 2024.1.1 pandas: 2.2.0 numpy: 1.26.4 scipy: None netCDF4: 1.6.5 pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.16.1 cftime: 1.6.3 nc_time_axis: None iris: None bottleneck: None dask: None distributed: None matplotlib: None cartopy: None seaborn: None numbagg: None fsspec: None cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 68.2.2 pip: 23.3.1 conda: None pytest: None mypy: None IPython: None sphinx: None

alistaireverett avatar Feb 13 '24 14:02 alistaireverett

Thanks for opening your first issue here at xarray! Be sure to follow the issue template! If you have an idea for a solution, we would really welcome a Pull Request with proposed changes. See the Contributing Guide for more. It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better. Thank you!

welcome[bot] avatar Feb 13 '24 14:02 welcome[bot]

Thanks for documenting this @alistaireverett, it is a bit surprising. Interestingly the default output Zarr is identical in memory to the starting netCDF dataset:

ds = xr.open_dataset("in.nc")
ds.to_zarr("out.zarr")
dz = xr.open_zarr("out.zarr")
xr.testing.assert_identical(ds, dz) #True

Also bypassing Xarray entirely and just using GDAL still has this flipped y-axis, so I'm thinking there is some issue with in.nc rather than Xarray's to_zarr() method.

gdalmdimtranslate -of ZARR in.nc out-gdal.zarr 
gdalinfo ZARR:"out-gdal.zarr":/air_temperature_2m

# Origin = (-1061334.000000000000000,-1333767.875000000000000)
# Pixel Size = (2500.000000000000000,2500.000000000000000)

Can you share a bit more about where in.nc comes from and check on the metadata structure? https://github.com/cedadev/cf-checker could be useful here, for example I see a few warnings, GDAL can have trouble if you have a non-compliant NetCDF (https://github.com/OSGeo/gdal/issues/8477):

CHECKING NetCDF FILE: in.nc
=====================
Using CF Checker Version 4.1.0
Checking against CF Version CF-1.8
Using Standard Name Table Version 84 (2024-01-19T15:55:10Z)
Using Area Type Table Version 11 (06 July 2023)
Using Standardized Region Name Table Version 4 (18 December 2018)

INFO: Attribute coordinates is being used in a non-standard way; as a global attribute. (See Appendix A)
WARN: (2.6.1): No 'Conventions' attribute present

scottyhq avatar Feb 23 '24 04:02 scottyhq

Closing as upstream; please reopen with an MCVE if that's not correct

max-sixty avatar Apr 28 '24 20:04 max-sixty