argopy

Xarray backend to open Argo NetCDF files

Open • gmaze opened this pull request 2 years ago • 1 comment

Close #176

When opening an Argo NetCDF file with xarray, most of the variables are not decoded properly and are returned with the "object" dtype. For a regular core-Argo profile file, 48 out of 64 data variables are not decoded as they should be.

This PR tries to implement a new xarray backend to open Argo NetCDF files, in which all the variables are cast to their proper type.
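As an illustration of one such cast, and assuming (this is not stated in the thread) that many of the affected variables are Argo character fields such as PLATFORM_NUMBER or DATA_MODE, stored as fixed-width, blank-padded char arrays, the bytes of each row can be joined and the padding stripped. A minimal sketch in plain Python; `chars_to_str` and `decode_char_rows` are hypothetical helpers, not argopy functions:

```python
from typing import Iterable, List


def chars_to_str(chars: Iterable[bytes], encoding: str = "utf-8") -> str:
    """Join single-byte chars into a string and strip the blank/NUL padding.

    Hypothetical helper: assumes a fixed-width char field padded with spaces
    (or NUL bytes), as in a (N_PROF, STRING8) PLATFORM_NUMBER variable.
    """
    raw = b"".join(chars)
    return raw.decode(encoding).strip(" \x00")


def decode_char_rows(rows: Iterable[Iterable[bytes]]) -> List[str]:
    """Apply chars_to_str to every row of a 2-D char array."""
    return [chars_to_str(row) for row in rows]


# One row per profile, each padded to 8 characters (a STRING8-like dimension)
platform_number = [
    [b"6", b"9", b"0", b"1", b"9", b"2", b"9", b" "],
    [b"1", b"3", b"8", b"5", b"7", b" ", b" ", b" "],
]
print(decode_char_rows(platform_number))  # ['6901929', '13857']
```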

We should be able to open any of the reference Argo NetCDF files:

  • [ ] Core-Argo individual profile files (<R/D><FloatWmoID>_<XXX><D>.nc). The core-Argo profile files contain the core parameters provided by a float: pressure, temperature, salinity, conductivity (PRES, TEMP, PSAL, CNDC).
  • [ ] B-Argo individual profile files (B<R/D><FloatWmoID>_<XXX><D>.nc). A B-Argo profile file contains all the parameters from a float, except the core-Argo parameters temperature, salinity, conductivity (TEMP, PSAL, CNDC). A float that performs only CTD measurements does not have B-Argo data files.
  • [ ] BGC-Argo individual synthetic profile files (M<R/D><FloatWmoID>_<XXX><D>.nc). The synthetic files contain the core-Argo and BGC-Argo parameters listed in reference table 3; intermediate parameters are not included.
  • [ ] Argo trajectory data files (<FloatWmoID>_<R/D>traj.nc). The Argo trajectory files contain the core and BGC parameters provided by a float.
  • [ ] Metadata files (<FloatWmoID>_meta.nc).
  • [ ] Technical data files (<FloatWmoID>_tech.nc).

And obviously:

  • [ ] Core-Argo multiple profile files
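The naming conventions above are regular enough to recognise a file type from its name alone. A minimal, hypothetical sketch in Python (`argo_file_type` and `ARGO_FILE_PATTERNS` are illustrative names, not argopy API). The labels mirror the seven data types tested in this thread, and synthetic profiles are matched with either the M prefix listed above or the S prefix used by the test files in this thread (e.g. SD3902131_001.nc):

```python
import posixpath
import re
from typing import Optional

# Hypothetical mapping from Argo file-name patterns to data types.
# Each pattern must match the whole base name of the file.
ARGO_FILE_PATTERNS = {
    "Argo profile": r"[RD]\d+_\d{3,}D?\.nc|\d+_prof\.nc",
    "B-Argo profile": r"B[RD]\d+_\d{3,}D?\.nc",
    # M prefix as listed above, S prefix as in the test files of this thread
    "Argo synthetic profile": r"[MS][RD]\d+_\d{3,}D?\.nc|\d+_Sprof\.nc",
    "Argo trajectory": r"\d+_[RD]traj\.nc",
    "B-Argo trajectory": r"\d+_B[RD]traj\.nc",
    "Argo meta-data": r"\d+_meta\.nc",
    "Argo technical data": r"\d+_tech\.nc",
}


def argo_file_type(path: str) -> Optional[str]:
    """Return the Argo data type guessed from a file name, or None if unknown."""
    name = posixpath.basename(path)
    for label, pattern in ARGO_FILE_PATTERNS.items():
        if re.fullmatch(pattern, name):
            return label
    return None


for f in [
    "/dac/bodc/6901929/6901929_prof.nc",
    "/dac/coriolis/3902131/profiles/SD3902131_001D.nc",
    "/dac/coriolis/6903247/6903247_BRtraj.nc",
]:
    print(argo_file_type(f))
# Argo profile
# Argo synthetic profile
# B-Argo trajectory
```

Matching on the base name only keeps the sketch independent of the GDAC directory layout; a real backend would more likely inspect the file's own global attributes or variables.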

gmaze, Apr 11 '22

This pull request was automatically marked as stale because it had not seen any activity in 90 days.

github-actions[bot], Jul 11 '22

This pull request was automatically marked as stale because it had not seen any activity in 90 days.

github-actions[bot], Dec 22 '22

I tested the 'argo' engine on a collection of sample files covering all 7 Argo data types, and it works!

OK for              Argo profile (test file: /dac/bodc/6901929/6901929_prof.nc)
		> 46/64 variables not casted otherwise !
OK for              Argo profile (test file: /dac/meds/4901079/profiles/D4901079_110.nc)
		> 46/64 variables not casted otherwise !
OK for              Argo profile (test file: /dac/aoml/13857/profiles/R13857_001.nc)
		> 43/58 variables not casted otherwise !
OK for           Argo trajectory (test file: /dac/aoml/5900446/5900446_Dtraj.nc)
		> 59/102 variables not casted otherwise !
OK for           Argo trajectory (test file: /dac/csio/2902696/2902696_Rtraj.nc)
		> 58/102 variables not casted otherwise !
OK for    Argo synthetic profile (test file: /dac/coriolis/3902131/3902131_Sprof.nc)
		> 37/58 variables not casted otherwise !
OK for    Argo synthetic profile (test file: /dac/coriolis/3902131/profiles/SD3902131_001.nc)
		> 37/58 variables not casted otherwise !
OK for    Argo synthetic profile (test file: /dac/coriolis/3902131/profiles/SD3902131_001D.nc)
		> 37/58 variables not casted otherwise !
OK for    Argo synthetic profile (test file: /dac/coriolis/6903247/profiles/SR6903247_134.nc)
		> 61/114 variables not casted otherwise !
OK for    Argo synthetic profile (test file: /dac/coriolis/6903247/profiles/SR6903247_134D.nc)
		> 37/58 variables not casted otherwise !
OK for            B-Argo profile (test file: /dac/coriolis/3902131/profiles/BR3902131_001.nc)
		> 47/63 variables not casted otherwise !
OK for            B-Argo profile (test file: /dac/coriolis/3902131/profiles/BR3902131_001D.nc)
		> 47/63 variables not casted otherwise !
OK for         B-Argo trajectory (test file: /dac/coriolis/3902131/3902131_BRtraj.nc)
		> 40/63 variables not casted otherwise !
OK for         B-Argo trajectory (test file: /dac/coriolis/6903247/6903247_BRtraj.nc)
		> 70/131 variables not casted otherwise !
OK for       Argo technical data (test file: /dac/incois/2902269/2902269_tech.nc)
		> 9/10 variables not casted otherwise !
OK for       Argo technical data (test file: /dac/nmdis/2901623/2901623_tech.nc)
		> 9/10 variables not casted otherwise !
OK for            Argo meta-data (test file: /dac/jma/4902252/4902252_meta.nc)
		> 60/65 variables not casted otherwise !
OK for            Argo meta-data (test file: /dac/coriolis/1900857/1900857_meta.nc)
		> 60/65 variables not casted otherwise !

The API is quite simple:

import xarray as xr
ds = xr.open_dataset(file, engine='argo')  # file: path to an Argo NetCDF file

I checked that 100% of the variables are cast (as string, int, float or datetime) and that none is returned with the "object" dtype.

gmaze, Apr 27 '23
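For the datetime case, a minimal sketch in plain Python, assuming the Argo user's manual conventions: JULD holds days since 1950-01-01 00:00:00 UTC (the REFERENCE_DATE_TIME) with a fill value of 999999, and character dates such as DATE_CREATION use the 14-character YYYYMMDDHHMISS format. `juld_to_datetime` and `argo_date_to_datetime` are hypothetical names; the backend in this PR may do this differently (for example through xarray's own CF time decoding):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed Argo conventions (Argo user's manual)
ARGO_REFERENCE_DATE = datetime(1950, 1, 1, tzinfo=timezone.utc)  # REFERENCE_DATE_TIME
JULD_FILL_VALUE = 999999.0  # _FillValue of JULD


def juld_to_datetime(juld: float) -> Optional[datetime]:
    """Convert JULD (days since 1950-01-01 UTC) to a datetime, or None if missing."""
    if juld == JULD_FILL_VALUE:
        return None
    return ARGO_REFERENCE_DATE + timedelta(days=juld)


def argo_date_to_datetime(text: str) -> Optional[datetime]:
    """Parse a YYYYMMDDHHMISS character date, or None if it is blank."""
    text = text.strip()
    if not text:
        return None
    return datetime.strptime(text, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


print(juld_to_datetime(0.5))                    # 1950-01-01 12:00:00+00:00
print(argo_date_to_datetime("20230101000000"))  # 2023-01-01 00:00:00+00:00
print(juld_to_datetime(JULD_FILL_VALUE))        # None
```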

"requests" module is missing from requirements when installing from scratch.

quai20, May 23 '23

Warning when casting datetime:

argopy/xarray.py:150: UserWarning: Converting non-nanosecond precision datetime values to nanosecond precision. This behavior can eventually be relaxed in xarray, as it is an artifact from pandas which is now beginning to support non-nanosecond precision values. This warning is caused by passing non-nanosecond np.datetime64 or np.timedelta64 values to the DataArray or Variable constructor; it can be silenced by converting the values to nanosecond precision ahead of time.
  da = da.astype(type)

quai20, May 23 '23

Regarding the warning when casting datetime, could you please give me the output of:

argopy.show_versions()

gmaze, May 23 '23

Output of argopy.show_versions():
SYSTEM
------
commit: None
python: 3.9.16 (main, Mar  8 2023, 14:00:05) 
[GCC 11.2.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-211-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8
libhdf5: 1.12.2
libnetcdf: 4.9.1

INSTALLED VERSIONS: CORE
------------------------
aiohttp     : 3.8.4       
argopy      : 0.1.13      
erddapy     : 2.0.1       
fsspec      : 2023.5.0    
netCDF4     : 1.6.3       
packaging   : 23.1        
scipy       : 1.10.1      
toolz       : 0.12.0      
xarray      : 2023.5.0    

INSTALLED VERSIONS: EXT.UTIL
----------------------------
gsw         : -           
tqdm        : -           
zarr        : -           

INSTALLED VERSIONS: EXT.PERF
----------------------------
dask        : -           
distributed : -           
pyarrow     : -           

INSTALLED VERSIONS: EXT.PLOT
----------------------------
IPython     : 8.13.2      
cartopy     : -           
ipykernel   : 6.23.1      
ipywidgets  : -           
matplotlib  : -           
seaborn     : -           

INSTALLED VERSIONS: DEV
-----------------------
black       : -           
bottleneck  : -           
cfgrib      : -           
cftime      : 1.6.2       
conda       : -           
flake8      : -           
nc_time_axis: -           
numpy       : 1.24.3      
pandas      : 2.0.1       
pip         : 23.0.1      
pytest      : -           
pytest_cov  : -           
pytest_env  : -           
pytest_localftpserver: -           
setuptools  : 66.0.0      
sphinx      : -           

quai20, May 23 '23

OK, the warning comes from the latest pandas 2.0, which raises a lot of new warnings. I'll fix this in another PR.

gmaze, May 23 '23
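The warning message itself points at a workaround: convert datetime values to nanosecond precision before passing them to the DataArray or Variable constructor. A minimal NumPy sketch, assuming the dates are held in a datetime64 array with a coarser unit (seconds here); `to_ns_precision` is a hypothetical helper and not necessarily the fix adopted in the follow-up PR:

```python
import numpy as np


def to_ns_precision(values: np.ndarray) -> np.ndarray:
    """Cast a datetime64 array of any unit to datetime64[ns]; NaT stays NaT."""
    if not np.issubdtype(values.dtype, np.datetime64):
        raise TypeError(f"expected a datetime64 array, got {values.dtype}")
    return values.astype("datetime64[ns]")


dates = np.array(["2023-05-23T12:00:00", "NaT"], dtype="datetime64[s]")
dates_ns = to_ns_precision(dates)
print(dates_ns.dtype)  # datetime64[ns]
```

Doing the cast before `da.astype(...)` means xarray receives values that are already in nanoseconds, which is the condition the warning asks for.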