
Error when using gdac loader: "ValueError: 'PROFILE_PSAL_QC' is not present in all datasets"

Open · andrewfagerheim opened this issue 2 years ago · 2 comments

Hello, I am trying to load Argo data by region from a local copy of the June 2022 GDAC snapshot. For some regions everything loads properly, but for others I get the error ValueError: 'PROFILE_PSAL_QC' is not present in all datasets, which seems strange because I don't recognize PROFILE_PSAL_QC as a data variable returned in any other argopy dataset. Any advice is appreciated!

MCVE Code Sample

import argopy
from argopy import DataFetcher as ArgoDataFetcher

argo_loader = ArgoDataFetcher(src='gdac', ftp="202206-ArgoData", parallel=True)
ds = argo_loader.region([-148, -147, 38, 40, 0, 2000]).to_xarray()

Error returned:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/.conda/envs/argo/lib/python3.10/site-packages/xarray/core/dataset.py:1394, in Dataset._construct_dataarray(self, name)
   1393 try:
-> 1394     variable = self._variables[name]
   1395 except KeyError:

KeyError: 'PROFILE_PSAL_QC'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
File ~/.conda/envs/argo/lib/python3.10/site-packages/xarray/core/concat.py:514, in _dataset_concat(datasets, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs)
    513 try:
--> 514     vars = ensure_common_dims([ds[k].variable for ds in datasets])
    515 except KeyError:

File ~/.conda/envs/argo/lib/python3.10/site-packages/xarray/core/concat.py:514, in <listcomp>(.0)
    513 try:
--> 514     vars = ensure_common_dims([ds[k].variable for ds in datasets])
    515 except KeyError:

File ~/.conda/envs/argo/lib/python3.10/site-packages/xarray/core/dataset.py:1498, in Dataset.__getitem__(self, key)
   1497 if hashable(key):
-> 1498     return self._construct_dataarray(key)
   1499 else:

File ~/.conda/envs/argo/lib/python3.10/site-packages/xarray/core/dataset.py:1396, in Dataset._construct_dataarray(self, name)
   1395 except KeyError:
-> 1396     _, name, variable = _get_virtual_variable(
   1397         self._variables, name, self._level_coords, self.dims
   1398     )
   1400 needed_dims = set(variable.dims)

File ~/.conda/envs/argo/lib/python3.10/site-packages/xarray/core/dataset.py:169, in _get_virtual_variable(variables, key, level_vars, dim_sizes)
    168 else:
--> 169     ref_var = variables[ref_name]
    171 if var_name is None:

KeyError: 'PROFILE_PSAL_QC'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Input In [12], in <cell line: 1>()
----> 1 ds = argo_loader.region([-148,-147,38,40,0,2000]).to_xarray()

File ~/.conda/envs/argo/lib/python3.10/site-packages/argopy/fetchers.py:426, in ArgoDataFetcher.to_xarray(self, **kwargs)
    421 if not self.fetcher:
    422     raise InvalidFetcher(
    423         " Initialize an access point (%s) first."
    424         % ",".join(self.Fetchers.keys())
    425     )
--> 426 xds = self.fetcher.to_xarray(**kwargs)
    427 xds = self.postproccessor(xds)
    429 # data_path = self.fetcher.cname() + self._mode + ".zarr"
    430 # log.debug(data_path)
    431 # if self.cache and self.fs.exists(data_path):
   (...)
    435 #     xds = self.postproccessor(xds)
    436 #     xds = self._write(data_path, xds)._read(data_path)

File ~/.conda/envs/argo/lib/python3.10/site-packages/argopy/data_fetchers/gdacftp_data.py:338, in FTPArgoDataFetcher.to_xarray(self, errors)
    335     raise DataNotFound("No data found for: %s" % self.indexfs.cname)
    337 # Download data:
--> 338 ds = self.fs.open_mfdataset(
    339     self.uri,
    340     method=self.method,
    341     concat_dim="N_POINTS",
    342     concat=True,
    343     preprocess=self._preprocess_multiprof,
    344     progress=self.progress,
    345     errors=errors,
    346     decode_cf=1,
    347     use_cftime=0,
    348     mask_and_scale=1,
    349 )
    351 # Data post-processing:
    352 ds["N_POINTS"] = np.arange(
    353     0, len(ds["N_POINTS"])
    354 )  # Re-index to avoid duplicate values

File ~/.conda/envs/argo/lib/python3.10/site-packages/argopy/stores/filesystems.py:376, in filestore.open_mfdataset(self, urls, concat_dim, max_workers, method, progress, concat, preprocess, errors, *args, **kwargs)
    373 if len(results) > 0:
    374     if concat:
    375         # ds = xr.concat(results, dim=concat_dim, data_vars='all', coords='all', compat='override')
--> 376         ds = xr.concat(results, dim=concat_dim, data_vars='minimal', coords='minimal', compat='override')
    377         return ds
    378     else:

File ~/.conda/envs/argo/lib/python3.10/site-packages/xarray/core/concat.py:238, in concat(objs, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs)
    233 else:
    234     raise TypeError(
    235         "can only concatenate xarray Dataset and DataArray "
    236         f"objects, got {type(first_obj)}"
    237     )
--> 238 return f(
    239     objs, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs
    240 )

File ~/.conda/envs/argo/lib/python3.10/site-packages/xarray/core/concat.py:516, in _dataset_concat(datasets, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs)
    514     vars = ensure_common_dims([ds[k].variable for ds in datasets])
    515 except KeyError:
--> 516     raise ValueError(f"{k!r} is not present in all datasets.")
    517 combined = concat_vars(vars, dim, positions, combine_attrs=combine_attrs)
    518 assert isinstance(combined, Variable)

ValueError: 'PROFILE_PSAL_QC' is not present in all datasets.
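For context, the concat failure at the bottom of the traceback can be reproduced with a toy example involving only xarray. The variable names and values below are made up to mirror the situation (one file carrying a QC variable, one file without it); this is a sketch, not argopy's actual data:

```python
import xarray as xr

# Toy stand-ins for two multi-profile files: both share the N_POINTS
# dimension, but only the first carries PSAL_QC (like a float whose
# files lack PSAL/PSAL_QC entirely).
ds_a = xr.Dataset({
    "TEMP": ("N_POINTS", [10.0, 11.0]),
    "PSAL_QC": ("N_POINTS", [1.0, 1.0]),
})
ds_b = xr.Dataset({"TEMP": ("N_POINTS", [12.0, 13.0])})

# argopy's filestore.open_mfdataset concatenates with data_vars='minimal',
# which (at least on xarray 2022.3.0) raises the ValueError shown above
# when a dimensioned variable is missing from some datasets:
try:
    xr.concat([ds_a, ds_b], dim="N_POINTS",
              data_vars="minimal", coords="minimal", compat="override")
except ValueError as err:
    print(err)

# With data_vars='all' (the alternative commented out in argopy's
# filesystems.py), recent xarray versions instead fill the missing
# variable with fill_value (NaN by default):
merged = xr.concat([ds_a, ds_b], dim="N_POINTS", data_vars="all")
print(merged["PSAL_QC"].values)
```

In other words, xarray can only paper over the missing variable when asked to fill gaps; with the 'minimal' strategy, a single profile file lacking PSAL_QC is enough to break the whole regional concat.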

Versions

Output of `argopy.show_versions()`

SYSTEM

commit: None
python: 3.10.4 | packaged by conda-forge | (main, Mar 24 2022, 17:38:57) [GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.49.1.el7.centos.plus.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.12.1
libnetcdf: 4.8.1

INSTALLED VERSIONS: MIN

aiohttp : 3.8.1
argopy : 0.1.12
erddapy : 1.2.1
fsspec : 2022.5.0
netCDF4 : 1.5.8
packaging : 21.3
scipy : 1.8.1
sklearn : 1.1.1
toolz : 0.11.2
xarray : 2022.3.0

INSTALLED VERSIONS: EXT.EXTRA

dask : 2022.05.2
distributed : 2022.5.2
gsw : 3.4.0
pyarrow : -
tqdm : -

INSTALLED VERSIONS: EXT.PLOTTERS

IPython : 8.4.0
cartopy : 0.20.2
ipykernel : 6.13.0
ipywidgets : 7.7.0
matplotlib : 3.5.2
seaborn : -

INSTALLED VERSIONS: DEV

bottleneck : -
cfgrib : -
cftime : 1.6.0
conda : -
nc_time_axis: -
numpy : 1.22.4
pandas : 1.4.2
pip : 22.1.2
pytest : -
setuptools : 62.3.2
sphinx : -
zarr : -

andrewfagerheim avatar Jul 20 '22 02:07 andrewfagerheim

Hi @andrewfagerheim, I tried the same fetch using the default ftp source (https://data-argo.ifremer.fr/) and succeeded! Since we're using the same xarray/argopy versions, I don't see why the data would be processed differently by argopy. This error is therefore probably coming from some sort of problem in the GDAC snapshot itself. It would be interesting to see if you get the same error using another snapshot. If yes, then it could be something to report to the GDAC folks; otherwise, it's just the June snapshot, and you would have to use another one.

g

gmaze avatar Jul 20 '22 07:07 gmaze

Hi @gmaze, thanks for the response! I think @dhruvbalwada and I have localized this issue by using ArgoIndexFetcher() to find which profiles were in the problem area. Based on this, it seems like two things are going on:

  1. The error raised (ValueError: 'PROFILE_PSAL_QC' is not present in all datasets) prepends PROFILE_ to PSAL_QC; in other words, the actual problem is with PSAL_QC.
  2. One of the floats in this region, #29029 (29029_06_2022.zip), does not have the data variables PSAL or PSAL_QC at all, which is likely what causes the error.

This float is missing PSAL and PSAL_QC in both the June 2022 and March 2022 snapshots, but those variables are present when we load the float through erddap. We are currently downloading another dataset using rsync to see if this resolves the issue.
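The index-based search described above can be mimicked offline. This is only a sketch with made-up index rows (the real GDAC index file is ar_index_global_prof.txt, and argopy's index fetcher does the box selection for you); it just shows how a lat/lon box filter on the index narrows things down to the WMO numbers of the floats in the region:

```python
import pandas as pd

# Hypothetical miniature of the GDAC profile index: each row is one
# profile, with the float's WMO number embedded in the file path.
index = pd.DataFrame({
    "file": ["aoml/29029/profiles/R29029_001.nc",
             "coriolis/6902746/profiles/R6902746_012.nc",
             "aoml/29029/profiles/R29029_002.nc"],
    "longitude": [-147.5, -120.0, -147.2],
    "latitude": [39.0, 10.0, 38.5],
})

# Same box as the MCVE: lon in [-148, -147], lat in [38, 40].
box = index[index.longitude.between(-148, -147)
            & index.latitude.between(38, 40)]

# The WMO number is the second path component of the "file" column.
wmos = sorted(box.file.str.split("/").str[1].unique())
print(wmos)  # only float 29029 reports inside the box here
```

From a list like this, each float's profile files can be opened individually to see which ones lack PSAL/PSAL_QC.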

andrewfagerheim avatar Jul 20 '22 15:07 andrewfagerheim

Hi @dhruvbalwada @andrewfagerheim

Did you fix this issue using another dataset?

Since this is not coming from argopy, I think I can close the issue here.

gmaze avatar Sep 23 '22 09:09 gmaze