argopy
Error when using gdac loader: "ValueError: 'PROFILE_PSAL_QC' is not present in all datasets"
Hello, I am trying to load Argo data by region from a local copy of the June 2022 GDAC snapshot. Some regions load properly, but for others I get the error ValueError: 'PROFILE_PSAL_QC' is not present in all datasets,
which seems strange because I don't recognize that as a data variable returned in any other argopy dataset. Any advice is appreciated!
MCVE Code Sample
import argopy
from argopy import DataFetcher as ArgoDataFetcher
argo_loader = ArgoDataFetcher(src="gdac", ftp="202206-ArgoData", parallel=True)
ds = argo_loader.region([-148, -147, 38, 40, 0, 2000]).to_xarray()
Error returned:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File ~/.conda/envs/argo/lib/python3.10/site-packages/xarray/core/dataset.py:1394, in Dataset._construct_dataarray(self, name)
1393 try:
-> 1394 variable = self._variables[name]
1395 except KeyError:
KeyError: 'PROFILE_PSAL_QC'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
File ~/.conda/envs/argo/lib/python3.10/site-packages/xarray/core/concat.py:514, in _dataset_concat(datasets, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs)
513 try:
--> 514 vars = ensure_common_dims([ds[k].variable for ds in datasets])
515 except KeyError:
File ~/.conda/envs/argo/lib/python3.10/site-packages/xarray/core/concat.py:514, in <listcomp>(.0)
513 try:
--> 514 vars = ensure_common_dims([ds[k].variable for ds in datasets])
515 except KeyError:
File ~/.conda/envs/argo/lib/python3.10/site-packages/xarray/core/dataset.py:1498, in Dataset.__getitem__(self, key)
1497 if hashable(key):
-> 1498 return self._construct_dataarray(key)
1499 else:
File ~/.conda/envs/argo/lib/python3.10/site-packages/xarray/core/dataset.py:1396, in Dataset._construct_dataarray(self, name)
1395 except KeyError:
-> 1396 _, name, variable = _get_virtual_variable(
1397 self._variables, name, self._level_coords, self.dims
1398 )
1400 needed_dims = set(variable.dims)
File ~/.conda/envs/argo/lib/python3.10/site-packages/xarray/core/dataset.py:169, in _get_virtual_variable(variables, key, level_vars, dim_sizes)
168 else:
--> 169 ref_var = variables[ref_name]
171 if var_name is None:
KeyError: 'PROFILE_PSAL_QC'
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
Input In [12], in <cell line: 1>()
----> 1 ds = argo_loader.region([-148,-147,38,40,0,2000]).to_xarray()
File ~/.conda/envs/argo/lib/python3.10/site-packages/argopy/fetchers.py:426, in ArgoDataFetcher.to_xarray(self, **kwargs)
421 if not self.fetcher:
422 raise InvalidFetcher(
423 " Initialize an access point (%s) first."
424 % ",".join(self.Fetchers.keys())
425 )
--> 426 xds = self.fetcher.to_xarray(**kwargs)
427 xds = self.postproccessor(xds)
429 # data_path = self.fetcher.cname() + self._mode + ".zarr"
430 # log.debug(data_path)
431 # if self.cache and self.fs.exists(data_path):
(...)
435 # xds = self.postproccessor(xds)
436 # xds = self._write(data_path, xds)._read(data_path)
File ~/.conda/envs/argo/lib/python3.10/site-packages/argopy/data_fetchers/gdacftp_data.py:338, in FTPArgoDataFetcher.to_xarray(self, errors)
335 raise DataNotFound("No data found for: %s" % self.indexfs.cname)
337 # Download data:
--> 338 ds = self.fs.open_mfdataset(
339 self.uri,
340 method=self.method,
341 concat_dim="N_POINTS",
342 concat=True,
343 preprocess=self._preprocess_multiprof,
344 progress=self.progress,
345 errors=errors,
346 decode_cf=1,
347 use_cftime=0,
348 mask_and_scale=1,
349 )
351 # Data post-processing:
352 ds["N_POINTS"] = np.arange(
353 0, len(ds["N_POINTS"])
354 ) # Re-index to avoid duplicate values
File ~/.conda/envs/argo/lib/python3.10/site-packages/argopy/stores/filesystems.py:376, in filestore.open_mfdataset(self, urls, concat_dim, max_workers, method, progress, concat, preprocess, errors, *args, **kwargs)
373 if len(results) > 0:
374 if concat:
375 # ds = xr.concat(results, dim=concat_dim, data_vars='all', coords='all', compat='override')
--> 376 ds = xr.concat(results, dim=concat_dim, data_vars='minimal', coords='minimal', compat='override')
377 return ds
378 else:
File ~/.conda/envs/argo/lib/python3.10/site-packages/xarray/core/concat.py:238, in concat(objs, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs)
233 else:
234 raise TypeError(
235 "can only concatenate xarray Dataset and DataArray "
236 f"objects, got {type(first_obj)}"
237 )
--> 238 return f(
239 objs, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs
240 )
File ~/.conda/envs/argo/lib/python3.10/site-packages/xarray/core/concat.py:516, in _dataset_concat(datasets, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs)
514 vars = ensure_common_dims([ds[k].variable for ds in datasets])
515 except KeyError:
--> 516 raise ValueError(f"{k!r} is not present in all datasets.")
517 combined = concat_vars(vars, dim, positions, combine_attrs=combine_attrs)
518 assert isinstance(combined, Variable)
ValueError: 'PROFILE_PSAL_QC' is not present in all datasets.
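For context, the final ValueError can be reproduced with plain xarray, independent of argopy. This is a minimal sketch (the variable names are illustrative): with data_vars="minimal", xr.concat expects every dataset to carry each dim-dependent variable, so a variable present in one file but absent in another triggers this error on xarray 2022.3.0. Note that more recent xarray releases may instead fill the gap with NaN.

```python
import numpy as np
import xarray as xr

# One dataset has PSAL_QC, the other does not (mimicking the float files)
ds_with = xr.Dataset({"PSAL_QC": ("N_POINTS", np.zeros(3))})
ds_without = xr.Dataset({"TEMP": ("N_POINTS", np.ones(2))})  # PSAL_QC absent

try:
    combined = xr.concat(
        [ds_with, ds_without], dim="N_POINTS",
        data_vars="minimal", coords="minimal", compat="override",
    )
    # Recent xarray versions may fill the missing variable with NaN instead
    print("concatenated; PSAL_QC has NaN gaps:", bool(combined["PSAL_QC"].isnull().any()))
except ValueError as err:
    # xarray 2022.3.0, as in the traceback above, raises here
    print("ValueError:", err)
```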
Versions
Output of `argopy.show_versions()`
SYSTEM
commit: None python: 3.10.4 | packaged by conda-forge | (main, Mar 24 2022, 17:38:57) [GCC 10.3.0] python-bits: 64 OS: Linux OS-release: 3.10.0-1160.49.1.el7.centos.plus.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.12.1 libnetcdf: 4.8.1
INSTALLED VERSIONS: MIN
aiohttp : 3.8.1
argopy : 0.1.12
erddapy : 1.2.1
fsspec : 2022.5.0
netCDF4 : 1.5.8
packaging : 21.3
scipy : 1.8.1
sklearn : 1.1.1
toolz : 0.11.2
xarray : 2022.3.0
INSTALLED VERSIONS: EXT.EXTRA
dask : 2022.05.2
distributed : 2022.5.2
gsw : 3.4.0
pyarrow : -
tqdm : -
INSTALLED VERSIONS: EXT.PLOTTERS
IPython : 8.4.0
cartopy : 0.20.2
ipykernel : 6.13.0
ipywidgets : 7.7.0
matplotlib : 3.5.2
seaborn : -
INSTALLED VERSIONS: DEV
bottleneck : -
cfgrib : -
cftime : 1.6.0
conda : -
nc_time_axis: -
numpy : 1.22.4
pandas : 1.4.2
pip : 22.1.2
pytest : -
setuptools : 62.3.2
sphinx : -
zarr : -
Hi @andrewfagerheim
I tried the same fetch using the default ftp source (https://data-argo.ifremer.fr/) and it succeeded!
Since we're using the same xarray/argopy versions, I don't see why the data would be processed differently by argopy.
Therefore, this error probably comes from a problem in the GDAC snapshot itself.
It would be interesting to see whether you get the same error using another snapshot.
If so, it could be something to report to the GDAC folks; otherwise it's just the June snapshot, and you would have to use another one.
g
Hi @gmaze, thanks for the response! I think @dhruvbalwada and I have localized this issue by using ArgoIndexFetcher()
to find which profiles fall in the problem area. Based on this, it seems like two things are going on:
- The error raised (ValueError: 'PROFILE_PSAL_QC' is not present in all datasets) is prepending PROFILE_ to PSAL_QC; in other words, the actual problem is with PSAL_QC.
- One of the floats in this region, #29029 (29029_06_2022.zip), does not have the data variables PSAL or PSAL_QC at all, which is likely causing the error.
This float is missing PSAL and PSAL_QC in both the June 2022 and March 2022 snapshots, but those variables are present when we loaded the float through erddap. We are currently downloading another dataset using rsync
to see if this resolves the issue.
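To double-check which files are affected, a small helper like the one below could scan a local snapshot for profile files that lack the salinity variables. This is a hedged sketch: the glob pattern assumes the standard GDAC dac/&lt;DAC&gt;/&lt;WMO&gt;/ directory layout, and the snapshot path is the one from the MCVE above.

```python
import glob
import xarray as xr

def missing_salinity_vars(ds):
    # Return which of the salinity variables are absent from a dataset
    return [v for v in ("PSAL", "PSAL_QC") if v not in ds]

# Scan the multi-profile files of float 29029 in the local snapshot
# (path and layout are assumptions; adjust to your download location):
for path in sorted(glob.glob("202206-ArgoData/dac/*/29029/*_prof.nc")):
    with xr.open_dataset(path) as ds:
        missing = missing_salinity_vars(ds)
        if missing:
            print(path, "is missing", missing)
```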
Hi @dhruvbalwada @andrewfagerheim
Did you fix this issue using another dataset?
Since this is not coming from argopy, I think I can close the issue here.