Cannot access localftp data
This code, accessing profile data locally, fails with a "Data not found" error even though the data is available at the DAC-compliant path. The same code works fine with src='erddap' or src='argovis':
from argopy import DataFetcher as ArgoDataFetcher

argo_loader = ArgoDataFetcher(mode='expert', src='localftp', local_ftp='/Users/ericrehm/ftp/outgoing/argo')
wmoId = 6903550
profileNumber = 34
ds = argo_loader.profile(wmoId, profileNumber).to_xarray().to_dataframe()
ds.head()
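For reference, argopy resolves this request to a path following the GDAC layout dac/&lt;dac&gt;/&lt;wmo&gt;/profiles/&lt;prefix&gt;&lt;wmo&gt;_&lt;cycle&gt;.nc. A quick sketch of that composition (the helper is my own, not an argopy function; the 'BD' prefix matches the delayed-mode B-file named in the error below):

```python
import os

def profile_path(root, dac, wmo, cycle, prefix="BD"):
    """Compose the GDAC-style path to a single-cycle profile file.

    prefix is 'R'/'D' for core files and 'BR'/'BD' for BGC B-files;
    'BD' is the delayed-mode B-file reported in the traceback below.
    """
    fname = "%s%d_%03d.nc" % (prefix, wmo, cycle)
    return os.path.join(root, "dac", dac, str(wmo), "profiles", fname)

path = profile_path("/Users/ericrehm/ftp/outgoing/argo", "coriolis", 6903550, 34)
# -> /Users/ericrehm/ftp/outgoing/argo/dac/coriolis/6903550/profiles/BD6903550_034.nc
```

Checking that this path exists on disk is a quick way to rule out a layout problem before suspecting the fetcher.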
Error Traceback
DataNotFound Traceback (most recent call last)
~/opt/anaconda2/envs/py3/lib/python3.8/site-packages/argopy/fetchers.py in to_xarray(self, **kwargs)
    270             raise InvalidFetcher(" Initialize an access point (%s) first." %
    271                                  ",".join(self.Fetchers.keys()))
--> 272         xds = self.fetcher.to_xarray(**kwargs)
    273         xds = self.postproccessor(xds)
    274         return xds

~/opt/anaconda2/envs/py3/lib/python3.8/site-packages/argopy/data_fetchers/localftp_data.py in to_xarray(self, errors)
    326         else:
    327             method = self.parallel_method
--> 328         ds = self.fs.open_mfdataset(self.uri,
    329                                     method=method,
    330                                     concat_dim='N_POINTS',

~/opt/anaconda2/envs/py3/lib/python3.8/site-packages/argopy/stores/filesystems.py in open_mfdataset(self, urls, concat_dim, max_workers, method, progress, concat, preprocess, errors, *args, **kwargs)
    307             return results
    308         else:
--> 309             raise DataNotFound(urls)
    310
    311     def read_csv(self, url, **kwargs):
DataNotFound: ['/Users/ericrehm/ftp/outgoing/argo/dac/coriolis/6903550/profiles/BD6903550_034.nc']
However, the data does exist and is accessible:
ERICs-MBP-2:Downloads ericrehm$ ncdump -h /Users/ericrehm/ftp/outgoing/argo/dac/coriolis/6903550/profiles/BD6903550_034.nc | head
netcdf BD6903550_034 {
dimensions:
	DATE_TIME = 14 ;
	STRING256 = 256 ;
	STRING64 = 64 ;
	STRING32 = 32 ;
	STRING16 = 16 ;
	STRING8 = 8 ;
	STRING4 = 4 ;
	STRING2 = 2 ;
Show_versions output
argopy: 0.1.7
src: /Users/ericrehm/opt/anaconda2/envs/py3/lib/python3.8/site-packages/argopy/__init__.py
options: {'src': 'erddap', 'local_ftp': '.', 'dataset': 'phy', 'cachedir': '/Users/ericrehm/.cache/argopy', 'mode': 'standard', 'api_timeout': 60}
INSTALLED VERSIONS
commit: None
python: 3.8.5 (default, Sep 4 2020, 02:22:02) [Clang 10.0.0]
python-bits: 64
OS: Darwin
OS-release: 18.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.7.3

argopy: 0.1.7
xarray: 0.16.2
pandas: 1.2.1
numpy: 1.19.2
scipy: 1.5.2
fsspec: 0.8.3
erddapy: 0.9.0
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.3.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.02.0
distributed: None
matplotlib: 3.1.3
cartopy: 0.18.0
seaborn: 0.11.1
numbagg: None
gsw: 3.4.0
setuptools: 52.0.0.post20210125
pip: 20.3.3
conda: None
pytest: None
IPython: 7.20.0
sphinx: None
Hi @SBS-EREHM ,
My guess is the xarray version. We have some difficulties with the latest versions of xarray. On my side, this works with argopy 0.1.7 & xarray 0.16.1:
from argopy import DataFetcher as ArgoDataFetcher
argo_loader = ArgoDataFetcher(mode='expert', src='localftp', local_ftp='/export/home/kbalem/ftp-argo/')
wmoId = 6903550
profileNumber = 34
ds = argo_loader.profile(wmoId, profileNumber).to_xarray().to_dataframe()
Hi Kevin,
Thanks. Reverting back to xarray=0.16.1 got me past this.
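Until that is fixed upstream, a small version guard can catch the incompatible pin before fetching. This helper is a sketch of my own, not part of argopy; the 0.16.1 known-good value comes from Kevin's message above:

```python
KNOWN_GOOD_XARRAY = "0.16.1"  # last xarray release reported to work with argopy 0.1.7

def version_tuple(v):
    """Parse a dotted version string like '0.16.1' into a comparable tuple.
    (Pre-release suffixes are not handled in this sketch.)"""
    return tuple(int(p) for p in v.split(".")[:3])

def is_newer(installed, pin=KNOWN_GOOD_XARRAY):
    """True if the installed version is strictly newer than the pin."""
    return version_tuple(installed) > version_tuple(pin)

is_newer("0.16.2")  # True: worth a warning before fetching with localftp
```

In practice you would pass `xarray.__version__` as the installed version.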
However, 'expert' mode seems to happily return the BGC dataset (B-file) no matter what the dataset parameter is set to.
When I go back to mode='standard', the dataset parameter is again ignored, and the call tries to return the Core (phy) dataset, but fails with KeyError: 'PRES_ADJUSTED'.
I was hoping to play with both the Core and BGC data, understanding there are limitations in argopy in the BGC area...
argo_loader = ArgoDataFetcher(dataset='phy', mode='standard', src='localftp', local_ftp='/Users/ericrehm/ftp/outgoing/argo')
wmoId = 6903550
profileNumber = 34
ds = argo_loader.profile(wmoId, profileNumber).to_xarray().to_dataframe()
ds.head()
Traceback:
KeyError Traceback (most recent call last)
~/opt/anaconda2/envs/py3/lib/python3.8/site-packages/xarray/core/dataset.py in _construct_dataarray(self, name)
   1171         try:
-> 1172             variable = self._variables[name]
   1173         except KeyError:
KeyError: 'PRES_ADJUSTED'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
~/opt/anaconda2/envs/py3/lib/python3.8/site-packages/argopy/fetchers.py in to_xarray(self, **kwargs)
    271                                  ",".join(self.Fetchers.keys()))
    272         xds = self.fetcher.to_xarray(**kwargs)
--> 273         xds = self.postproccessor(xds)
    274         return xds
    275

~/opt/anaconda2/envs/py3/lib/python3.8/site-packages/argopy/fetchers.py in postprocessing(xds)
    209
    210         def postprocessing(xds):
--> 211             xds = self.fetcher.filter_data_mode(xds)
    212             xds = self.fetcher.filter_qc(xds)
    213             xds = self.fetcher.filter_variables(xds, self._mode)

~/opt/anaconda2/envs/py3/lib/python3.8/site-packages/argopy/data_fetchers/localftp_data.py in filter_data_mode(self, ds, **kwargs)
    359
    360     def filter_data_mode(self, ds, **kwargs):
--> 361         ds = ds.argo.filter_data_mode(errors='ignore', **kwargs)
    362         if ds.argo._type == 'point':
    363             ds['N_POINTS'] = np.arange(0, len(ds['N_POINTS']))

~/opt/anaconda2/envs/py3/lib/python3.8/site-packages/argopy/xarray.py in filter_data_mode(self, keep_error, errors)
    336         # Fill in the adjusted field with the non-adjusted wherever it is NaN
    337         for v in plist:
--> 338             argo_d = fill_adjusted_nan(argo_d, v.upper())
    339
    340         # Drop QC fields in delayed mode dataset:

~/opt/anaconda2/envs/py3/lib/python3.8/site-packages/argopy/xarray.py in fill_adjusted_nan(ds, vname)
    273     """
    274     ii = ds.where(
--> 275         np.isnan(ds[vname + '_ADJUSTED']), drop=1)['N_POINTS']
    276     ds[vname + '_ADJUSTED'].loc[dict(N_POINTS=ii)
    277         ] = ds[vname].loc[dict(N_POINTS=ii)]

~/opt/anaconda2/envs/py3/lib/python3.8/site-packages/xarray/core/dataset.py in __getitem__(self, key)
   1270
   1271         if hashable(key):
-> 1272             return self._construct_dataarray(key)
   1273         else:
   1274             return self._copy_listed(np.asarray(key))

~/opt/anaconda2/envs/py3/lib/python3.8/site-packages/xarray/core/dataset.py in _construct_dataarray(self, name)
   1172             variable = self._variables[name]
   1173         except KeyError:
-> 1174             _, name, variable = _get_virtual_variable(
   1175                 self._variables, name, self._level_coords, self.dims
   1176             )

~/opt/anaconda2/envs/py3/lib/python3.8/site-packages/xarray/core/dataset.py in _get_virtual_variable(variables, key, level_vars, dim_sizes)
    169             ref_var = dim_var.to_index_variable().get_level_variable(ref_name)
    170         else:
--> 171             ref_var = variables[ref_name]
    172
    173         if var_name is None:
KeyError: 'PRES_ADJUSTED'
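For context, the filter_data_mode step that fails here essentially does "where a &lt;VAR&gt;_ADJUSTED value is NaN, fall back to the raw &lt;VAR&gt; value". Below is a minimal sketch of that fill using plain numpy arrays, made defensive against a missing adjusted variable, which is exactly what raises the KeyError above for PRES_ADJUSTED in a B-file. The helper is my own illustration, not argopy code:

```python
import numpy as np

def fill_adjusted(data, varnames):
    """Fill NaNs in <VAR>_ADJUSTED from <VAR> for each variable name.

    Variables whose adjusted (or raw) array is absent are skipped
    instead of raising KeyError like the unguarded version.
    """
    for v in varnames:
        adj = v + "_ADJUSTED"
        if adj not in data or v not in data:
            continue  # e.g. PRES_ADJUSTED missing from a BGC B-file
        mask = np.isnan(data[adj])
        data[adj][mask] = data[v][mask]
    return data

data = {
    "PRES": np.array([1.0, 2.0]),             # no PRES_ADJUSTED here
    "TEMP": np.array([10.0, 11.0]),
    "TEMP_ADJUSTED": np.array([np.nan, 11.1]),
}
data = fill_adjusted(data, ["PRES", "TEMP"])  # TEMP_ADJUSTED -> [10.0, 11.1]
```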
- Eric
———
eric rehm ph.d. | Senior Oceanographer, Sea-Bird Scientific | www.seabird.com
Yes, a profile or float request in expert mode with the localftp source should return every variable available. But the operations & filtering done for standard mode are not yet supported for BGC files (hopefully soon).
Hi Kevin,
> Yes, a profile or float request in expert mode with the localftp source should return every variable available. But the operations & filtering done for standard mode are not yet supported for BGC files (hopefully soon).
Got it. However, in expert mode
argo_loader = ArgoDataFetcher(dataset='bgc', mode='expert', src='localftp', local_ftp='/Users/ericrehm/ftp/outgoing/argo')
wmoId = 6903550
profileNumber = 34
I get BGC expert variables (e.g., BETA_BACKSCATTERING700, DOXY_ADJUSTED, CHLA_ADJUSTED, ...) for a single profile:
df0 = argo_loader.profile(wmoId, profileNumber).to_xarray().to_dataframe()
But I get only Core expert variables (e.g., PSAL, PSAL_ADJUSTED, TEMP, etc.) when I try to retrieve all profiles like this using the same fetcher:
df2 = argo_loader.float(wmoId).to_xarray().argo.point2profile().to_dataframe()
Is this a bug or a feature?
- eric
So, in case it wasn't clear from my previous message, I was not able to retrieve BGC expert variables for a "float", i.e., all profiles.
Is that a feature or bug?
- eric
Hi @SBS-EREHM , Sorry for the late response.
I guess there is a bug here, or at least a lack of work, on the BGC part.
First, there's the dataset option:
When passed inside ArgoDataFetcher, the key should be ds='bgc' and not dataset='bgc', but when passed through argopy.set_options(), the key is dataset. We need to fix that. You don't see the problem with the profile request because it goes through the profile files (those contain all variables), but the float request goes either to multi-profile files (_prof.nc) for phy or to BGC synthetic files (_Sprof.nc) for bgc.
But for now, we haven't done much testing/dev on bgc (profile or multi-profile synthetic) files. And my guess is that even profile files will generate some issues with profile requests with ds='bgc'. So for now, the best solution would be what you already do: profile requests in expert mode with ds='phy', dealing with xarray or pandas after that.
I'm sorry I don't have much time to look at it more these days. We're short on humans :smile:
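Kevin's workaround (fetch in expert mode, then post-process with pandas) can be sketched like this. The core/BGC split below is my own heuristic over illustrative column names from this thread, not an argopy feature:

```python
import pandas as pd

CORE_VARS = {"PRES", "TEMP", "PSAL"}  # core Argo variables (assumption for this sketch)

def split_core_bgc(df):
    """Split an expert-mode dataframe into core and BGC columns,
    matching on the base name before any _ADJUSTED/_QC suffix."""
    def base(name):
        for suffix in ("_ADJUSTED_ERROR", "_ADJUSTED", "_QC"):
            if name.endswith(suffix):
                return name[: -len(suffix)]
        return name
    core = df[[c for c in df.columns if base(c) in CORE_VARS]]
    bgc = df[[c for c in df.columns if base(c) not in CORE_VARS]]
    return core, bgc

# Toy stand-in for argo_loader.profile(...).to_dataframe() output:
df = pd.DataFrame({"PRES": [1.0], "PSAL": [35.0], "TEMP_ADJUSTED": [10.0],
                   "DOXY_ADJUSTED": [210.0], "CHLA": [0.2]})
core, bgc = split_core_bgc(df)  # core: PRES, PSAL, TEMP_ADJUSTED; bgc: DOXY_ADJUSTED, CHLA
```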
hi @SBS-EREHM
1- the original issue you raised is indeed related to #77 as pointed out by @quai20
2- I noticed that you used the form [...].to_xarray().to_dataframe(), please note that you can simply use [...].to_dataframe()
3- about the variables returned being different when fetching a profile vs a float data: this is due to the fact that BGC data are not yet fully supported.
When fetching a profile without explicit mention of the bgc dataset (i.e. with the ds option left at its default, phy), argopy fetches data from the first profile file found alphabetically. In your case, this happens to be a BGC file:
>> ArgoDataFetcher(mode='expert', src='localftp').profile(wmoId, profileNumber).uri
['/Users/gmaze/data/ARGO/ftp_current/dac/coriolis/6903550/profiles/BD6903550_034.nc']
If you were to explicitly use the bgc option, you would get an error because it's not supported yet.
When fetching data for a float, argopy will fetch data from multi-profile files. Again, without explicit mention of the bgc option, argopy loads data from the 6903550_prof.nc file, where there are no BGC variables.
When BGC variables are fully supported, you won't have such differences or weird behavior. As of now, BGC variables are returned in expert mode mostly by chance...
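The file-selection behaviour described above follows directly from GDAC file naming. A small sketch classifying the file types involved (my own helper reflecting the naming convention, not an argopy function):

```python
import re

def classify(fname):
    """Classify a GDAC Argo NetCDF file by its name.

    Single-cycle: [R|D]<wmo>_<cycle>.nc is core, [BR|BD]<wmo>_<cycle>.nc
    is a BGC B-file (R = real-time, D = delayed mode).
    Multi-profile: <wmo>_prof.nc is core, <wmo>_Sprof.nc is the
    BGC synthetic product.
    """
    if fname.endswith("_Sprof.nc"):
        return "bgc synthetic multi-profile"
    if fname.endswith("_prof.nc"):
        return "core multi-profile"
    if re.match(r"^B[RD]\d+_\d+\.nc$", fname):
        return "bgc single-cycle (B-file)"
    if re.match(r"^[RD]\d+_\d+\.nc$", fname):
        return "core single-cycle"
    return "unknown"

classify("BD6903550_034.nc")  # 'bgc single-cycle (B-file)'
classify("6903550_prof.nc")   # 'core multi-profile'
```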
I invite you to check out the uri property of the Argo loader; it will tell you precisely where the data are coming from:
ArgoDataFetcher(ds='phy', mode='expert', src='localftp').profile(wmoId, profileNumber).uri
ArgoDataFetcher(ds='bgc', mode='expert', src='localftp').profile(wmoId, profileNumber).uri
ArgoDataFetcher(ds='phy', mode='expert', src='localftp').float(wmoId).uri
ArgoDataFetcher(ds='bgc', mode='expert', src='localftp').float(wmoId).uri
Hope this clarifies things for you.
Hi @SBS-EREHM If you're still around, just to mention 2 things:
- the localftp data source has been deprecated and replaced by the gdac data source, together with a ftp option to specify where you'd like to retrieve GDAC data from.
- with the v0.1.14 release, argopy now supports BGC data retrieval from the erddap (not yet the gdac); it's all explained in the documentation https://argopy.readthedocs.io/
Closing this because of v0.1.14 new BGC features