TopoPyScale

CDS_Beta issues

Open joelfiddes opened this issue 1 year ago • 11 comments

The old CDS will be fully decommissioned on 26 Sept, and the new netcdf files have a whole bunch of format/variable name changes - sheeeeet.

Seems grib is more stable and also consistent with ECMWF open data forecast I am working with.

I propose:

- download grib
- convert to netcdf with cdo
- deal with any variable mapping inconsistencies

I have implemented a new keyword in config for output format (defaults to netcdf if none present). For now I will work in branch "cdsbeta"
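A minimal sketch of the proposed flow, assuming a hypothetical config key `output_format` (the actual keyword name is not given in this thread) and that the `cdo` binary is installed. It only builds the standard `cdo -f nc copy` command; nothing is downloaded or executed.

```python
from pathlib import Path


def conversion_command(config, downloaded_path):
    """Return the cdo command needed after download, or None if none is needed.

    `output_format` is a hypothetical key name used for illustration only.
    """
    # Default to netcdf when the key is absent, as described above.
    fmt = config.get("output_format", "netcdf")
    if fmt == "netcdf":
        return None
    if fmt != "grib":
        raise ValueError(f"unsupported output format: {fmt!r}")
    src = Path(downloaded_path)
    dst = src.with_suffix(".nc")
    # `cdo -f nc copy in.grib out.nc` rewrites a GRIB file as NetCDF.
    return ["cdo", "-f", "nc", "copy", str(src), str(dst)]
```

The returned list could then be passed to `subprocess.run(cmd, check=True)` where a conversion is required.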

joelfiddes avatar Sep 12 '24 08:09 joelfiddes

Argh, so annoying... I hope it will improve performance on their side though. But sure, take the path of least effort in implementation. It will add one step on our side, but right now it is better that it works than not. Eventually we may fall back on downloading netcdf.

Have they changed not only variable names but units and the type of variable? Would you have a sample file? It is super easy with xarray to change variable and dimension names on the fly.
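As a rough illustration of renaming on the fly with xarray, here is a sketch that assumes the new files use names such as `valid_time` and `pressure_level` (an assumption; check an actual file) and that downstream code expects `time` and `level`. The mapping, the target dimension order and the helper name `harmonise` are all hypothetical.

```python
import numpy as np
import xarray as xr

# Hypothetical mapping from assumed new-style names to the names the code expects.
RENAME = {"valid_time": "time", "pressure_level": "level"}
DIM_ORDER = ("time", "level", "latitude", "longitude")


def harmonise(ds, rename=RENAME, dim_order=DIM_ORDER):
    """Rename variables/dimensions that are present and restore a fixed dimension order."""
    # Only rename names that actually exist, so older files pass through unchanged.
    present = {old: new for old, new in rename.items()
               if old in ds.variables or old in ds.sizes}
    ds = ds.rename(present)
    # Put the known dimensions first; `...` keeps any remaining ones after them.
    order = [d for d in dim_order if d in ds.sizes]
    return ds.transpose(*order, ...)


# Small synthetic dataset standing in for a downloaded file.
ds_new = xr.Dataset(
    {"t": (("valid_time", "latitude", "longitude", "pressure_level"),
           np.zeros((2, 3, 4, 5), dtype="float32"))},
    coords={"valid_time": [0, 1], "pressure_level": [500, 600, 700, 850, 1000]},
)
ds_old_style = harmonise(ds_new)
```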

ArcticSnow avatar Sep 13 '24 07:09 ArcticSnow

I'm actually still using the netcdf for now, as in the end the grib-to-netcdf conversion required loads of extra dependencies. Variable names and order of dimensions were easily handled: I just did a stupid renaming and rewriting so as not to mess with the topo_scale module. This is a preprocessing step in fetch_era5.

joelfiddes avatar Sep 13 '24 07:09 joelfiddes

but now i have a dtype issue - all variables are float32. and I have an einsum problem, will post here

joelfiddes avatar Sep 13 '24 07:09 joelfiddes

ive pinned it down to l.100 of topo_scale

plev_interp = dw.sum(['longitude', 'latitude'], keep_attrs=True) # compute horizontal inverse weighted horizontal interpolation

    Traceback (most recent call last):
      File "/home/joel/anaconda3/envs/downscaling/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3505, in run_code
        exec(code_obj, self.user_global_ns, self.user_ns)
      File "<ipython-input-16-f1e94b7789da>", line 2, in <module>
        plev_interp = dw.sum(['longitude', 'latitude'],
      File "/home/joel/anaconda3/envs/downscaling/lib/python3.9/site-packages/xarray/core/weighted.py", line 476, in sum
        return self._implementation(
      File "/home/joel/anaconda3/envs/downscaling/lib/python3.9/site-packages/xarray/core/weighted.py", line 543, in _implementation
        return self.obj.map(func, dim=dim, **kwargs)
      File "/home/joel/anaconda3/envs/downscaling/lib/python3.9/site-packages/xarray/core/dataset.py", line 6026, in map
        variables = {
      File "/home/joel/anaconda3/envs/downscaling/lib/python3.9/site-packages/xarray/core/dataset.py", line 6027, in <dictcomp>
        k: maybe_wrap_array(v, func(v, *args, **kwargs))
      File "/home/joel/anaconda3/envs/downscaling/lib/python3.9/site-packages/xarray/core/weighted.py", line 274, in _weighted_sum
        return self._reduce(da, self.weights, dim=dim, skipna=skipna)
      File "/home/joel/anaconda3/envs/downscaling/lib/python3.9/site-packages/xarray/core/weighted.py", line 229, in _reduce
        return dot(da, weights, dims=dim)
      File "/home/joel/anaconda3/envs/downscaling/lib/python3.9/site-packages/xarray/core/computation.py", line 1762, in dot
        result = apply_ufunc(
      File "/home/joel/anaconda3/envs/downscaling/lib/python3.9/site-packages/xarray/core/computation.py", line 1197, in apply_ufunc
        return apply_dataarray_vfunc(
      File "/home/joel/anaconda3/envs/downscaling/lib/python3.9/site-packages/xarray/core/computation.py", line 304, in apply_dataarray_vfunc
        result_var = func(*data_vars)
      File "/home/joel/anaconda3/envs/downscaling/lib/python3.9/site-packages/xarray/core/computation.py", line 761, in apply_variable_ufunc
        result_data = func(*input_data)
      File "<__array_function__ internals>", line 200, in einsum
      File "/home/joel/anaconda3/envs/downscaling/lib/python3.9/site-packages/numpy/core/einsumfunc.py", line 1371, in einsum
        return c_einsum(*operands, **kwargs)
    TypeError: invalid data type for einsum

    dw = xr.Dataset.weighted(ds_plev_pt, da_idw)
    dw.sum
    Out[18]: <bound method Weighted.sum of DatasetWeighted with weights along dimensions: latitude, longitude>
    dw
    Out[19]: DatasetWeighted with weights along dimensions: latitude, longitude

    ds_plev_pt.dtype
    Traceback (most recent call last):
      File "/home/joel/anaconda3/envs/downscaling/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3505, in run_code
        exec(code_obj, self.user_global_ns, self.user_ns)
      File "<ipython-input-20-0543fae50593>", line 1, in <module>
        ds_plev_pt.dtype
      File "/home/joel/anaconda3/envs/downscaling/lib/python3.9/site-packages/xarray/core/common.py", line 278, in __getattr__
        raise AttributeError(
    AttributeError: 'Dataset' object has no attribute 'dtype'

    print(ds_plev_pt.dtype, da_idw.dtype)
    Traceback (most recent call last):
      File "/home/joel/anaconda3/envs/downscaling/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3505, in run_code
        exec(code_obj, self.user_global_ns, self.user_ns)
      File "<ipython-input-21-b2073b75da5d>", line 1, in <module>
        print(ds_plev_pt.dtype, da_idw.dtype)
      File "/home/joel/anaconda3/envs/downscaling/lib/python3.9/site-packages/xarray/core/common.py", line 278, in __getattr__
        raise AttributeError(
    AttributeError: 'Dataset' object has no attribute 'dtype'

    for var_name, var_data in ds_plev_pt.data_vars.items():
        print(f"{var_name}: {var_data.dtype}")
    number: int64
    expver: object
    z: float32
    t: float32
    u: float32
    v: float32
    r: float32
    q: float32

    print(da_idw.dtype)
    float64

joelfiddes avatar Sep 13 '24 07:09 joelfiddes

fixed with conversion to float64:

    ds_plev_pt = ds_plev_pt.astype('float64')
    da_idw = da_idw.astype('float64')

l.100 now runs:

dw = xr.Dataset.weighted(ds_plev_pt, da_idw)
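For reference, a small self-contained sketch of the cast followed by the weighted sum, on synthetic data. The values, grid and the assumption that the weights already sum to 1 are illustrative only; `ds.weighted(...)` is the method form of the `xr.Dataset.weighted(ds, weights)` call used above.

```python
import numpy as np
import xarray as xr

lat = [46.0, 46.25]
lon = [9.0, 9.25]

# Synthetic float32 temperature on two pressure levels (values are made up).
t_values = np.stack([np.full((2, 2), 270.0), np.full((2, 2), 280.0)]).astype("float32")
ds_plev_pt = xr.Dataset(
    {"t": (("level", "latitude", "longitude"), t_values)},
    coords={"level": [700, 850], "latitude": lat, "longitude": lon},
)

# Weights assumed to be normalised already (four equal weights of 0.25).
da_idw = xr.DataArray(
    np.full((2, 2), 0.25),
    dims=("latitude", "longitude"),
    coords={"latitude": lat, "longitude": lon},
)

# Cast as in the fix above, then reduce over the horizontal dimensions.
ds_plev_pt = ds_plev_pt.astype("float64")
da_idw = da_idw.astype("float64")
dw = ds_plev_pt.weighted(da_idw)
plev_interp = dw.sum(["longitude", "latitude"], keep_attrs=True)
```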

joelfiddes avatar Sep 13 '24 08:09 joelfiddes

Does it sound ok to you?

ChatGPT said:

float32 and einsum

It looks like the data type float32 is being used in your xarray.Dataset (ds_plev_pt), and it's causing issues when calling numpy.einsum. The einsum function often works better with float64, as it expects higher precision and can sometimes raise errors when dealing with lower precision types like float32.

To resolve this, you can try converting your data to float64 before passing it to einsum. Here's how you can do that:

Convert xarray.Dataset or xarray.DataArray to float64:

    ds_plev_pt = ds_plev_pt.astype('float64')
    da_idw = da_idw.astype('float64')

Proceed with the operation: After converting both the Dataset and DataArray to float64, try running the code again to see if the error persists.

This should prevent the TypeError in einsum and allow the computation to proceed with higher precision.
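A quick check of this explanation, assuming only NumPy and made-up values: `einsum` accepts float32 data with float64 weights, so the `expver: object` variable in the dtype listing may be the likelier cause of the TypeError. That reading is an assumption and is not confirmed elsewhere in this thread. Casting the whole Dataset to float64 would also convert such a variable, which would be consistent with the fix working.

```python
import numpy as np

weights = np.full((2, 2), 0.25, dtype=np.float64)

# Same variable names and dtypes as in the dtype listing; the values are made up.
variables = {
    "number": np.zeros((2, 2), dtype=np.int64),
    "expver": np.full((2, 2), "0001", dtype=object),
    "t": np.full((2, 2), 270.0, dtype=np.float32),
}

# float32 data with float64 weights: einsum promotes to float64 and succeeds.
t_sum = np.einsum("ij,ij->", variables["t"], weights)

# List the variables whose dtype is not numeric.
non_numeric = [name for name, arr in variables.items()
               if not np.issubdtype(arr.dtype, np.number)]

# Casting object-dtype strings to float64 parses them as numbers ("0001" becomes 1.0).
expver_as_float = variables["expver"].astype("float64")
```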

joelfiddes avatar Sep 13 '24 08:09 joelfiddes

Will pin a full description of the adaptation to CDS-Beta here once finished.

joelfiddes avatar Sep 13 '24 08:09 joelfiddes

No problem converting to float64. It uses more memory, but if required by numpy, it is required... ChatGPT proposes what I would have too. ;)

or it may be written as ds_plev_pt.astype(np.float64) I think
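A small NumPy-only sketch of both points, on a made-up array: the string and type spellings of the dtype give the same result, and the float64 copy takes twice the memory of the float32 original.

```python
import numpy as np

field = np.zeros((4, 3), dtype=np.float32)

# Both spellings request the same dtype.
as_string = field.astype("float64")
as_type = field.astype(np.float64)
same_dtype = as_string.dtype == as_type.dtype

# float64 uses 8 bytes per element, float32 uses 4.
memory_ratio = as_type.nbytes / field.nbytes
```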

ArcticSnow avatar Sep 13 '24 08:09 ArcticSnow

The CDS-beta code is still on a branch; I will merge it when fully tested. Currently the code will not work, as CDS-legacy has been switched off.

joelfiddes avatar Sep 27 '24 14:09 joelfiddes

@joelfiddes it seems there is a possibility to access the ERA5 archive on the ECMWF MARS server via ssh.

https://www.ecmwf.int/en/computing/access-computing-facilities/how-log

https://confluence.ecmwf.int/display/WEBAPI/Access+MARS

ArcticSnow avatar Dec 11 '24 09:12 ArcticSnow

@joelfiddes, once you're happy with the changes for fetching data from the new system, we can push a pypi version. This also relates to issue #113.

ArcticSnow avatar Jan 13 '25 10:01 ArcticSnow

solved with version v0.2.8: https://github.com/ArcticSnow/TopoPyScale/commit/8f3ec6ea9113449f5e8fdc8d4bd0776d418767f3

ArcticSnow avatar May 26 '25 09:05 ArcticSnow