atlite
Unable to generate ERA5 cutout
Version Checks (indicate both or one)
- [x] I have confirmed this bug exists on the latest release of Atlite.
- [x] I have confirmed this bug exists on the current master branch of Atlite.
Issue Description
While trying to create a simple ERA5 cutout, I get a mysterious error message from deep within xarray. Here is the full log, reproducer below:
INFO:atlite.data:Storing temporary files in /tmp/tmpxp27b64b
INFO:atlite.data:Calculating and writing with module era5:
INFO:atlite.datasets.era5:Requesting data for feature height...
INFO:atlite.datasets.era5:Requesting data for feature wind...
INFO:atlite.datasets.era5:Requesting data for feature temperature...
INFO:atlite.datasets.era5:Requesting data for feature runoff...
INFO:atlite.datasets.era5:Requesting data for feature influx...
2024-09-20 11:50:28,916 WARNING MOVE TO CDS-Beta
WARNING:cdsapi:MOVE TO CDS-Beta
2024-09-20 11:50:28,953 WARNING MOVE TO CDS-Beta
WARNING:cdsapi:MOVE TO CDS-Beta
2024-09-20 11:50:28,957 WARNING MOVE TO CDS-Beta
WARNING:cdsapi:MOVE TO CDS-Beta
2024-09-20 11:50:28,962 WARNING MOVE TO CDS-Beta
WARNING:cdsapi:MOVE TO CDS-Beta
2024-09-20 11:50:28,967 WARNING MOVE TO CDS-Beta
WARNING:cdsapi:MOVE TO CDS-Beta
INFO:atlite.datasets.era5:CDS: Downloading variables
geopotential (2020-1)
INFO:atlite.datasets.era5:CDS: Downloading variables
runoff (2020-[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
INFO:atlite.datasets.era5:CDS: Downloading variables
2m_temperature (2020-[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
soil_temperature_level_4 (2020-[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
2m_dewpoint_temperature (2020-[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
INFO:atlite.datasets.era5:CDS: Downloading variables
surface_net_solar_radiation (2020-[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
surface_solar_radiation_downwards (2020-[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
toa_incident_solar_radiation (2020-[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
total_sky_direct_solar_radiation_at_surface (2020-[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
INFO:atlite.datasets.era5:CDS: Downloading variables
100m_u_component_of_wind (2020-[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
100m_v_component_of_wind (2020-[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
forecast_surface_roughness (2020-[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
0%| | 0.00/16.1M [00:00<?, ?B/s]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/mgrabovsky/wind/env/lib/python3.12/site-packages/atlite/data.py", line 116, in wrapper
res = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/mgrabovsky/wind/env/lib/python3.12/site-packages/atlite/data.py", line 208, in cutout_prepare
ds = get_features(
^^^^^^^^^^^^^
File "/home/mgrabovsky/wind/env/lib/python3.12/site-packages/atlite/data.py", line 59, in get_features
datasets = compute(*datasets)
^^^^^^^^^^^^^^^^^^
File "/home/mgrabovsky/wind/env/lib/python3.12/site-packages/dask/base.py", line 660, in compute
results = schedule(dsk, keys, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mgrabovsky/wind/env/lib/python3.12/site-packages/atlite/datasets/era5.py", line 468, in get_data
return xr.concat(datasets, dim="time").sel(time=coords["time"])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mgrabovsky/wind/env/lib/python3.12/site-packages/xarray/core/concat.py", line 254, in concat
first_obj, objs = utils.peek_at(objs)
^^^^^^^^^^^^^^^^^^^
File "/home/mgrabovsky/wind/env/lib/python3.12/site-packages/xarray/core/utils.py", line 200, in peek_at
peek = next(gen)
^^^^^^^^^
File "/home/mgrabovsky/wind/env/lib/python3.12/site-packages/atlite/datasets/era5.py", line 453, in retrieve_once
ds = func({**retrieval_params, **time})
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mgrabovsky/wind/env/lib/python3.12/site-packages/atlite/datasets/era5.py", line 197, in get_data_influx
sp = SolarPosition(ds, time_shift=time_shift)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mgrabovsky/wind/env/lib/python3.12/site-packages/atlite/pv/solar_position.py", line 84, in SolarPosition
n = n.chunk(chunks)
^^^^^^^^^^^^^^^
File "/home/mgrabovsky/wind/env/lib/python3.12/site-packages/xarray/util/deprecation_helpers.py", line 118, in inner
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/mgrabovsky/wind/env/lib/python3.12/site-packages/xarray/core/dataarray.py", line 1442, in chunk
chunk_mapping = dict(zip(self.dims, chunks, strict=True))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: zip() argument 2 is longer than argument 1
Reproducible Example
import logging
import atlite
logging.basicConfig(level=logging.INFO)
cutout = atlite.Cutout(
    "czechia-2020",
    module="era5",
    x=slice(12, 19),
    y=slice(48.5, 51.2),
    chunks={"time": 100},
    time="2020",
)
cutout.prepare()
Expected Behavior
I expect the ERA5 data to be downloaded correctly and the cutout to be generated along with its prepared features.
Installed Versions
affine==2.4.0 atlite==0.2.14 attrs==24.2.0 beautifulsoup4==4.12.3 Bottleneck==1.4.0 cads-api-client==1.3.3 Cartopy==0.23.0 cdsapi==0.7.3 certifi==2024.8.30 cftime==1.6.4 charset-normalizer==3.3.2 click==8.1.7 click-plugins==1.1.1 cligj==0.7.2 cloudpickle==3.0.0 contourpy==1.3.0 country-converter==1.2 cycler==0.12.1 dask==2024.9.0 deprecation==2.1.0 entsoe-py==0.6.8 et-xmlfile==1.1.0 fonttools==4.53.1 fsspec==2024.9.0 geographiclib==2.0 geopandas==1.0.1 geopy==2.4.1 idna==3.10 kiwisolver==1.4.7 locket==1.0.0 matplotlib==3.9.2 multiurl==0.3.1 netCDF4==1.7.1.post2 networkx==3.3 numexpr==2.10.1 numpy==1.26.4 openpyxl==3.1.5 packaging==24.1 pandas==2.2.2 partd==1.4.2 pgeocode==0.5.0 pillow==10.4.0 powerplantmatching==0.6.0 progressbar2==4.5.0 pycountry==24.6.1 pyogrio==0.9.0 pyparsing==3.1.4 pyproj==3.6.1 pyshp==2.3.1 python-dateutil==2.9.0.post0 python-utils==3.8.2 pytz==2024.2 PyYAML==6.0.2 rasterio==1.3.11 requests==2.32.3 scipy==1.14.1 seaborn==0.13.2 setuptools==75.1.0 shapely==2.0.6 six==1.16.0 snuggs==1.4.7 soupsieve==2.6 toolz==0.12.1 tqdm==4.66.5 typing_extensions==4.12.2 tzdata==2024.1 Unidecode==1.3.8 urllib3==2.2.3 xarray==2024.9.0 xlrd==2.0.1
I'm experiencing the same issue when attempting to run the cutout creation tutorial (https://atlite.readthedocs.io/en/master/examples/create_cutout.html):
      7 cutout = atlite.Cutout(
      8     path="western-europe-2011-01.nc",
      9     module="era5",
    (...)
     13     chunks={'time': 'auto', 'x': 'auto', 'y': 'auto'}
     14 )
---> 17 cutout.prepare(features=['influx'])

File /X/miniconda3/envs/pypsa-ties/lib/python3.12/site-packages/atlite/data.py:116, in maybe_remove_tmpdir.

File /X/envs/pypsa-ties/lib/python3.12/site-packages/atlite/data.py:208, in cutout_prepare(cutout, features, tmpdir, overwrite, compression, show_progress, dask_kwargs, monthly_requests, concurrent_requests)
    206 logger.info(f"Calculating and writing with module {module}:")
    207 missing_features = missing_vars.index.unique("feature")
--> 208 ds = get_features(
    209     cutout,
    210     module,
...
-> 1442     chunk_mapping = dict(zip(self.dims, chunks, strict=True))
   1443 else:
   1444     chunk_mapping = either_dict_or_kwargs(chunks, chunks_kwargs, "chunk")
ValueError: zip() argument 2 is longer than argument 1
I also tried explicitly assigning chunks, without success. Maybe it has something to do with the new CDS API system?
I also have this issue. The lat/lon values below define a ~100 km box around ~39.26, -103.69.
cutout = atlite.Cutout(
    path="limon_co_2024_local.nc",
    module="era5",
    x=slice(min_lon, max_lon),
    y=slice(min_lat, max_lat),
    time="2023",
    dt='h',
)
cutout.prepare()
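The bound variables aren't defined in the snippet above; one plausible way to derive them (an illustrative sketch, assuming ~111 km per degree of latitude, with longitude scaled by the cosine of the latitude) would be:

```python
from math import cos, radians

center_lat, center_lon = 39.26, -103.69
half_km = 50.0  # half of a ~100 km box

dlat = half_km / 111.0                                # ~111 km per deg latitude
dlon = half_km / (111.0 * cos(radians(center_lat)))   # narrower at higher latitudes
min_lat, max_lat = center_lat - dlat, center_lat + dlat
min_lon, max_lon = center_lon - dlon, center_lon + dlon
print(round(min_lat, 2), round(max_lat, 2))  # 38.81 39.71
```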
Also arising from SolarPosition:
File "/Users/ehrenfoss/Documents/emissionality/venv/lib/python3.12/site-packages/atlite/datasets/era5.py", line 197, in get_data_influx
sp = SolarPosition(ds, time_shift=time_shift)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ehrenfoss/Documents/emissionality/venv/lib/python3.12/site-packages/atlite/pv/solar_position.py", line 84, in SolarPosition
n = n.chunk(chunks)
^^^^^^^^^^^^^^^
File "/Users/ehrenfoss/Documents/emissionality/venv/lib/python3.12/site-packages/xarray/util/deprecation_helpers.py", line 118, in inner
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/ehrenfoss/Documents/emissionality/venv/lib/python3.12/site-packages/xarray/core/dataarray.py", line 1442, in chunk
chunk_mapping = dict(zip(self.dims, chunks, strict=True))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: zip() argument 2 is longer than argument 1
2024-10-06 16:44:20,037 INFO status has been updated to successful
INFO:cads_api_client.processing:status has been updated to successful
I encountered the same problem. A workaround for me was downgrading xarray to v2024.07.
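For anyone wanting to try the same workaround, a version pin along these lines should do it (the spec below targets any release before xarray's stricter 2024.09 behavior; exact version string is an assumption based on PyPI's normalized version numbers):

```shell
# Downgrade xarray below the release that tightened chunk handling
pip install "xarray<2024.9"
```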
I encountered the same problem. I tried bypassing the issue on my end with no success, although I made the following observations:
- The error seems weirdly inconsistent. I had no issues downloading irradiation data for the Netherlands for 2008, but irradiation data for France would fail due to this bug if I attempted to download 3 or more months.
- Passing monthly_requests=True to cutout.prepare() allowed me to get a full weather year for France. However, this didn't seem like a reliable bypass, as it still wouldn't allow me to download even a month of irradiation data for the whole of continental Europe.
I'm trying to see if the bypass by @oruhnau would work for me as well, but testing is painfully slow, since the only way I've found to reliably reproduce this bug is to request sufficiently large data from CDS, and the queue takes time 😅 I tried saving the ERA5 temp files to test this locally, but I couldn't get the error to occur that way when calculating the SolarPosition from them.
EDIT: It would seem that following the advice of @oruhnau also works in my case. Thanks!
Hey all!
I couldn't reproduce this. Just to be sure, since the xarray switch shouldn't help in that case: have you switched to the new CDS API? The old one has been decommissioned since 26 September.
See
@lkstrp I'm on the new CDS API at least (cdsapi 0.7.2), I thought the old one couldn't even be used anymore (threw errors instructing me to update) 🤔
Interestingly, the code in OP works for me now. I had changed the API URL to https://cds-beta.climate.copernicus.eu/api before, but could still reproduce the issue – it seems to be working now.
There is now also cdsapi 0.7.4, which fixes the bug that caused 0.7.3 not to work. But just FYI, I don't think this is related to the problem here.
And if anyone has other code snippets that reproduce the issue, feel free to share them.
Example script
@lkstrp Here's a script I threw together that runs into this issue based on the create_cutout.ipynb example:
import atlite
import logging
logging.basicConfig(level=logging.INFO)
cutout = atlite.Cutout(
    path="western-europe-2011-0103.nc",
    module="era5",
    x=slice(-13.6913, 1.7712),
    y=slice(49.9096, 60.8479),
    time=slice("2011-01", "2011-03"),
)
cutout.prepare(features=["influx"])
Note that the error only occurs when downloading 3 or more months of influx data, e.g. time=slice("2011-01", "2011-02") completes successfully. I don't know if this can vary between computers, but in my experience, the likelihood of this error occurring increases the larger the dataset being downloaded is (making this super annoying to test).
Traceback
Here's the traceback I get:
Traceback (most recent call last):
File "C:\atlite\atlite\data.py", line 116, in wrapper
res = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\atlite\atlite\data.py", line 208, in cutout_prepare
ds = get_features(
^^^^^^^^^^^^^
File "C:\atlite\atlite\data.py", line 59, in get_features
datasets = compute(*datasets)
^^^^^^^^^^^^^^^^^^
File "C:\Users\trtopi\AppData\Local\miniconda3\envs\atlite\Lib\site-packages\dask\base.py", line 660, in compute
results = schedule(dsk, keys, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\atlite\atlite\datasets\era5.py", line 469, in get_data
return xr.concat(datasets, dim="time").sel(time=coords["time"])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\trtopi\AppData\Local\miniconda3\envs\atlite\Lib\site-packages\xarray\core\concat.py", line 254, in concat
first_obj, objs = utils.peek_at(objs)
^^^^^^^^^^^^^^^^^^^
File "C:\Users\trtopi\AppData\Local\miniconda3\envs\atlite\Lib\site-packages\xarray\core\utils.py", line 201, in peek_at
peek = next(gen)
^^^^^^^^^
File "C:\atlite\atlite\datasets\era5.py", line 454, in retrieve_once
ds = func({**retrieval_params, **time})
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\atlite\atlite\datasets\era5.py", line 198, in get_data_influx
sp = SolarPosition(ds, time_shift=time_shift)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\atlite\atlite\pv\solar_position.py", line 81, in SolarPosition
n = n.chunk(chunks)
^^^^^^^^^^^^^^^
File "C:\Users\trtopi\AppData\Local\miniconda3\envs\atlite\Lib\site-packages\xarray\util\deprecation_helpers.py", line 118, in inner
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\trtopi\AppData\Local\miniconda3\envs\atlite\Lib\site-packages\xarray\core\dataarray.py", line 1447, in chunk
chunk_mapping = dict(zip(self.dims, chunks, strict=True))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: zip() argument 2 is longer than argument 1
There's also a follow-up exception that occurs when atlite tries to get rid of the downloaded temp files after the above error occurs (which naturally goes away if e.g. the tmpdir keyword argument is given):
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\atlite\cutout_test.py", line 12, in <module>
cutout.prepare(features=["influx"])
File "C:\atlite\atlite\data.py", line 118, in wrapper
rmtree(kwargs["tmpdir"])
File "C:\Users\trtopi\AppData\Local\miniconda3\envs\atlite\Lib\shutil.py", line 781, in rmtree
return _rmtree_unsafe(path, onexc)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\trtopi\AppData\Local\miniconda3\envs\atlite\Lib\shutil.py", line 635, in _rmtree_unsafe
onexc(os.unlink, fullname, err)
File "C:\Users\trtopi\AppData\Local\miniconda3\envs\atlite\Lib\shutil.py", line 633, in _rmtree_unsafe
os.unlink(fullname)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\trtopi\\AppData\\Local\\Temp\\3\\tmpdksooror\\tmpqkgowfv1.nc'
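As noted, the secondary cleanup error disappears when an explicit tmpdir is passed (cutout_prepare accepts a tmpdir argument, as visible in the tracebacks above). A minimal sketch, assuming the cutout object is defined as in the example script; the directory name era5_tmp is arbitrary:

```python
from pathlib import Path

# Create a persistent scratch directory and hand it to atlite, so the wrapper
# in data.py does not try to rmtree a system temp dir with still-open files.
tmpdir = Path("era5_tmp")
tmpdir.mkdir(exist_ok=True)

# cutout.prepare(features=["influx"], tmpdir=str(tmpdir))  # 'cutout' as above
print(tmpdir.is_dir())  # True
```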
My setup
I'm running a Windows 11 machine with miniconda, and set up a new conda environment solely for this test with Python 3.12.7 and pip install atlite:
Package Version
------------------ -----------
affine 2.4.0
atlite 0.2.14
attrs 24.2.0
Bottleneck 1.4.2
cads-api-client 1.4.7
cdsapi 0.7.4
certifi 2024.8.30
cftime 1.6.4.post1
charset-normalizer 3.4.0
click 8.1.7
click-plugins 1.1.1
cligj 0.7.2
cloudpickle 3.1.0
colorama 0.4.6
dask 2024.10.0
fsspec 2024.10.0
geopandas 1.0.1
idna 3.10
locket 1.0.0
multiurl 0.3.2
netCDF4 1.7.2
numexpr 2.10.1
numpy 1.26.4
packaging 24.1
pandas 2.2.3
partd 1.4.2
pip 24.2
progressbar2 4.5.0
pyogrio 0.10.0
pyparsing 3.2.0
pyproj 3.7.0
python-dateutil 2.9.0.post0
python-utils 3.9.0
pytz 2024.2
PyYAML 6.0.2
rasterio 1.4.1
requests 2.32.3
scipy 1.14.1
setuptools 75.1.0
shapely 2.0.6
six 1.16.0
toolz 1.0.0
tqdm 4.66.5
typing_extensions 4.12.2
tzdata 2024.2
urllib3 2.2.3
wheel 0.44.0
xarray 2024.10.0
So far, following the workaround by @oruhnau has worked for me, no idea why:
I encountered the same problem. A workaround for me was downgrading xarray to v2024.07.
Thanks for the detailed report @Tasqu! It made it easy to track down.
The problem occurs during a rechunking step where the number of chunk specifications did not match the number of dimensions, and xarray 2024.09 introduced stricter handling (zip with strict=True) that turns this mismatch into an error.
A simple fix is done in #395.
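For illustration only, a sketch of the kind of guard needed (this is not the actual patch in #395, and the helper name is hypothetical): the chunk specification has to be restricted to dimensions the array actually has before calling .chunk().

```python
def restrict_chunks(dims, chunks):
    """Keep only chunk specs for dimensions that actually exist (sketch)."""
    if isinstance(chunks, dict):
        return {d: s for d, s in chunks.items() if d in dims}
    return tuple(chunks)[: len(dims)]  # positional form: drop the extras

print(restrict_chunks(("time",), {"time": 100, "x": "auto"}))  # {'time': 100}
```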