atlite icon indicating copy to clipboard operation
atlite copied to clipboard

Unable to generate ERA5 cutout

Open mgrabovsky opened this issue 1 year ago • 4 comments

Version Checks (indicate both or one)

  • [X] I have confirmed this bug exists on the lastest release of Atlite.

  • [x] I have confirmed this bug exists on the current master branch of Atlite.

Issue Description

While trying to create a simple ERA5 cutout, I get a mysterious error message from deep within xarray. Here is the full log, reproducer below:

INFO:atlite.data:Storing temporary files in /tmp/tmpxp27b64b
INFO:atlite.data:Calculating and writing with module era5:
INFO:atlite.datasets.era5:Requesting data for feature height...
INFO:atlite.datasets.era5:Requesting data for feature wind...
INFO:atlite.datasets.era5:Requesting data for feature temperature...
INFO:atlite.datasets.era5:Requesting data for feature runoff...
INFO:atlite.datasets.era5:Requesting data for feature influx...
2024-09-20 11:50:28,916 WARNING MOVE TO CDS-Beta
WARNING:cdsapi:MOVE TO CDS-Beta
2024-09-20 11:50:28,953 WARNING MOVE TO CDS-Beta
WARNING:cdsapi:MOVE TO CDS-Beta
2024-09-20 11:50:28,957 WARNING MOVE TO CDS-Beta
WARNING:cdsapi:MOVE TO CDS-Beta
2024-09-20 11:50:28,962 WARNING MOVE TO CDS-Beta
WARNING:cdsapi:MOVE TO CDS-Beta
2024-09-20 11:50:28,967 WARNING MOVE TO CDS-Beta
WARNING:cdsapi:MOVE TO CDS-Beta
INFO:atlite.datasets.era5:CDS: Downloading variables
        geopotential (2020-1)

INFO:atlite.datasets.era5:CDS: Downloading variables                                                              
        runoff (2020-[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

INFO:atlite.datasets.era5:CDS: Downloading variables                                                              
        2m_temperature (2020-[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
        soil_temperature_level_4 (2020-[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
        2m_dewpoint_temperature (2020-[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

INFO:atlite.datasets.era5:CDS: Downloading variables                                                              
        surface_net_solar_radiation (2020-[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
        surface_solar_radiation_downwards (2020-[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
        toa_incident_solar_radiation (2020-[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
        total_sky_direct_solar_radiation_at_surface (2020-[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

INFO:atlite.datasets.era5:CDS: Downloading variables                                                              
        100m_u_component_of_wind (2020-[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
        100m_v_component_of_wind (2020-[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
        forecast_surface_roughness (2020-[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

  0%|                                                                                 | 0.00/16.1M [00:00<?, ?B/s]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mgrabovsky/wind/env/lib/python3.12/site-packages/atlite
/data.py", line 116, in wrapper
    res = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgrabovsky/wind/env/lib/python3.12/site-packages/atlite/data.py", line 208, in cutout_prepare
    ds = get_features(
         ^^^^^^^^^^^^^
  File "/home/mgrabovsky/wind/env/lib/python3.12/site-packages/atlite/data.py", line 59, in get_features
    datasets = compute(*datasets)
               ^^^^^^^^^^^^^^^^^^
  File "/home/mgrabovsky/wind/env/lib/python3.12/site-packages/dask/base.py", line 660, in compute
    results = schedule(dsk, keys, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgrabovsky/wind/env/lib/python3.12/site-packages/atlite/datasets/era5.py", line 468, in get_data
    return xr.concat(datasets, dim="time").sel(time=coords["time"])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgrabovsky/wind/env/lib/python3.12/site-packages/xarray/core/concat.py", line 254, in concat
    first_obj, objs = utils.peek_at(objs)
                      ^^^^^^^^^^^^^^^^^^^
  File "/home/mgrabovsky/wind/env/lib/python3.12/site-packages/xarray/core/utils.py", line 200, in peek_at
    peek = next(gen)
           ^^^^^^^^^
  File "/home/mgrabovsky/wind/env/lib/python3.12/site-packages/atlite/datasets/era5.py", line 453, in retrieve_once
    ds = func({**retrieval_params, **time})
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgrabovsky/wind/env/lib/python3.12/site-packages/atlite/datasets/era5.py", line 197, in get_data_influx
    sp = SolarPosition(ds, time_shift=time_shift)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgrabovsky/wind/env/lib/python3.12/site-packages/atlite/pv/solar_position.py", line 84, in SolarPosition
    n = n.chunk(chunks)
        ^^^^^^^^^^^^^^^
  File "/home/mgrabovsky/wind/env/lib/python3.12/site-packages/xarray/util/deprecation_helpers.py", line 118, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/mgrabovsky/wind/env/lib/python3.12/site-packages/xarray/core/dataarray.py", line 1442, in chunk
    chunk_mapping = dict(zip(self.dims, chunks, strict=True))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: zip() argument 2 is longer than argument 1

Reproducible Example

import logging

import atlite

logging.basicConfig(level=logging.INFO)

cutout = atlite.Cutout(
    "czechia-2020",
    module="era5",
    x=slice(12, 19),
    y=slice(48.5, 51.2),
    chunks={"time": 100},
    time="2020",
)
cutout.prepare()

Expected Behavior

I expect the ERA5 data to be downloaded correctly and the cutout to be generated along with with its prepared features.

Installed Versions

affine==2.4.0 atlite==0.2.14 attrs==24.2.0 beautifulsoup4==4.12.3 Bottleneck==1.4.0 cads-api-client==1.3.3 Cartopy==0.23.0 cdsapi==0.7.3 certifi==2024.8.30 cftime==1.6.4 charset-normalizer==3.3.2 click==8.1.7 click-plugins==1.1.1 cligj==0.7.2 cloudpickle==3.0.0 contourpy==1.3.0 country-converter==1.2 cycler==0.12.1 dask==2024.9.0 deprecation==2.1.0 entsoe-py==0.6.8 et-xmlfile==1.1.0 fonttools==4.53.1 fsspec==2024.9.0 geographiclib==2.0 geopandas==1.0.1 geopy==2.4.1 idna==3.10 kiwisolver==1.4.7 locket==1.0.0 matplotlib==3.9.2 multiurl==0.3.1 netCDF4==1.7.1.post2 networkx==3.3 numexpr==2.10.1 numpy==1.26.4 openpyxl==3.1.5 packaging==24.1 pandas==2.2.2 partd==1.4.2 pgeocode==0.5.0 pillow==10.4.0 powerplantmatching==0.6.0 progressbar2==4.5.0 pycountry==24.6.1 pyogrio==0.9.0 pyparsing==3.1.4 pyproj==3.6.1 pyshp==2.3.1 python-dateutil==2.9.0.post0 python-utils==3.8.2 pytz==2024.2 PyYAML==6.0.2 rasterio==1.3.11 requests==2.32.3 scipy==1.14.1 seaborn==0.13.2 setuptools==75.1.0 shapely==2.0.6 six==1.16.0 snuggs==1.4.7 soupsieve==2.6 toolz==0.12.1 tqdm==4.66.5 typing_extensions==4.12.2 tzdata==2024.1 Unidecode==1.3.8 urllib3==2.2.3 xarray==2024.9.0 xlrd==2.0.1

mgrabovsky avatar Sep 20 '24 09:09 mgrabovsky

I'm experiencing the same issue when attempting to create the example tutorial (https://atlite.readthedocs.io/en/master/examples/create_cutout.html)

  7 cutout = atlite.Cutout(
  8     path="western-europe-2011-01.nc",
  9     module="era5",

(...) 13 chunks={'time': 'auto', 'x': 'auto', 'y': 'auto'} 14 ) ---> 17 cutout.prepare(features=['influx'])

File /X/miniconda3/envs/pypsa-ties/lib/python3.12/site-packages/atlite/data.py:116, in maybe_remove_tmpdir..wrapper(*args, **kwargs) 114 kwargs["tmpdir"] = mkdtemp() 115 try: --> 116 res = func(*args, **kwargs) 117 finally: 118 rmtree(kwargs["tmpdir"])

File /X/envs/pypsa-ties/lib/python3.12/site-packages/atlite/data.py:208, in cutout_prepare(cutout, features, tmpdir, overwrite, compression, show_progress, dask_kwargs, monthly_requests, concurrent_requests) 206 logger.info(f"Calculating and writing with module {module}:") 207 missing_features = missing_vars.index.unique("feature") --> 208 ds = get_features( 209 cutout, 210 module, ... -> 1442 chunk_mapping = dict(zip(self.dims, chunks, strict=True)) 1443 else: 1444 chunk_mapping = either_dict_or_kwargs(chunks, chunks_kwargs, "chunk")

ValueError: zip() argument 2 is longer than argument 1

I also tried specifically assigning chunks. Maybe it has something to do with the new API system implemented from CDS ?

DemandFlex avatar Oct 01 '24 13:10 DemandFlex

I also have this issue. lat/lon below are a ~100km box around ~39.26, -103.69.

cutout = atlite.Cutout(
    path="limon_co_2024_local.nc",
    module="era5",
    x=slice(min_lon, max_lon),
    y=slice(min_lat, max_lat),
    time="2023",
    dt='h'
)
cutout.prepare()

Also arising from SolarPosition:

  File "/Users/ehrenfoss/Documents/emissionality/venv/lib/python3.12/site-packages/atlite/datasets/era5.py", line 197, in get_data_influx
    sp = SolarPosition(ds, time_shift=time_shift)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ehrenfoss/Documents/emissionality/venv/lib/python3.12/site-packages/atlite/pv/solar_position.py", line 84, in SolarPosition
    n = n.chunk(chunks)
        ^^^^^^^^^^^^^^^
  File "/Users/ehrenfoss/Documents/emissionality/venv/lib/python3.12/site-packages/xarray/util/deprecation_helpers.py", line 118, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ehrenfoss/Documents/emissionality/venv/lib/python3.12/site-packages/xarray/core/dataarray.py", line 1442, in chunk
    chunk_mapping = dict(zip(self.dims, chunks, strict=True))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: zip() argument 2 is longer than argument 1
2024-10-06 16:44:20,037 INFO status has been updated to successful                                                                                                                                          
INFO:cads_api_client.processing:status has been updated to successful

ehrenfoss avatar Oct 07 '24 20:10 ehrenfoss

I encountered the same problem. A workaround for me was downgrading xarray to v2024.07.

oruhnau avatar Oct 18 '24 11:10 oruhnau

I encountered the same problem. I tried bypassing the issue on my end with no success, although I made the following observations:

  1. The error seems weirdly inconsistent. I had no issues downloading irradiation data for the Netherlands for 2008, but irradiation data for France would fail due to this bug if I attempted to download 3 or more months.
  2. Passing monthly_requests=True to cutout.prepare() allowed me to get a full weather year for France. However, this didn't seem like a reliable bypass, as it still wouldn't allow me to download even a month of irradiation data for the whole of continental Europe.

I'm trying to see if the bypass by @oruhnau would work for me as well, but testing is painfully slow since the only way I've found to reliably reproduce this bug is to request sufficiently large data from CDS, and the queue takes time 😅I tried saving the ERA5 temp files to test this locally, but I didn't manage to get the error to occur this way when calculating the SolarPosition based on them.

EDIT: It would seem that following the advice of @oruhnau also works in my case. Thanks!

Tasqu avatar Oct 21 '24 14:10 Tasqu

Hey all!

I couldn't reproduce this. Just to be sure, since the xarray switch shouldn't help in that case: Have you switched to the new CDS Api? The old one is decommissioned since 26.9. See

lkstrp avatar Oct 24 '24 15:10 lkstrp

@lkstrp I'm on the new CDS API at least (cdsapi 0.7.2), I thought the old one couldn't even be used anymore (threw errors instructing me to update) 🤔

Tasqu avatar Oct 25 '24 04:10 Tasqu

Interestingly, the code in OP works for me now. I had changed the API URL to https://cds-beta.climate.copernicus.eu/api before, but could still reproduce the issue – it seems to be working now.

mgrabovsky avatar Oct 25 '24 11:10 mgrabovsky

There is now also cdsapi 0.7.4, which fixes the bug that caused 0.7.3 not to work. But just FYI, I don't think this is related to the problem here.

And if anyone has any other code snippets that I can try to reproduce the issue, feel free to share it

lkstrp avatar Oct 25 '24 13:10 lkstrp

Example script

@lkstrp Here's a script I threw together that runs into this issue based on the create_cutout.ipynb example:

import atlite
import logging
logging.basicConfig(level=logging.INFO)

cutout = atlite.Cutout(
    path="western-europe-2011-0103.nc",
    module="era5",
    x=slice(-13.6913, 1.7712),
    y=slice(49.9096, 60.8479),
    time=slice("2011-01", "2011-03"),
)
cutout.prepare(features=["influx"])

Note that the error only occurs when downloading 3 or more months of influx data, e.g. time=slice("2011-01", "2011-02") completes successfully. I don't know if this can vary between computers, but in my experience, the likelihood of this error occurring increases the larger the dataset being downloaded is (making this super annoying to test).

Traceback

Here's the traceback I get:

Traceback (most recent call last):
  File "C:\atlite\atlite\data.py", line 116, in wrapper
    res = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "C:\atlite\atlite\data.py", line 208, in cutout_prepare
    ds = get_features(
         ^^^^^^^^^^^^^
  File "C:\atlite\atlite\data.py", line 59, in get_features
    datasets = compute(*datasets)
               ^^^^^^^^^^^^^^^^^^
  File "C:\Users\trtopi\AppData\Local\miniconda3\envs\atlite\Lib\site-packages\dask\base.py", line 660, in compute
    results = schedule(dsk, keys, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\atlite\atlite\datasets\era5.py", line 469, in get_data
    return xr.concat(datasets, dim="time").sel(time=coords["time"])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\trtopi\AppData\Local\miniconda3\envs\atlite\Lib\site-packages\xarray\core\concat.py", line 254, in concat
    first_obj, objs = utils.peek_at(objs)
                      ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\trtopi\AppData\Local\miniconda3\envs\atlite\Lib\site-packages\xarray\core\utils.py", line 201, in peek_at
    peek = next(gen)
           ^^^^^^^^^
  File "C:\atlite\atlite\datasets\era5.py", line 454, in retrieve_once
    ds = func({**retrieval_params, **time})
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\atlite\atlite\datasets\era5.py", line 198, in get_data_influx
    sp = SolarPosition(ds, time_shift=time_shift)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\atlite\atlite\pv\solar_position.py", line 81, in SolarPosition
    n = n.chunk(chunks)
        ^^^^^^^^^^^^^^^
  File "C:\Users\trtopi\AppData\Local\miniconda3\envs\atlite\Lib\site-packages\xarray\util\deprecation_helpers.py", line 118, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\trtopi\AppData\Local\miniconda3\envs\atlite\Lib\site-packages\xarray\core\dataarray.py", line 1447, in chunk
    chunk_mapping = dict(zip(self.dims, chunks, strict=True))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: zip() argument 2 is longer than argument 1

There's also a follow-up exception that occurs when atlite tries to get rid of the downloaded temp files after the above error occurs (which naturally goes away if e.g. the tmpdir keyword argument is given):

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\atlite\cutout_test.py", line 12, in <module>
    cutout.prepare(features=["influx"])
  File "C:\atlite\atlite\data.py", line 118, in wrapper
    rmtree(kwargs["tmpdir"])
  File "C:\Users\trtopi\AppData\Local\miniconda3\envs\atlite\Lib\shutil.py", line 781, in rmtree
    return _rmtree_unsafe(path, onexc)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\trtopi\AppData\Local\miniconda3\envs\atlite\Lib\shutil.py", line 635, in _rmtree_unsafe
    onexc(os.unlink, fullname, err)
  File "C:\Users\trtopi\AppData\Local\miniconda3\envs\atlite\Lib\shutil.py", line 633, in _rmtree_unsafe
    os.unlink(fullname)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\trtopi\\AppData\\Local\\Temp\\3\\tmpdksooror\\tmpqkgowfv1.nc'

My setup

I'm running a Windows 11 machine with miniconda, and set up a new conda environment solely for this test with Python 3.12.7 and pip install atlite:

Package            Version
------------------ -----------
affine             2.4.0
atlite             0.2.14
attrs              24.2.0
Bottleneck         1.4.2
cads-api-client    1.4.7
cdsapi             0.7.4
certifi            2024.8.30
cftime             1.6.4.post1
charset-normalizer 3.4.0
click              8.1.7
click-plugins      1.1.1
cligj              0.7.2
cloudpickle        3.1.0
colorama           0.4.6
dask               2024.10.0
fsspec             2024.10.0
geopandas          1.0.1
idna               3.10
locket             1.0.0
multiurl           0.3.2
netCDF4            1.7.2
numexpr            2.10.1
numpy              1.26.4
packaging          24.1
pandas             2.2.3
partd              1.4.2
pip                24.2
progressbar2       4.5.0
pyogrio            0.10.0
pyparsing          3.2.0
pyproj             3.7.0
python-dateutil    2.9.0.post0
python-utils       3.9.0
pytz               2024.2
PyYAML             6.0.2
rasterio           1.4.1
requests           2.32.3
scipy              1.14.1
setuptools         75.1.0
shapely            2.0.6
six                1.16.0
toolz              1.0.0
tqdm               4.66.5
typing_extensions  4.12.2
tzdata             2024.2
urllib3            2.2.3
wheel              0.44.0
xarray             2024.10.0

So far, following the workaround by @oruhnau has worked for me, no idea why:

I encountered the same problem. A workaround for me was downgrading xarray to v2024.07.

Tasqu avatar Oct 28 '24 12:10 Tasqu

Thanks for the detailed report @Tasqu! It made it easy to track down.

The problem occurs during some rechunking where the dimension lengths did not match. And version 2024.09 of xarray introduced stricter handling for those cases where things broke.

A simple fix is done in #395.

lkstrp avatar Oct 28 '24 14:10 lkstrp