
wave energy converters and new wind and wave data modules

Open lmezilis opened this issue 1 month ago • 20 comments

Closes # (if applicable).

Changes proposed in this Pull Request

Checklist

  • [x] Code changes are sufficiently documented; i.e. new functions contain docstrings and further explanations may be given in doc.
  • [x] Unit tests for new features were added (if applicable).
  • [ ] Newly introduced dependencies are added to environment.yaml, environment_docs.yaml and setup.py (if applicable).
  • [ ] A note for the release notes doc/release_notes.rst of the upcoming release is included.
  • [x] I consent to the release of this PR's code under the MIT license.

lmezilis avatar Nov 05 '25 12:11 lmezilis

Thanks a lot for this nice feature!

Can you let us know when you are ready for us to review it? It would also be nice to have a short example, e.g. an ipynb to include in the documentation.

@brynpickering Can I ask you to review it if you have capacity? Thanks!

euronion avatar Nov 06 '25 09:11 euronion

Thank you!

I think we can review it right away, and I can prepare some files for documentation. Apologies for being very new to GitHub procedures. I am slowly starting to get the hang of it.

lmezilis avatar Nov 06 '25 09:11 lmezilis

[...] Apologies for being very new to GitHub procedures. I am slowly starting to get the hang of it.

No worries, good that you mention it! We'll help you get settled in; don't hesitate to ask questions if something is unclear, or to ping us (e.g. @euronion or @brynpickering).

euronion avatar Nov 06 '25 09:11 euronion

Hello @brynpickering, I have made all of the changes locally. Should I commit the changes in the forked branch, or is there another way to continue?

lmezilis avatar Nov 07 '25 14:11 lmezilis

Hello @brynpickering, I have made all of the changes locally. Should I commit the changes in the forked branch, or is there another way to continue?

Yes, in the forked branch. You should be able to just always work in the forked branch and push to your own repository ("origin") whenever you make changes. Those changes will then be made visible in this PR.

brynpickering avatar Nov 12 '25 14:11 brynpickering

@euronion not sure what the system should be for accessing the data. This PR works on the basis that the user downloads the data themselves. The data are available via OpenDAP, so it would be feasible to query them directly with the appropriate lat/lon/time attrs, e.g.:

import pandas as pd
import xarray as xr

years = [2000]
months = [1, 2, 3, 4, 5, 6]
remote_data = xr.open_mfdataset(
    [
        f"https://opendap.4tu.nl/thredds/dodsC/data2/djht/f359cd0f-d135-416c-9118-e79dccba57b9/1/{year}/TU-MREL_EU_ATL-2M_{year}{month:02}.nc?hs,latitude,longitude"
        for year in years
        for month in months
    ],
    engine="netcdf4",
)
# `time` coord seems to be corrupted in the data source, so we have to translate integers to datetime locally;
# using `periods` keeps the new axis the same length as the concatenated data
remote_data.coords["time"] = pd.date_range(
    f"{years[0]}-{months[0]:02}", periods=remote_data.sizes["time"], freq="1H"
)
# y, x, time: the query coordinates of interest
remote_data.sel(latitude=y, longitude=x, time=time)

brynpickering avatar Nov 12 '25 17:11 brynpickering

@euronion not sure what the system should be for accessing the data. This PR works on the basis that the user downloads the data themselves. The data are available via OpenDAP, so it would be feasible to query them directly with the appropriate lat/lon/time attrs [...]

This data gets read into a Cutout, right?

In an ideal world we support both:

  • Automatic retrieval of the data
  • Building from a local downloaded file

If it is easily possible, then yes, please implement automatic retrieval as well. Since you already have building from a local file, you could implement it by downloading to a local temporary file and then passing that to your existing function.
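Roughly along these lines, just to illustrate the flow (untested sketch: retrieve_echowave is a made-up helper name, and the fileServer base URL is only a guess at the THREDDS HTTP endpoint that usually sits next to the dodsC one above):

import tempfile
import urllib.request
from pathlib import Path

import xarray as xr

# Assumed base URL: the THREDDS fileServer endpoint next to the dodsC (OPeNDAP)
# endpoint used above; adjust if the 4TU server exposes it differently.
BASE_URL = (
    "https://opendap.4tu.nl/thredds/fileServer/data2/djht/"
    "f359cd0f-d135-416c-9118-e79dccba57b9/1"
)


def retrieve_echowave(year_months, tmpdir):
    """Download one monthly ECHOWAVE file per (year, month) and return the local paths."""
    paths = []
    for year, month in year_months:
        fn = f"TU-MREL_EU_ATL-2M_{year}{month:02}.nc"
        target = Path(tmpdir) / fn
        urllib.request.urlretrieve(f"{BASE_URL}/{year}/{fn}", target)
        paths.append(target)
    return paths


with tempfile.TemporaryDirectory() as tmpdir:
    files = retrieve_echowave([(2018, 1), (2018, 2)], tmpdir)
    ds = xr.open_mfdataset(files, combine="by_coords")
    # ... hand `ds` (or `files`) to the existing local-file code path of this PR,
    # making sure the data are written out before the temporary directory is removed.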

One question: The naming "wecgenerator" strikes me as a bit odd. I haven't looked at it in detail, but my understanding is that in this PR the conversion is from wave energy to electricity. Wouldn't the term "wec" or "waveenergyconverter" then be more appropriate, since not only the "generator" but the whole system is modelled?

euronion avatar Nov 13 '25 08:11 euronion

@lmezilis I notice in the ECHOWAVE data that the variable tp doesn't exist; only t01 and t02 exist for wave period data (both being mean wave periods). How did you get tp? From a different version of this dataset?

brynpickering avatar Nov 13 '25 12:11 brynpickering

Building from a local downloaded file

@euronion is this how it is done for other datasets?

brynpickering avatar Nov 13 '25 12:11 brynpickering

Building from a local downloaded file

@euronion is this how it is done for other datasets?

  • For SARAH2/SARAH3, yes. Initially there was no API, which is why we implemented it that way. Today there is an API (#447).
  • For ERA5, yes, internally it downloads and creates files, then builds the cutout from those files.

euronion avatar Nov 13 '25 12:11 euronion

the variable tp doesn't exist; only t01 and t02 exist for wave period data (both being mean wave periods). How did you get tp? From a different version of this dataset?

@brynpickering I must have renamed the variables to something more convenient to work with back in the day and completely forgot their original names. t01 is the one that should be used. That dataset has a lot of different types of ocean variables.
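If it is easier to keep the PR's internal name, renaming on load would also work (small sketch; the file name is just an example of a local monthly ECHOWAVE file):

import xarray as xr

ds = xr.open_dataset("TU-MREL_EU_ATL-2M_201801.nc")
# ECHOWAVE provides t01/t02 (mean wave periods); map the one we use onto the
# name the conversion code currently expects.
ds = ds.rename({"t01": "tp"})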

In the meantime, it would be great to add some documentation.

Of course, sorry for the delays. I am working on this; I have my documentation ready but am trying to make it appealing and clear. It should probably be ready by tomorrow.

compare the results for a specific gridcell?

I assume between ERA5 and ECHOWAVE, correct?

lmezilis avatar Nov 13 '25 13:11 lmezilis

One question: The naming "wecgenerator" strikes me as a bit odd.

@euronion you are right, a simple wec should do it. I will update this. Also in pypsa-eur I used the term wec_type as a generator index in build_renewables. I notice that wind turbines are called turbines. Should I just type wec there too?

lmezilis avatar Nov 14 '25 10:11 lmezilis

One question: The naming "wecgenerator" strikes me as a bit odd.

@euronion you are right, a simple wec should do it. I will update this. Also in pypsa-eur I used the term wec_type as a generator index in build_renewables. I notice that wind turbines are called turbines. Should I just type wec there too?

Yes, wec or even just converter instead of wec_type is fine in this case, because the context and the hierarchy in the config make it clear what it is about (solar -> panel, wind -> turbine, wave -> wec/converter).

euronion avatar Nov 14 '25 10:11 euronion

I have tried to implement the new convert functions and I think they work fine. However, it seems there are issues with the downloaded data when I try to create the cutout, probably a problem with the OPeNDAP server. I get an invalid ID error:

Exception ignored in: <function CachingFileManager.__del__ at 0x000001F1C57E0E00>
Traceback (most recent call last):
  File "c:\Users\thira\anaconda3\envs\pypsa-eur\Lib\site-packages\xarray\backends\file_manager.py", line 250, in __del__
    self.close(needs_lock=False)
  File "c:\Users\thira\anaconda3\envs\pypsa-eur\Lib\site-packages\xarray\backends\file_manager.py", line 234, in close
    file.close()
  File "src/netCDF4/_netCDF4.pyx", line 2669, in netCDF4._netCDF4.Dataset.close
  File "src/netCDF4/_netCDF4.pyx", line 2636, in netCDF4._netCDF4.Dataset._close
  File "src/netCDF4/_netCDF4.pyx", line 2164, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: Not a valid ID
C:\Users\thira\Desktop\atlite-mrel\atlite\data.py:249: UserWarning: The specified chunks separate the stored chunks along dimension "time" starting at index 100. This could degrade performance. Instead, consider rechunking after loading.
  cutout.data = xr.open_dataset(cutout.path, chunks=cutout.chunks)

Even with this error the cutout seemingly finishes, but something is not passed correctly: when I slice the cutout with time=slice("2018-01-01", "2018-01-08"), I still get the entire month of data, e.g. TU-MREL_EU_ATL-2M_201801.nc.

When I extend it into February (time=slice("2018-01-01", "2018-02-08")), the cutout completes but the last timestamps have empty grids. I don't know if this is a cutout.py-related issue, but I cannot create a cutout smaller than one month.
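One thing I still want to rule out (quick check; the file name is just my local example): whether the raw file's time axis is the corrupted integer coordinate mentioned earlier, which would explain why a datetime slice has no effect:

import xarray as xr

ds = xr.open_dataset("TU-MREL_EU_ATL-2M_201801.nc")
# If these print as plain integers rather than datetimes, a slice like
# time=slice("2018-01-01", "2018-01-08") cannot select anything meaningful.
print(ds["time"].values[:5])
print(ds["time"].dtype)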

Apart from that the new code looks to be working.

lmezilis avatar Nov 18 '25 13:11 lmezilis

I have also tried to automate the cutout process in case the data are not already downloaded. It seems that there are permission issues there, as I can remotely load the dataset but I cannot load any of the variables; it has to be downloaded. So what I did is write a function to create the URLs and another one to download and merge them.

To be honest, it doesn't look ideal, and I am also not sure how to use temporary directories to save the downloaded files and load them from there. I will upload an example without this feature.

lmezilis avatar Nov 18 '25 14:11 lmezilis

@lmezilis could you point me to the source of the datasets you're using? I used ones directly from the TUDelft OpenDAP but they may be slightly different to the one you have already downloaded.

brynpickering avatar Nov 18 '25 15:11 brynpickering

I am working with the source that you mentioned above: OPeNDAP. The same dataset was used for our calculations.

You can see the code below for how I obtained the URLs after the cutout parameters were set:

time_index = cutout.coords["time"].to_index()

urls = []

for year in time_index.year.unique():
    year_times = time_index[time_index.year == year]
    months = year_times.month.unique()

    # Limit months in the final year
    if year == time_index[-1].year:
        last_month = time_index[-1].month
        months = months[months <= last_month]

    for month in months:
        url = (
            "https://opendap.4tu.nl/thredds/dodsC/data2/djht/f359cd0f-d135-416c-9118-e79dccba57b9/1/"
            f"{year}/TU-MREL_EU_ATL-2M_{year}{month:02}.nc",
        )
        urls.append((year, month, url))

lmezilis avatar Nov 18 '25 15:11 lmezilis

I made all of these similar commits because there are some things that I need to change in the syntax, but the pre-commit auto fix changes them back. I don't know why.

lmezilis avatar Nov 20 '25 17:11 lmezilis

@lmezilis no worries. We'll probably squash all these commits when we merge it in, so it'll all be cleaned up.

You could install pre-commit locally so the fixes are applied on your machine. In your atlite working environment, run pre-commit install. Then pre-commit will fix things before you try to commit.

Re: allowing data downloads, I've found that the OPeNDAP server fails when trying to download more than a few MB of data at once (DAP failure or Authorization failure). Not sure if you get this issue @lmezilis, but it seems to me that it's too volatile to rely on as a way to access the data.

brynpickering avatar Nov 24 '25 11:11 brynpickering

Yes, I had the same problem over the last few days, even though last week I could complete it. I say for now let's keep it manual, and I will contact the server maintainers to see what we can do.

lmezilis avatar Nov 25 '25 09:11 lmezilis