cdsapi

How to get the latest date available for the dataset?

Open ghost opened this issue 11 months ago • 7 comments

Is your feature request related to a problem? Please describe.

When automatically downloading data (e.g., ERA5) with cdsapi, I always have to wrap the request in a try...except block, because the program fails when the requested dates extend past the latest available date. However, the only place I can find that date is in the error log. Would it be feasible to provide an interface that tells me when to stop the download process?

Describe the solution you'd like

Implement an API that returns the latest available date for the dataset.

Describe alternatives you've considered

No response

Additional context

No response

Organisation

No response

ghost avatar Jul 31 '23 01:07 ghost

This is similar (in spirit) to my request in #78. I have figured out a partial solution using the requests library:

import requests
r = requests.get("https://cds.climate.copernicus.eu/api/v2.ui/resources/reanalysis-era5-single-levels-monthly-means")
print(r.json()['update_date'])

where reanalysis-era5-single-levels-monthly-means was the dataset I was using.

WeatherGod avatar Aug 07 '23 19:08 WeatherGod

There might be other information in that JSON object that provides a date range, since "update_date" is specifically about when the dataset was last updated rather than the latest date in the dataset. Being able to access all of this information (and more!) from within cdsapi would be very valuable, I think.

WeatherGod avatar Aug 07 '23 19:08 WeatherGod

Just came across this gem in an exception traceback (the exception is actually from the server and reported as part of an error message). I had accidentally requested a date in the future for a dataset.

...
2023-08-15 12:34:00,473 ERROR     File "/home/cds/cdsservices/services/mars/preprocess_request.py", line 172, in implement_embargo
2023-08-15 12:34:00,473 ERROR       f"{embargo_datetime.strftime(embargo_error_time_format)}", ""
2023-08-15 12:34:00,473 ERROR   cdsinf.exceptions.BadRequestException: None of the data you have requested is available yet, please revise the period requested. The latest date available for this dataset is: 2023-08-10 16:00

So, it is definitely theoretically possible to retrieve...
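As a rough illustration of scraping the date out of that message, here is a minimal sketch. It assumes the error text is available as a string (for example from `str(exc)` inside an except block) and that the wording matches the traceback quoted above. `parse_latest_available` is a hypothetical helper, not part of cdsapi, and it will stop working if the server changes the message.

```python
import re
from datetime import datetime
from typing import Optional

# Pattern based only on the error text quoted in this thread (an assumption:
# the server may phrase or format the message differently for other datasets).
_LATEST_DATE_RE = re.compile(
    r"latest date available for this dataset is:\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2})"
)


def parse_latest_available(message: str) -> Optional[datetime]:
    """Return the latest available datetime found in an error message, or None."""
    match = _LATEST_DATE_RE.search(message)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M")


msg = (
    "None of the data you have requested is available yet, please revise "
    "the period requested. The latest date available for this dataset is: "
    "2023-08-10 16:00"
)
print(parse_latest_available(msg))  # 2023-08-10 16:00:00
```

This only works after a request has already failed, so it is a workaround rather than the interface requested in this issue.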

WeatherGod avatar Aug 15 '23 17:08 WeatherGod

> This is similar (in spirit) to my request in #78. I have figured out a partial solution using the requests library:
>
> import requests
> r = requests.get("https://cds.climate.copernicus.eu/api/v2.ui/resources/reanalysis-era5-single-levels-monthly-means")
> print(r.json()['update_date'])
>
> where reanalysis-era5-single-levels-monthly-means was the dataset I was using.

This can't be used for every dataset, though: for the reanalysis-era5-single-levels dataset it returns today's date, but the actual update date is 2024-02-13.

import requests
r = requests.get("https://cds.climate.copernicus.eu/api/v2.ui/resources/reanalysis-era5-single-levels")
print(r.json()['update_date'])
# Output: 2024-02-19


luabida avatar Feb 19 '24 17:02 luabida

> This can't be used for every dataset, though: for the reanalysis-era5-single-levels dataset it returns today's date, but the actual update date is 2024-02-13.

Update date in this case means the date new files are added, not the date those new files are for. From the overview tab:

> ERA5 is updated daily with a latency of about 5 days.

zqianem avatar Feb 19 '24 17:02 zqianem

> ERA5 is updated daily with a latency of about 5 days.

I've been using 6 days, but it would be helpful to have a way of getting this last available date programmatically.
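For reference, the fixed-latency workaround can be sketched as below. This is only an estimate built on the "about 5 days" figure from the overview tab plus a safety margin. `estimated_latest_date` and its 6-day default are illustrative assumptions, not something cdsapi provides.

```python
from datetime import date, timedelta
from typing import Optional


def estimated_latest_date(latency_days: int = 6, today: Optional[date] = None) -> date:
    """Estimate the most recent date likely to be available.

    latency_days is a guess (6 here, as a margin over the documented
    "about 5 days"); the real embargo can differ per dataset.
    """
    if today is None:
        today = date.today()
    return today - timedelta(days=latency_days)


# With the date from the example above:
print(estimated_latest_date(today=date(2024, 2, 19)))  # 2024-02-13
```

Requests can then be capped at this date, keeping the try...except as a fallback for days when the data is later than usual.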

luabida avatar Feb 19 '24 17:02 luabida