Coordinate information of resulting cell when using `cell_selection= ...` and clipped areas

Open fzeiser opened this issue 11 months ago • 3 comments

I'm trying to get consistent information on which source cell is used for a (historical) weather data request, but I have trouble making sense of the response. A minimal example follows below.

From the post https://openmeteo.substack.com/p/60-years-of-historical-weather-as, my understanding is that the ERA5 data is clipped to "close-to-land". The coordinate below, which is in the middle of the Atlantic, should therefore have no data nearby (https://www.openstreetmap.org/search?whereami=1&query=48.00%2C-33.40#map=4/47.99/-33.35). However, the response suggests that the selected grid-cell is in the middle of the ocean, where I assume Open-Meteo has no data available.

Assumptions:

ERA5 data at Open-Meteo is clipped to "close to land", so it should not contain any data near 48.0°N, -33.4°E.

Minimal example

import openmeteo_requests

# Set up the Open-Meteo API client (no cache or retry session is configured in this minimal example)
openmeteo = openmeteo_requests.Client()

# Make sure all required weather variables are listed here
# The order of variables in hourly or daily is important to assign them correctly below
url = "https://archive-api.open-meteo.com/v1/archive"
params = {
    "latitude": [48.00],
    "longitude": [-33.40],
    "start_date": "2024-03-03",
    "end_date": "2024-03-17",
    "hourly": "wind_speed_100m",
    "wind_speed_unit": "ms",
    "models": "era5",
    "cell_selection": "land" # Alternative: use 'nearest'
}
responses = openmeteo.weather_api(url, params=params)

# Process first location. Add a for-loop for multiple locations or weather models
response = responses[0]
print(f"Coordinates {response.Latitude()}°N {response.Longitude()}°E")

Result:

Coordinates 48.0°N -33.5°E
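For reference, the returned coordinate is consistent with the request being snapped to a regular 0.25° grid, which is the resolution ERA5 is commonly distributed at. A minimal sketch, assuming cell centres sit at multiples of 0.25° (the helper `snap_to_grid` is hypothetical and not part of the Open-Meteo client):

```python
def snap_to_grid(value: float, step: float = 0.25) -> float:
    """Return the multiple of `step` closest to `value`.

    Assumption: grid points lie at exact multiples of `step` degrees.
    """
    return round(value / step) * step


requested = (48.00, -33.40)
snapped = tuple(snap_to_grid(v) for v in requested)
print(snapped)  # (48.0, -33.5)
```

Under that assumption, 48.0°N -33.5°E is simply the nearest grid point to the request, not a land cell.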

Expected result

I would like to get the mid-point of the ERA5 grid-cell that contains the data provided in the response.

  • For "cell_selection": "land", some coordinate that is on land, maybe 49.272,-53.960, or whatever the closest cell on land is.
  • For "cell_selection": "nearest", the closest cell that has data.

If you have in the meantime collected all ERA5 data and no longer perform clipping, that would be interesting to know. Even so, I would expect a different selected lat/lon coordinate for "cell_selection": "land".

fzeiser avatar Mar 19 '24 15:03 fzeiser

Hi, thanks for the report! "cell_selection": "land" only expresses a preference for a grid-cell with land properties if one is available in the surrounding area. In the middle of the ocean, where no land grid-cell is available nearby, it falls back to the nearest grid-cell.

Selecting the nearest land grid-cell regardless of distance would be computationally expensive, as all grid-cells would need to be searched to find the closest one. I am also not sure it makes sense to return a grid-cell that is hundreds of kilometres away.
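The "prefer land nearby, otherwise fall back to nearest" behaviour can be illustrated with a small sketch. This is not the actual Open-Meteo implementation: the function `select_cell`, the boolean `land_mask` layout, and the `radius` of one cell are all assumptions made for illustration. It only inspects the immediate neighbourhood of the nearest cell, which is what keeps the lookup cheap compared with scanning the whole grid.

```python
def select_cell(land_mask, row, col, radius=1):
    """Prefer a land cell within `radius` cells of (row, col).

    land_mask: 2-D list of booleans, True where a cell is land (hypothetical layout).
    (row, col): index of the grid-cell nearest to the requested coordinate.
    Falls back to (row, col) itself when no land cell is in the neighbourhood.
    Longitude wrap-around is ignored in this sketch.
    """
    n_rows, n_cols = len(land_mask), len(land_mask[0])
    best, best_dist = None, None
    for r in range(max(0, row - radius), min(n_rows, row + radius + 1)):
        for c in range(max(0, col - radius), min(n_cols, col + radius + 1)):
            if not land_mask[r][c]:
                continue
            # Squared distance in grid-index space; enough to rank neighbours.
            dist = (r - row) ** 2 + (c - col) ** 2
            if best_dist is None or dist < best_dist:
                best, best_dist = (r, c), dist
    return best if best is not None else (row, col)


open_ocean = [[False] * 3 for _ in range(3)]
coast = [
    [False, False, False],
    [False, False, True],
    [False, False, False],
]
print(select_cell(open_ocean, 1, 1))  # (1, 1): no land nearby, nearest cell is kept
print(select_cell(coast, 1, 1))       # (1, 2): adjacent land cell is preferred
```

In the open-ocean case the nearest cell is returned unchanged, which matches the result reported above for 48.0°N -33.4°E.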

Note: The blog article that showed a reduction in size by removing grid-cells in the ocean is a bit outdated. By now, the entire ERA5 grid is available. The size for historical ERA5, ERA5-Land, CERRA and ECMWF-IFS at 9km is roughly 16 TB now.

patrick-zippenfenig avatar Mar 21 '24 10:03 patrick-zippenfenig

Thank you for the additional information. Two followup questions / comments:

  • cell_selection: land -- I totally understand that this is computationally heavy. If it is not feasible to calculate, I would like the API specification to state that a land cell is selected only if there is one within x km.
  • Very good news that all of ERA5 is available now :). You are really doing an amazing job! What would the error message be if I request data for a location that you do not have data for? Alternatively: how would I discover this? Would the check I did above reveal that I am getting data for a cell far away?
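One way to make that check explicit is to compute the great-circle distance between the requested coordinate and the one reported in the response. A minimal sketch using the haversine formula; the helper `haversine_km` and the 6371 km mean Earth radius are assumptions for illustration, and the coordinates are the ones from the example above:

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


requested = (48.00, -33.40)
returned = (48.0, -33.5)  # values printed by response.Latitude() / response.Longitude()
distance = haversine_km(*requested, *returned)
print(f"Selected cell is {distance:.1f} km from the request")
```

For this example the offset comes out at well under 10 km, so the reported cell is close to the request. A value much larger than the grid spacing would indicate that data is coming from a distant cell.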

fzeiser avatar Apr 02 '24 14:04 fzeiser

> cell_selection: land -- I totally understand that this is computationally heavy. If it is not feasible to calculate, I would want the api specification to state that a land cell is selected if there is a land cell within x km distance.

I created an issue ticket to better document the cell_selection parameter

> Very good news that all of ERA5 is available now :). You are really doing an amazing job! What would the error message be if I request data for a location that you do not have data for? Alternatively: How would I discover? Would the check I have done above reveal that I get data for a cell far away?

The data returned will be arrays of NaN. The API should always produce the expected structure (array length matching the requested time range, and the same number of weather variables). I believe there are some edge cases, e.g. an empty response could be returned if a local domain like CERRA is requested for a coordinate outside its grid.
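On the client side, both situations (all-NaN arrays and an empty response) can be detected with a small check. A minimal sketch; the helper `has_no_data` is hypothetical, and with the `openmeteo_requests` client the values would typically come from something like `response.Hourly().Variables(0).ValuesAsNumpy()`:

```python
import math


def has_no_data(values):
    """Return True if the series is empty or consists only of NaN values."""
    values = list(values)
    return len(values) == 0 or all(math.isnan(v) for v in values)


nan = float("nan")
print(has_no_data([nan, nan, nan]))  # True: location without data
print(has_no_data([]))               # True: empty response edge case
print(has_no_data([3.2, nan, 4.1]))  # False: at least one valid value
```

Individual NaN gaps inside an otherwise valid series are deliberately not flagged here; only a series without any usable value counts as "no data".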

patrick-zippenfenig avatar Apr 03 '24 08:04 patrick-zippenfenig