staged-recipes
staged-recipes copied to clipboard
Adding EOOffshore CMEMS ASCAT REP and NRT recipes
These recipes will create two data sets containing Copernicus Marine Service (CMS), or Copernicus Marine Environment Monitoring Service (CMEMS) Reprocessed (2007 - 2021) and Near Real Time (2016 - 2021) Metop-A/B/C ASCAT Ascending/Descending products, for the Irish Continental Shelf (ICS) region, where the original time coordinates have been replaced with the satellite/pass measurement_time values. Products are retrieved from the following source data sets:
- Global Ocean Daily Gridded Reprocessed (REP) Level-3 Sea Surface Winds from Scatterometer
- Global Ocean Daily Gridded Near Real Time (NRT) Level-3 Sea Surface Winds from Scatterometer
The recipes will recreate the ASCAT data sets used in the EOOffshore project (https://eooffshore.github.io), whose outputs were presented (Scalable Offshore Wind Analysis With Pangeo) at the Meeting Exascale Computing Challenges with Compression and Pangeo 2022 EGU General Assembly session.
Example usage of the ASCAT data sets in EOOffshore:
- ASCAT Wind Data for Irish Continental Shelf region
- Offshore Wind in Irish Areas Of Interest
- Comparison of Offshore Wind Speed Extrapolation and Power Density Estimation
Note:
- Pruned versions of both recipes have been successfully tested using a local installation of
pangeo_forge_recipes
, version0.9.1
. - Products are retrieved using FTP, where the credentials specified here are used.
- The
time
coordinates provided by ASCAT products are replaced with a corresponding ICS spatial mean of themeasurement_time
variable. However, this means that the file pattern URLs need to be sorted prior to subseqent Zarr store creation. This has been achieved by subclassingXarrayZarrRecipe
and replacing bothself.file_pattern
andself._compiler
, where the latter function inserts an additionalStage
that performs the sort operation into thePipeline
.
/run eooffshore_ics_cmems_WIND_GLO_WIND_L3_REP_OBSERVATIONS_012_005_MetOp_ASCAT
/run eooffshore_ics_cmems_WIND_GLO_WIND_L3_NRT_OBSERVATIONS_012_002_MetOp_ASCAT
I noticed the /run
calls above errored on the backend. Here are some notes on why, and what we can do about it:
- The error was:
File "/pangeo_forge_recipes/serialization.py", line 20, in either_encode_or_hash raise TypeError(f"object of type {type(obj).__name__} not serializable"
- This is because the backend is currently running
pangeo-forge-recipes==0.9.0
, which does not have this function serialization code. (Themeta.yaml
correctly specifies a requirement for0.9.1
, but the ability to dynamically use thatpangeo-forge-recipes
version on the backend does not yet exist.) - As a first step then, we know we need to upgrade to
0.9.1
on the backend. - But once that's complete, we'll immediately hit another issue. The latest release of
pangeo-forge-runner
, version0.5
, which the backend uses for deploying jobs, evaluates recipe code in such a way thatinspect.get_source
does not work, and will raiseOSError: could not get source code
. I know @yuvipanda is aware of this (we were discussing it last week), but I'm not clear if this issue is fixed in an unreleased version ofpangeo-forge-runner
, or if it remains unresolved.
So, to recap, we need to:
- [x] Upgrade backend to
0.9.1
- [x] Release a version of
pangeo-forge-runner
which can callinspect.get_source
without erroring. (Either by simply releasing from the currentmain
, if it's fixed there, or making a PR, and then releasing.)
and then we can re-run these tests.
Thanks for the update @cisaacstern. Should I look into getting the recipe to work with a local copy of 0.9.0
, or is it best to stick with 0.9.1
and wait until after the upgrade/release?
@derekocallaghan, let's stick with 0.9.1
and wait until upgrade/release to re-run.
I've opened https://github.com/pangeo-forge/pangeo-forge-runner/issues/31 to track adding tests to pangeo-forge-runner
for the blocking issue here.
pre-commit.ci autofix
Hi @cisaacstern, it looks like the pre-commit autofix didn't work as the PR was made from an organization, similar to https://github.com/pangeo-forge/staged-recipes/pull/145#issuecomment-1262695137
Hi @cisaacstern, I don't think these recipes have been run yet, but I wanted to confirm that all imports required by the process_input
function are in the function body, similar to this PR recipe: https://github.com/pangeo-forge/staged-recipes/pull/145#issuecomment-1262554523
It looks like the current process_ds()
function should be okay. My only question is whether the function type hints will cause any problems? I can remove these if required, thanks.
def process_ds(
ds: xr.Dataset, fname: str, satellites: List[str], service: str
) -> xr.Dataset:
@derekocallaghan, thanks for checking in. Since we last looked at this, the two TODO items in https://github.com/pangeo-forge/staged-recipes/pull/183#issuecomment-1251307499 are now complete. I'll take a moment here to explain how I confirmed that, in the event it's useful to @andersy005 or other admins to see the thought process. Then after this comment, I'll trigger tests of your recipes here. The TODO items were:
-
Upgrade backend to
0.9.1
. To confirm this, I opened a shell on our production server, and ran:Note: Only Pangeo Forge Cloud admins will be able to open a shell on the production server like this.
$ heroku run /bin/bash -a pangeo-forge-api-prod Running /bin/bash on ⬢ pangeo-forge-api-prod... up, run.1629 (Hobby) ~ $ pip show pangeo-forge-recipes Name: pangeo-forge-recipes Version: 0.9.1 Summary: Pipeline tools for building and publishing analysis ready datasets. Home-page: https://github.com/pangeo-forge/pangeo-forge-recipes Author: Author-email: License: Apache Location: /usr/local/lib/python3.9/dist-packages Requires: dask, distributed, fsspec, h5netcdf, h5py, intake, intake-xarray, kerchunk, mypy-extensions, netcdf4, numcodecs, setuptools, xarray, zarr Required-by: pangeo-forge-runner
-
Release a version of pangeo-forge-runner which can call inspect.get_source without erroring. To test this, I ran the latest release of
pangeo-forge-runner==0.6.1
from my local terminal:
The fact that this call did not error tells us that this problem is resolved in$ pangeo-forge-runner expand-meta --repo=https://github.com/eooffshore/staged-recipes --ref=d33441bfa8d3d950a78b0deea02d9da982a3d01a --json --feedstock-subdir=recipes/eooffshore_ics_cms_WIND_GLO_WIND_L3_OBSERVATIONS_Metop_ASCAT {"message": "Picked Git content provider.\n", "status": "fetching"} {"message": "Cloning into '/var/folders/tt/4f941hdn0zq549zdwhcgg98c0000gn/T/tmpr55acjum'...\n", "status": "fetching"} {"message": "HEAD is now at d33441b Fixed typos in metadata (recipe runs are currently postponed as described above)\n", "status": "fetching"} {"message": "Expansion complete", "status": "completed", "meta": {"title": "eooffshore_ics_cms_WIND_GLO_WIND_L3_OBSERVATIONS_Metop_ASCAT", "description": "EOOffshore Project (https://eooffshore.github.io) - Copernicus Marine Service Reprocessed (2007 - 2021) and Near Real Time (2016 - 2021) Metop-A/B/C ASCAT Ascending/Descending products, for the Irish Continental Shelf region, where the original time coordinates have been replaced with the satellite/pass measurement_time values.", "pangeo_forge_version": "0.9.1", "pangeo_notebook_version": "2022.06.02", "recipes": [{"id": "eooffshore_ics_cmems_WIND_GLO_WIND_L3_REP_OBSERVATIONS_012_005_Metop_ASCAT", "object": "recipe:rep_recipe"}, {"id": "eooffshore_ics_cmems_WIND_GLO_WIND_L3_NRT_OBSERVATIONS_012_002_Metop_ASCAT", "object": "recipe:nrt_recipe"}], "provenance": {"providers": [{"name": "Copernicus Marine Service (CMS)", "description": "The Copernicus Marine Service (CMS), or Copernicus Marine Environment Monitoring Service (CMEMS), is the marine component of the European Union Copernicus Earth Observation (EO) programme, providing free, regular and systematic ocean data products on a global and regional scale.", "roles": ["producer", "licensor"], "url": "https://marine.copernicus.eu/"}], "license": "Generated using E.U. Copernicus Marine Service Information; https://doi.org/10.48670/moi-00182; https://doi.org/10.48670/moi-00183;"}, "maintainers": [{"name": "Derek O'Callaghan", "orcid": "0000-0003-0585-8631", "github": "derekocallaghan"}], "bakery": {"id": "pangeo-ldeo-nsf-earthcube"}}}
pangeo-forge-runner==0.6.1
, which is the version on the latest deployment of the backend server.
/run eooffshore_ics_cmems_WIND_GLO_WIND_L3_REP_OBSERVATIONS_012_005_Metop_ASCAT
/run eooffshore_ics_cmems_WIND_GLO_WIND_L3_NRT_OBSERVATIONS_012_002_Metop_ASCAT
Thanks for the update @cisaacstern, and for triggering the recipe tests. I see that they were started and have been running for a few hours now. I'm assuming that there's a problem with the recipes as the pruned version normally runs in a couple of minutes locally. I don't see any logs, would you be able to check on your side? Thanks.
- https://pangeo-forge.org/dashboard/recipe-run/1094?feedstock_id=1
- https://pangeo-forge.org/dashboard/recipe-run/1095?feedstock_id=1
Thanks for the update @cisaacstern, and for triggering the recipe tests. I see that they were started and have been running for a few hours now. I'm assuming that there's a problem with the recipes as the pruned version normally runs in a couple of minutes locally. I don't see any logs, would you be able to check on your side? Thanks.
* https://pangeo-forge.org/dashboard/recipe-run/1094?feedstock_id=1 * https://pangeo-forge.org/dashboard/recipe-run/1095?feedstock_id=1
Hi @cisaacstern, I just saw your reply regarding logs to a separate PR, so no problem if they're not readily available for these ASCAT recipes, thanks.
@derekocallaghan thanks for checking in. I've taken a look on the backend and I actually do not see these jobs registered on Google Cloud Dataflow. This tells me there was likely some problem with the recipe parsing and/or job submission process. As it happens, I am beginning a roughly monthlong family leave tomorrow, however it's possible that another maintainer will be able to help debug this in my absence, in which case my recommendation would be for a maintainer to re-trigger these test runs while tracking the backend server logs, as described in:
https://github.com/pangeo-forge/pangeo-forge-orchestrator/tree/main/docs#heroku-server-logs
This may provide some hint of the problem. If the problem is at the pangeo-forge-runner
layer, a further debugging step, as described in https://github.com/pangeo-forge/pangeo-forge-orchestrator/issues/148#issue-1390156166 may be necessary.
Hi @yuvipanda - any chance you might be able to help us debug this recipe run in Charles' absence?
I've been trying to run these recipes locally with Beam (and looking at pangeo_forge_runner.feedstock.Feedstock._import()
), and it looks like the issues are being caused by the use of class AscatXarrayZarrRecipe(XarrayZarrRecipe)
. Subsequent pickling of the recipe objects results in the following being raised by pickle.save_global()
:
_pickle.PicklingError: Can't pickle <class 'AscatXarrayZarrRecipe'>: it's not found as builtins.AscatXarrayZarrRecipe
I'll look at refactoring the recipes so that the required operations are performed on XarrayZarrRecipe
instances instead.