
[WIP] Add daymet

Open yuvipanda opened this issue 2 years ago • 23 comments

yuvipanda avatar Oct 28 '22 15:10 yuvipanda

hacked from https://github.com/TomAugspurger/daymet-recipe

yuvipanda avatar Oct 28 '22 18:10 yuvipanda

Thanks Yuvi! 🙌

I'm curious to see how this runs on the Beam branch. The NA-daily job was running really slowly when I tried it with pangeo-forge-recipes main (executed on a Dask cluster), and I haven't had a chance to investigate why.

TomAugspurger avatar Oct 28 '22 18:10 TomAugspurger

@TomAugspurger yay! Also, can I bring you onto the pangeo-forge Slack somehow? :)

@TomAugspurger also, I'm curious whether we can get this data via https://cmr.earthdata.nasa.gov/search/concepts/C2031536952-ORNL_CLOUD.html instead. Is Earthdata Login what's holding that back?

yuvipanda avatar Oct 28 '22 19:10 yuvipanda

@TomAugspurger ok, I just tried to get this purely from CMR for the daily run, and ran into:


Traceback (most recent call last):
  File "apache_beam/runners/common.py", line 1417, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 837, in apache_beam.runners.common.PerWindowInvoker.invoke_process
  File "apache_beam/runners/common.py", line 983, in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
  File "/Users/yuvipanda/.local/share/virtualenvs/pangeo-forge-runner/lib/python3.9/site-packages/apache_beam/transforms/core.py", line 1877, in <lambda>
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/executors/beam.py", line 14, in _no_arg_stage
    fun(config=config)
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/recipes/xarray_zarr.py", line 530, in prepare_target
    with open_chunk(chunk_key, config=config) as ds:
  File "/srv/conda/envs/notebook/lib/python3.9/contextlib.py", line 119, in __enter__
    return next(self.gen)
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/recipes/xarray_zarr.py", line 414, in open_chunk
    ds = xr.concat(dsets, config.concat_dim, **config.xarray_concat_kwargs)
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/core/concat.py", line 243, in concat
    return _dataset_concat(
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/core/concat.py", line 558, in _dataset_concat
    raise ValueError(f"{name!r} is not present in all datasets.")
ValueError: 'dayl' is not present in all datasets.

yuvipanda avatar Oct 28 '22 19:10 yuvipanda

Which makes sense, as I think it's one file per region, per year, per variable.

yuvipanda avatar Oct 28 '22 19:10 yuvipanda

I'm making this into one recipe per region per variable, chunked across time. From conversations with ORNL folks, it would also be exciting to have this chunked across lat/lon, so you can get historical info for a single 'pixel' - like https://daymet.ornl.gov/single-pixel/.

In this case, I'd imagine it would be the same recipes as otherwise, but just chunked by lat/lon?

yuvipanda avatar Oct 28 '22 20:10 yuvipanda

Hmm, I should probably fold region inside, and just make one recipe per variable?

yuvipanda avatar Oct 28 '22 20:10 yuvipanda

I swear my commit messages are usually of better quality :) I'll squash and whatnot before finalizing.

yuvipanda avatar Oct 28 '22 20:10 yuvipanda

Now this PR reads the file list via CMR!

yuvipanda avatar Oct 28 '22 20:10 yuvipanda

I think maybe the workers ran out of memory here? I bumped up the size of the node being used in Dataflow from n1-highmem-2 to n1-highmem-8.

yuvipanda avatar Oct 28 '22 21:10 yuvipanda

I see a lot of this in the logs:

[2022-10-28T21:44:17.124660+00:00] [docker:dayl-bba2fd66df972afa41b1-10281438-er4b-harness-6187] /srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/core/indexing.py:1380: PerformanceWarning: Slicing is producing a large chunk. To accept the large
[2022-10-28T21:44:17.124708+00:00] [docker:dayl-bba2fd66df972afa41b1-10281438-er4b-harness-6187] chunk and silence this warning, set the option
[2022-10-28T21:44:17.124716+00:00] [docker:dayl-bba2fd66df972afa41b1-10281438-er4b-harness-6187]     >>> with dask.config.set(**{'array.slicing.split_large_chunks': False}):
[2022-10-28T21:44:17.124722+00:00] [docker:dayl-bba2fd66df972afa41b1-10281438-er4b-harness-6187]     ...     array[indexer]
[2022-10-28T21:44:17.124727+00:00] [docker:dayl-bba2fd66df972afa41b1-10281438-er4b-harness-6187] 
[2022-10-28T21:44:17.124732+00:00] [docker:dayl-bba2fd66df972afa41b1-10281438-er4b-harness-6187] To avoid creating the large chunks, set the option
[2022-10-28T21:44:17.124737+00:00] [docker:dayl-bba2fd66df972afa41b1-10281438-er4b-harness-6187]     >>> with dask.config.set(**{'array.slicing.split_large_chunks': True}):
[2022-10-28T21:44:17.124742+00:00] [docker:dayl-bba2fd66df972afa41b1-10281438-er4b-harness-6187]     ...     array[indexer]
[2022-10-28T21:44:17.124750+00:00] [docker:dayl-bba2fd66df972afa41b1-10281438-er4b-harness-6187]   value = value[(slice(None),) * axis + (subkey,)]

Given that this is happening at the same time as:

[2022-10-28T21:44:17.746232+00:00] [worker:dayl-bba2fd66df972afa41b1-10281438-er4b-harness-6187] Opening input with Xarray Index({DimIndex(name='time', index=121, sequence_len=156, operation=<CombineOp.CONCAT: 2>)}): 'https://data.ornldaac.earthdata.nasa.gov/protected/daymet/Daymet_Daily_V4/data/daymet_v4_daily_na_dayl_2010.nc'
[2022-10-28T21:44:17.746490+00:00] [worker:dayl-bba2fd66df972afa41b1-10281438-er4b-harness-6187] Opening 'https://data.ornldaac.earthdata.nasa.gov/protected/daymet/Daymet_Daily_V4/data/daymet_v4_daily_na_dayl_2010.nc' from cache
[2022-10-28T21:44:19.788892+00:00] [worker:dayl-bba2fd66df972afa41b1-10281438-er4b-harness-6187] Opening input with Xarray Index({DimIndex(name='time', index=122, sequence_len=156, operation=<CombineOp.CONCAT: 2>)}): 'https://data.ornldaac.earthdata.nasa.gov/protected/daymet/Daymet_Daily_V4/data/daymet_v4_daily_hi_dayl_2010.nc'
[2022-10-28T21:44:19.789208+00:00] [worker:dayl-bba2fd66df972afa41b1-10281438-er4b-harness-6187] Opening 'https://data.ornldaac.earthdata.nasa.gov/protected/daymet/Daymet_Daily_V4/data/daymet_v4_daily_hi_dayl_2010.nc' from cache
[2022-10-28T21:44:20.046337+00:00] [worker:dayl-bba2fd66df972afa41b1-10281438-er4b-harness-6187] Opening input with Xarray Index({DimIndex(name='time', index=123, sequence_len=156, operation=<CombineOp.CONCAT: 2>)}): 'https://data.ornldaac.earthdata.nasa.gov/protected/daymet/Daymet_Daily_V4/data/daymet_v4_daily_hi_dayl_2011.nc'
[2022-10-28T21:44:20.046594+00:00] [worker:dayl-bba2fd66df972afa41b1-10281438-er4b-harness-6187] Opening 'https://data.ornldaac.earthdata.nasa.gov/protected/daymet/Daymet_Daily_V4/data/daymet_v4_daily_hi_dayl_2011.nc' from cache

I suspected this was maybe because the source .nc file was too large, but it's barely a meg. Maybe the destination is too large.

This is the first recipe I'm really writing so a lot of learning!

yuvipanda avatar Oct 28 '22 21:10 yuvipanda

Yep, definitely running out of memory:

[2022-10-28T21:55:42.920516+00:00] [system:dayl-bba2fd66df972afa41b1-10281438-er4b-harness-6187] Out of memory: Killed process 6534 (python) total-vm:102006872kB, anon-rss:45809068kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:90608kB oom_score_adj:900

Not exactly sure why; probably something about the chunking.

yuvipanda avatar Oct 28 '22 21:10 yuvipanda

oooh, better - currently getting:

  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/executors/beam.py", line 40, in exec_stage
    self.stage.function(arg, config=self.config)  # type: ignore
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/recipes/xarray_zarr.py", line 645, in store_chunk
    zarr_array[zarr_region] = data
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/zarr/core.py", line 1373, in __setitem__
    self.set_basic_selection(pure_selection, value, fields=fields)
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/zarr/core.py", line 1468, in set_basic_selection
    return self._set_basic_selection_nd(selection, value, fields=fields)
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/zarr/core.py", line 1772, in _set_basic_selection_nd
    self._set_selection(indexer, value, fields=fields)
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/zarr/core.py", line 1800, in _set_selection
    check_array_shape('value', value, sel_shape)
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/zarr/util.py", line 547, in check_array_shape
    raise ValueError('parameter {!r}: expected array with shape {!r}, got {!r}'
ValueError: parameter 'value': expected array with shape (365, 231, 364), got (365, 584, 284) [while running 'Start|cache_input|Reshuffle_000|prepare_target|Reshuffle_001|store_chunk|Reshuffle_002|finalize_target|Reshuffle_003/store_chunk/Execute-ptransform-56']

yuvipanda avatar Oct 29 '22 02:10 yuvipanda

Now with:

[2022-10-29T03:18:41.116192+00:00] [worker:daymet-daily-8efc9a7712d3-10282000-lymd-harness-h9tm] Operation ongoing for over 771.76 seconds in state process-msecs in step Start|cache_input|Reshuffle_000|prepare_target|Reshuffle_001|store_chunk|Reshuffle_002|finalize_target|Reshuffle_003/cache_input/Execute-ptransform-56  without returning. Current Traceback:
  File "/srv/conda/envs/notebook/lib/python3.9/threading.py", line 937, in _bootstrap
    self._bootstrap_inner()

  File "/srv/conda/envs/notebook/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()

  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/utils/thread_pool_executor.py", line 53, in run
    self._work_item.run()

  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/utils/thread_pool_executor.py", line 37, in run
    self._future.set_result(self._fn(*self._fn_args, **self._fn_kwargs))

  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 356, in task
    self._execute(

  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 284, in _execute
    response = task()

  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 357, in <lambda>
    lambda: self.create_worker().do_instruction(request), request)

  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 597, in do_instruction
    return getattr(self, request_type)(

  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 635, in process_bundle
    bundle_processor.process_bundle(instruction_id))

  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/bundle_processor.py", line 1003, in process_bundle
    input_op_by_transform_id[element.transform_id].process_encoded(

  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/bundle_processor.py", line 227, in process_encoded
    self.output(decoded_value)

  File "/Users/yuvipanda/.local/share/virtualenvs/pangeo-forge-runner/lib/python3.9/site-packages/apache_beam/transforms/core.py", line 1956, in <lambda>

  File "/Users/yuvipanda/code/pangeo-forge-recipes/pangeo_forge_recipes/executors/beam.py", line 40, in exec_stage

  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/recipes/xarray_zarr.py", line 156, in cache_input
    config.storage_config.cache.cache_file(

  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/storage.py", line 173, in cache_file
    _copy_btw_filesystems(input_opener, target_opener)

  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/storage.py", line 43, in _copy_btw_filesystems
    data = source.read(BLOCK_SIZE)

  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/implementations/http.py", line 590, in read
    return super().read(length)

  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/spec.py", line 1643, in read
    out = self.cache._fetch(self.loc, self.loc + length)

  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/caching.py", line 377, in _fetch
    self.cache = self.fetcher(start, bend)

  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/asyn.py", line 111, in wrapper
    return sync(self.loop, func, *args, **kwargs)

  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/asyn.py", line 84, in sync
    if event.wait(1):

  File "/srv/conda/envs/notebook/lib/python3.9/threading.py", line 581, in wait
    signaled = self._cond.wait(timeout)

  File "/srv/conda/envs/notebook/lib/python3.9/threading.py", line 316, in wait
    gotit = waiter.acquire(True, timeout)
[2022-10-29T03:18:41.158921+00:00] [worker:daymet-daily-8efc9a7712d3-10282000-lymd-harness-h9tm] Operation ongoing for over 668.51 seconds in state process-msecs in step Start|cache_input|Reshuffle_000|prepare_target|Reshuffle_001|store_chunk|Reshuffle_002|finalize_target|Reshuffle_003/cache_input/Execute-ptransform-56  without returning. Current Traceback:
  File "/srv/conda/envs/notebook/lib/python3.9/threading.py", line 937, in _bootstrap
    self._bootstrap_inner()

  File "/srv/conda/envs/notebook/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()

  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/utils/thread_pool_executor.py", line 53, in run
    self._work_item.run()

  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/utils/thread_pool_executor.py", line 37, in run
    self._future.set_result(self._fn(*self._fn_args, **self._fn_kwargs))

  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 356, in task
    self._execute(

  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 284, in _execute
    response = task()

  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 357, in <lambda>
    lambda: self.create_worker().do_instruction(request), request)

  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 597, in do_instruction
    return getattr(self, request_type)(

  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 635, in process_bundle
    bundle_processor.process_bundle(instruction_id))

  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/bundle_processor.py", line 1003, in process_bundle
    input_op_by_transform_id[element.transform_id].process_encoded(

  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/bundle_processor.py", line 227, in process_encoded
    self.output(decoded_value)

  File "/Users/yuvipanda/.local/share/virtualenvs/pangeo-forge-runner/lib/python3.9/site-packages/apache_beam/transforms/core.py", line 1956, in <lambda>

  File "/Users/yuvipanda/code/pangeo-forge-recipes/pangeo_forge_recipes/executors/beam.py", line 40, in exec_stage

  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/recipes/xarray_zarr.py", line 156, in cache_input
    config.storage_config.cache.cache_file(

  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/storage.py", line 173, in cache_file
    _copy_btw_filesystems(input_opener, target_opener)

  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/storage.py", line 43, in _copy_btw_filesystems
    data = source.read(BLOCK_SIZE)

  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/implementations/http.py", line 590, in read
    return super().read(length)

  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/spec.py", line 1643, in read
    out = self.cache._fetch(self.loc, self.loc + length)

  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/caching.py", line 377, in _fetch
    self.cache = self.fetcher(start, bend)

  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/asyn.py", line 111, in wrapper
    return sync(self.loop, func, *args, **kwargs)

  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/asyn.py", line 84, in sync
    if event.wait(1):

  File "/srv/conda/envs/notebook/lib/python3.9/threading.py", line 581, in wait
    signaled = self._cond.wait(timeout)

  File "/srv/conda/envs/notebook/lib/python3.9/threading.py", line 316, in wait
    gotit = waiter.acquire(True, timeout)

yuvipanda avatar Oct 29 '22 03:10 yuvipanda

Hi @yuvipanda, quick question about this recipe. I saw that you're specifying aiohttp.BasicAuth:

https://github.com/yuvipanda/staged-recipes/blob/5a50ed2ed1212e77a7615cbbbc285c7163dce465/recipes/daymet/recipe.py#L15

client_kwargs = {
    'auth': aiohttp.BasicAuth(username, password),
    'trust_env': True,
}

...

            fsspec_open_kwargs=dict(
                client_kwargs=client_kwargs
            ),

I wanted to check whether this was successfully serialized when generating the recipe hash (https://github.com/pangeo-forge/pangeo-forge-recipes/pull/429).

@andersy005 encountered a problem where aiohttp client_kwargs couldn't be serialized in a feedstock recipe run; a couple of workarounds are suggested here: https://github.com/pangeo-forge/eooffshore_ics_ccmp_v02_1_nrt_wind-feedstock/pull/3#discussion_r1010420618

I'd have thought there'd be a similar issue with BasicAuth. I've replicated it locally (see pangeo_forge_recipes.serialization), and it doesn't seem to be serializable:

In [4]: import inspect
   ...: from collections.abc import Collection
   ...: from dataclasses import asdict
   ...: from enum import Enum
   ...: from hashlib import sha256
   ...: from json import dumps
   ...: from typing import Any, List, Sequence

In [5]: def either_encode_or_hash(obj: Any):
   ...:     """For objects which are not serializable with ``json.dumps``, this function defines
   ...:     type-specific handlers which extract either a serializable value or a hash from the object.
   ...:     :param obj: Any object which is not serializable to ``json``.
   ...:     """
   ...: 
   ...:     if isinstance(obj, Enum):  # custom serializer for FileType, CombineOp, etc.
   ...:         return obj.value
   ...:     elif hasattr(obj, "sha256"):
   ...:         return obj.sha256.hex()
   ...:     elif inspect.isfunction(obj):
   ...:         return inspect.getsource(obj)
   ...:     elif isinstance(obj, bytes):
   ...:         return obj.hex()
   ...:     raise TypeError(f"object of type {type(obj).__name__} not serializable")

In [11]: ba = aiohttp.BasicAuth(login="test", password="test")

In [12]: either_encode_or_hash(ba)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [12], in <cell line: 1>()
----> 1 either_encode_or_hash(ba)

Input In [5], in either_encode_or_hash(obj)
     13 elif isinstance(obj, bytes):
     14     return obj.hex()
---> 15 raise TypeError(f"object of type {type(obj).__name__} not serializable")

TypeError: object of type BasicAuth not serializable

derekocallaghan avatar Nov 08 '22 13:11 derekocallaghan

@derekocallaghan yeah, am definitely running into that too! However, this isn't an actual serialization issue but something to do with the hashing (which IMO should probably just be removed?). To unblock myself temporarily while I ramp up on writing more recipes, I've applied the following patch to pangeo-forge-recipes :D

diff --git a/pangeo_forge_recipes/serialization.py b/pangeo_forge_recipes/serialization.py
index c7fb42a..d2c8828 100644
--- a/pangeo_forge_recipes/serialization.py
+++ b/pangeo_forge_recipes/serialization.py
@@ -22,7 +22,8 @@ def either_encode_or_hash(obj: Any):
         return inspect.getsource(obj)
     elif isinstance(obj, bytes):
         return obj.hex()
-    raise TypeError(f"object of type {type(obj).__name__} not serializable")
+    return bytes.fromhex("6ac3c336e4094835293a3fed8a4b5fedde1b5e2626d9838fed50693bba00af0e")
+    # raise TypeError(f"object of type {type(obj).__name__} not serializable")
 
 
 def dict_to_sha256(dictionary: dict) -> bytes:

yuvipanda avatar Nov 09 '22 08:11 yuvipanda

Hi @yuvipanda, yep, the hashing in serialization.py has raised some issues recently. It may be the case that we can exclude client_kwargs, or even fsspec_open_kwargs, from the hashing (see below), as I guess they could contain arbitrary values/objects. E.g., this possible workaround:

FilePattern has its own sha256() implementation, and although fsspec_open_kwargs (and client_kwargs) are currently included, perhaps they could be excluded (similar to the pattern format_function and combine dims, which are already excluded), or particular kwargs (e.g. any aiohttp instances) could be stripped from fsspec_open_kwargs prior to serialization. I guess timeout/credential values aren't necessary for a recipe hash?

https://github.com/pangeo-forge/pangeo-forge-recipes/blob/3441d94a290d296b8f62638df15bb60993e86b1d/pangeo_forge_recipes/patterns.py#L299

    # we exclude the format function and combine dims from ``root`` because they determine the
    # index:filepath pairs yielded by iterating over ``.items()``. if these pairs are generated in
    # a different way in the future, we ultimately don't care.
    root = {
        "fsspec_open_kwargs": pattern.fsspec_open_kwargs,
        "query_string_secrets": pattern.query_string_secrets,
        "file_type": pattern.file_type,
        "nitems_per_file": {
            op.name: op.nitems_per_file  # type: ignore
            for op in pattern.combine_dims
            if op.name in pattern.concat_dims
        },
    }

derekocallaghan avatar Nov 09 '22 09:11 derekocallaghan

@derekocallaghan I think the recipe hash should be an allow_list, including only specific things it wants to track, rather than excluding specific things. I'm not exactly sure what this hash is actually used for right now. Do you know?

yuvipanda avatar Nov 09 '22 09:11 yuvipanda

There are two separate serialization issues here, though: one related to Beam serialization, and one related to hashing to get an ID for the recipe. They probably need different solutions as well.

yuvipanda avatar Nov 09 '22 09:11 yuvipanda

Yeah, previously the hash was created on demand: BaseRecipe.sha256() was called from BaseRecipe.get_execution_context(), which itself was called from XarrayZarrRecipe.prepare_target(), and when generating a default job name in pangeo_forge_runner.commands.bake.Bake. I couldn't see where it was used apart from that at the time.

Agree that an allow_list is preferable.

With your hash workaround above, does the subsequent Beam-related pickling/serialization work?

derekocallaghan avatar Nov 09 '22 09:11 derekocallaghan

@derekocallaghan I think so. I'm running it with the local direct runner and was able to generate a full series of one particular variable just for HI!


I've just pushed my latest changes. I'm trying to get a couple steps running for all of the regions and variables.

I'm producing one recipe per variable per region, partially to see if I can get that to work before trying to merge the variables into one dimension. @jbusecke made me realize we can't actually easily combine the three regions into one!

yuvipanda avatar Nov 09 '22 09:11 yuvipanda

I'm testing this locally in the following way:

  1. Make a local_runner_config.py file with the following:

import pathlib

HERE = pathlib.Path(__file__).parent
c.TargetStorage.fsspec_class = "fsspec.implementations.local.LocalFileSystem"
c.TargetStorage.root_path = f"file://{HERE}/storage/output/{{job_id}}"

c.InputCacheStorage.root_path = f"file://{HERE}/storage/cache"
c.InputCacheStorage.fsspec_class = c.TargetStorage.fsspec_class

c.MetadataCacheStorage.root_path = f"file://{HERE}/storage/metadata/{{job_id}}"
c.MetadataCacheStorage.fsspec_class = c.TargetStorage.fsspec_class

c.Bake.bakery_class = "pangeo_forge_runner.bakery.local.LocalDirectBakery"

  2. Install pangeo-forge-runner from master.
  3. Install pangeo-forge-recipes from master, with the patch indicated above.
  4. Run it from this repo with: pangeo-forge-runner bake --repo . --feedstock-subdir=recipes/daymet -f local_runner_config.py --prune

yuvipanda avatar Nov 09 '22 09:11 yuvipanda

Currently it fails with the following when trying to run the NA files:

Worker: severity: WARN timestamp {   seconds: 1667987210   nanos: 63492059 } message: "Variable tmin of 92123153000 bytes is 184.25 times larger than specified maximum variable array size of 500000000 bytes. Consider re-instantiating recipe with `subset_inputs = {\"time\": 185}`. If `len(ds[\"time\"])` < 185, substitute \"time\" for any name in ds[\"tmin\"].dims with length >= 185 or consider subsetting along multiple dimensions. Setting PANGEO_FORGE_MAX_MEMORY env variable changes the variable array size which will trigger this warning." instruction_id: "bundle_1608" transform_id: "Start|cache_input|Reshuffle_000|prepare_target|Reshuffle_001|store_chunk|Reshuffle_002|finalize_target|Reshuffle_003/store_chunk/Execute" log_location: "/Users/yuvipanda/code/pangeo-forge-recipes/pangeo_forge_recipes/recipes/xarray_zarr.py:621" thread: "Thread-14" 

yuvipanda avatar Nov 09 '22 10:11 yuvipanda