staged-recipes
[WIP] Add daymet
hacked from https://github.com/TomAugspurger/daymet-recipe
Thanks Yuvi! 🙌
I'm curious to see how this runs on the Beam branch. The NA-daily job was running really slowly when I tried it with pangeo-forge-recipes main (executed on a Dask cluster), and I haven't had a chance to investigate why.
@TomAugspurger yay! Also can I bring you onto the pangeo-forge slack somehow maybe? :)
@TomAugspurger also, i'm curious if we can get this data via https://cmr.earthdata.nasa.gov/search/concepts/C2031536952-ORNL_CLOUD.html instead? Is earthdata login what is holding that back?
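For reference, here's a minimal sketch of how a granule-list query against CMR's public search API could be built. The collection concept id comes from the link above; the parameter names are from the CMR search API, not from this PR, and paging, JSON parsing, and Earthdata login for the actual data files are all left out:

```python
from urllib.parse import urlencode

# Build a CMR granule-search URL for the Daymet daily collection.
# Fetching the JSON response and following Earthdata login redirects
# for the protected data files is out of scope for this sketch.
CMR_GRANULE_SEARCH = "https://cmr.earthdata.nasa.gov/search/granules.json"

def cmr_search_url(concept_id: str, page_size: int = 2000, page_num: int = 1) -> str:
    """Return the search URL for one page of granules in a collection."""
    params = {
        "collection_concept_id": concept_id,
        "page_size": page_size,
        "page_num": page_num,
    }
    return f"{CMR_GRANULE_SEARCH}?{urlencode(params)}"
```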
@TomAugspurger ok, I just tried to get this purely from CMR for the daily run, and running into:
Traceback (most recent call last):
File "apache_beam/runners/common.py", line 1417, in apache_beam.runners.common.DoFnRunner.process
File "apache_beam/runners/common.py", line 837, in apache_beam.runners.common.PerWindowInvoker.invoke_process
File "apache_beam/runners/common.py", line 983, in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
File "/Users/yuvipanda/.local/share/virtualenvs/pangeo-forge-runner/lib/python3.9/site-packages/apache_beam/transforms/core.py", line 1877, in <lambda>
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/executors/beam.py", line 14, in _no_arg_stage
fun(config=config)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/recipes/xarray_zarr.py", line 530, in prepare_target
with open_chunk(chunk_key, config=config) as ds:
File "/srv/conda/envs/notebook/lib/python3.9/contextlib.py", line 119, in __enter__
return next(self.gen)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/recipes/xarray_zarr.py", line 414, in open_chunk
ds = xr.concat(dsets, config.concat_dim, **config.xarray_concat_kwargs)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/core/concat.py", line 243, in concat
return _dataset_concat(
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/core/concat.py", line 558, in _dataset_concat
raise ValueError(f"{name!r} is not present in all datasets.")
ValueError: 'dayl' is not present in all datasets.
Which makes sense, as I think it's one file per region, per year, per variable.
I'm making this into one recipe per region per variable, chunked across time. From conversations with ORNL folks, it would also be exciting to have this chunked across lat/lon, so you can get historical info for a single 'pixel' - like https://daymet.ornl.gov/single-pixel/.
In this case, I'd imagine it would be the same recipes as otherwise, just chunked by lat/lon?
Hmm, I should probably fold region inside, and just make one recipe per variable?
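To make that file layout concrete, here's a sketch of how the per-(region, variable, year) source URLs could be enumerated. The URL template matches the filenames visible in the worker logs further down; the region/variable lists and helper names are my assumptions about the Daymet V4 daily collection, not code from this PR:

```python
# Hypothetical enumeration of Daymet daily source files: one file per
# region, per year, per variable, as described above.
REGIONS = ["na", "hi", "pr"]
VARIABLES = ["dayl", "prcp", "srad", "swe", "tmax", "tmin", "vp"]

URL_TEMPLATE = (
    "https://data.ornldaac.earthdata.nasa.gov/protected/daymet/"
    "Daymet_Daily_V4/data/daymet_v4_daily_{region}_{var}_{year}.nc"
)

def make_url(region: str, var: str, year: int) -> str:
    """Build the source URL for one region/variable/year file."""
    return URL_TEMPLATE.format(region=region, var=var, year=year)

# One recipe per (region, variable), concatenated along time (years);
# the year range is truncated here just for illustration.
recipe_inputs = {
    (region, var): [make_url(region, var, y) for y in range(1980, 1983)]
    for region in REGIONS
    for var in VARIABLES
}
```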
I swear my commit messages are usually of better quality :) I'll squash and what not before final.
Now this PR reads the data list via CMR!
I think maybe the workers ran out of memory here? I bumped up the size of the node being used in Dataflow from n1-highmem-2 to n1-highmem-8.
I see a lot of this in the logs:
[2022-10-28T21:44:17.124660+00:00] [docker:dayl-bba2fd66df972afa41b1-10281438-er4b-harness-6187] /srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/core/indexing.py:1380: PerformanceWarning: Slicing is producing a large chunk. To accept the large
[2022-10-28T21:44:17.124708+00:00] [docker:dayl-bba2fd66df972afa41b1-10281438-er4b-harness-6187] chunk and silence this warning, set the option
[2022-10-28T21:44:17.124716+00:00] [docker:dayl-bba2fd66df972afa41b1-10281438-er4b-harness-6187] >>> with dask.config.set(**{'array.slicing.split_large_chunks': False}):
[2022-10-28T21:44:17.124722+00:00] [docker:dayl-bba2fd66df972afa41b1-10281438-er4b-harness-6187] ... array[indexer]
[2022-10-28T21:44:17.124727+00:00] [docker:dayl-bba2fd66df972afa41b1-10281438-er4b-harness-6187]
[2022-10-28T21:44:17.124732+00:00] [docker:dayl-bba2fd66df972afa41b1-10281438-er4b-harness-6187] To avoid creating the large chunks, set the option
[2022-10-28T21:44:17.124737+00:00] [docker:dayl-bba2fd66df972afa41b1-10281438-er4b-harness-6187] >>> with dask.config.set(**{'array.slicing.split_large_chunks': True}):
[2022-10-28T21:44:17.124742+00:00] [docker:dayl-bba2fd66df972afa41b1-10281438-er4b-harness-6187] ... array[indexer]
[2022-10-28T21:44:17.124750+00:00] [docker:dayl-bba2fd66df972afa41b1-10281438-er4b-harness-6187] value = value[(slice(None),) * axis + (subkey,)]
Given that this is happening at the same time as:
[2022-10-28T21:44:17.746232+00:00] [worker:dayl-bba2fd66df972afa41b1-10281438-er4b-harness-6187] Opening input with Xarray Index({DimIndex(name='time', index=121, sequence_len=156, operation=<CombineOp.CONCAT: 2>)}): 'https://data.ornldaac.earthdata.nasa.gov/protected/daymet/Daymet_Daily_V4/data/daymet_v4_daily_na_dayl_2010.nc'
[2022-10-28T21:44:17.746490+00:00] [worker:dayl-bba2fd66df972afa41b1-10281438-er4b-harness-6187] Opening 'https://data.ornldaac.earthdata.nasa.gov/protected/daymet/Daymet_Daily_V4/data/daymet_v4_daily_na_dayl_2010.nc' from cache
[2022-10-28T21:44:19.788892+00:00] [worker:dayl-bba2fd66df972afa41b1-10281438-er4b-harness-6187] Opening input with Xarray Index({DimIndex(name='time', index=122, sequence_len=156, operation=<CombineOp.CONCAT: 2>)}): 'https://data.ornldaac.earthdata.nasa.gov/protected/daymet/Daymet_Daily_V4/data/daymet_v4_daily_hi_dayl_2010.nc'
[2022-10-28T21:44:19.789208+00:00] [worker:dayl-bba2fd66df972afa41b1-10281438-er4b-harness-6187] Opening 'https://data.ornldaac.earthdata.nasa.gov/protected/daymet/Daymet_Daily_V4/data/daymet_v4_daily_hi_dayl_2010.nc' from cache
[2022-10-28T21:44:20.046337+00:00] [worker:dayl-bba2fd66df972afa41b1-10281438-er4b-harness-6187] Opening input with Xarray Index({DimIndex(name='time', index=123, sequence_len=156, operation=<CombineOp.CONCAT: 2>)}): 'https://data.ornldaac.earthdata.nasa.gov/protected/daymet/Daymet_Daily_V4/data/daymet_v4_daily_hi_dayl_2011.nc'
[2022-10-28T21:44:20.046594+00:00] [worker:dayl-bba2fd66df972afa41b1-10281438-er4b-harness-6187] Opening 'https://data.ornldaac.earthdata.nasa.gov/protected/daymet/Daymet_Daily_V4/data/daymet_v4_daily_hi_dayl_2011.nc' from cache
I suspected this was maybe because the source .nc file was too large, but it's barely a meg. Maybe the destination is too large.
This is the first recipe I'm really writing, so there's a lot of learning!
Yep, definitely running out of memory:
[2022-10-28T21:55:42.920516+00:00] [system:dayl-bba2fd66df972afa41b1-10281438-er4b-harness-6187] Out of memory: Killed process 6534 (python) total-vm:102006872kB, anon-rss:45809068kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:90608kB oom_score_adj:900
Not exactly sure why, probably something about the chunking.
oooh, better - currently getting:
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/executors/beam.py", line 40, in exec_stage
self.stage.function(arg, config=self.config) # type: ignore
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/recipes/xarray_zarr.py", line 645, in store_chunk
zarr_array[zarr_region] = data
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/zarr/core.py", line 1373, in __setitem__
self.set_basic_selection(pure_selection, value, fields=fields)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/zarr/core.py", line 1468, in set_basic_selection
return self._set_basic_selection_nd(selection, value, fields=fields)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/zarr/core.py", line 1772, in _set_basic_selection_nd
self._set_selection(indexer, value, fields=fields)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/zarr/core.py", line 1800, in _set_selection
check_array_shape('value', value, sel_shape)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/zarr/util.py", line 547, in check_array_shape
raise ValueError('parameter {!r}: expected array with shape {!r}, got {!r}'
ValueError: parameter 'value': expected array with shape (365, 231, 364), got (365, 584, 284) [while running 'Start|cache_input|Reshuffle_000|prepare_target|Reshuffle_001|store_chunk|Reshuffle_002|finalize_target|Reshuffle_003/store_chunk/Execute-ptransform-56']
Now with:
29T03:18:41.116192+00:00] [worker:daymet-daily-8efc9a7712d3-10282000-lymd-harness-h9tm] Operation ongoing for over 771.76 seconds in state process-msecs in step Start|cache_input|Reshuffle_000|prepare_target|Reshuffle_001|store_chunk|Reshuffle_002|finalize_target|Reshuffle_003/cache_input/Execute-ptransform-56 without returning. Current Traceback:
File "/srv/conda/envs/notebook/lib/python3.9/threading.py", line 937, in _bootstrap
self._bootstrap_inner()
File "/srv/conda/envs/notebook/lib/python3.9/threading.py", line 980, in _bootstrap_inner
self.run()
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/utils/thread_pool_executor.py", line 53, in run
self._work_item.run()
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/utils/thread_pool_executor.py", line 37, in run
self._future.set_result(self._fn(*self._fn_args, **self._fn_kwargs))
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 356, in task
self._execute(
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 284, in _execute
response = task()
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 357, in <lambda>
lambda: self.create_worker().do_instruction(request), request)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 597, in do_instruction
return getattr(self, request_type)(
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 635, in process_bundle
bundle_processor.process_bundle(instruction_id))
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/bundle_processor.py", line 1003, in process_bundle
input_op_by_transform_id[element.transform_id].process_encoded(
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/bundle_processor.py", line 227, in process_encoded
self.output(decoded_value)
File "/Users/yuvipanda/.local/share/virtualenvs/pangeo-forge-runner/lib/python3.9/site-packages/apache_beam/transforms/core.py", line 1956, in <lambda>
File "/Users/yuvipanda/code/pangeo-forge-recipes/pangeo_forge_recipes/executors/beam.py", line 40, in exec_stage
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/recipes/xarray_zarr.py", line 156, in cache_input
config.storage_config.cache.cache_file(
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/storage.py", line 173, in cache_file
_copy_btw_filesystems(input_opener, target_opener)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/storage.py", line 43, in _copy_btw_filesystems
data = source.read(BLOCK_SIZE)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/implementations/http.py", line 590, in read
return super().read(length)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/spec.py", line 1643, in read
out = self.cache._fetch(self.loc, self.loc + length)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/caching.py", line 377, in _fetch
self.cache = self.fetcher(start, bend)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/asyn.py", line 111, in wrapper
return sync(self.loop, func, *args, **kwargs)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/asyn.py", line 84, in sync
if event.wait(1):
File "/srv/conda/envs/notebook/lib/python3.9/threading.py", line 581, in wait
signaled = self._cond.wait(timeout)
File "/srv/conda/envs/notebook/lib/python3.9/threading.py", line 316, in wait
gotit = waiter.acquire(True, timeout)
[2022-10-29T03:18:41.158921+00:00] [worker:daymet-daily-8efc9a7712d3-10282000-lymd-harness-h9tm] Operation ongoing for over 668.51 seconds in state process-msecs in step Start|cache_input|Reshuffle_000|prepare_target|Reshuffle_001|store_chunk|Reshuffle_002|finalize_target|Reshuffle_003/cache_input/Execute-ptransform-56 without returning. Current Traceback:
File "/srv/conda/envs/notebook/lib/python3.9/threading.py", line 937, in _bootstrap
self._bootstrap_inner()
File "/srv/conda/envs/notebook/lib/python3.9/threading.py", line 980, in _bootstrap_inner
self.run()
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/utils/thread_pool_executor.py", line 53, in run
self._work_item.run()
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/utils/thread_pool_executor.py", line 37, in run
self._future.set_result(self._fn(*self._fn_args, **self._fn_kwargs))
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 356, in task
self._execute(
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 284, in _execute
response = task()
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 357, in <lambda>
lambda: self.create_worker().do_instruction(request), request)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 597, in do_instruction
return getattr(self, request_type)(
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/sdk_worker.py", line 635, in process_bundle
bundle_processor.process_bundle(instruction_id))
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/bundle_processor.py", line 1003, in process_bundle
input_op_by_transform_id[element.transform_id].process_encoded(
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/apache_beam/runners/worker/bundle_processor.py", line 227, in process_encoded
self.output(decoded_value)
File "/Users/yuvipanda/.local/share/virtualenvs/pangeo-forge-runner/lib/python3.9/site-packages/apache_beam/transforms/core.py", line 1956, in <lambda>
File "/Users/yuvipanda/code/pangeo-forge-recipes/pangeo_forge_recipes/executors/beam.py", line 40, in exec_stage
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/recipes/xarray_zarr.py", line 156, in cache_input
config.storage_config.cache.cache_file(
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/storage.py", line 173, in cache_file
_copy_btw_filesystems(input_opener, target_opener)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/storage.py", line 43, in _copy_btw_filesystems
data = source.read(BLOCK_SIZE)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/implementations/http.py", line 590, in read
return super().read(length)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/spec.py", line 1643, in read
out = self.cache._fetch(self.loc, self.loc + length)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/caching.py", line 377, in _fetch
self.cache = self.fetcher(start, bend)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/asyn.py", line 111, in wrapper
return sync(self.loop, func, *args, **kwargs)
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/asyn.py", line 84, in sync
if event.wait(1):
File "/srv/conda/envs/notebook/lib/python3.9/threading.py", line 581, in wait
signaled = self._cond.wait(timeout)
File "/srv/conda/envs/notebook/lib/python3.9/threading.py", line 316, in wait
gotit = waiter.acquire(True, timeout)
Hi @yuvipanda, quick question about this recipe. I saw that you're specifying aiohttp.BasicAuth:
https://github.com/yuvipanda/staged-recipes/blob/5a50ed2ed1212e77a7615cbbbc285c7163dce465/recipes/daymet/recipe.py#L15
client_kwargs = {
'auth': aiohttp.BasicAuth(username, password),
'trust_env': True,
}
...
fsspec_open_kwargs=dict(
client_kwargs=client_kwargs
),
I wanted to check whether this was successfully serialized when generating the recipe hash (https://github.com/pangeo-forge/pangeo-forge-recipes/pull/429).
@andersy005 encountered a problem where aiohttp client_kwargs couldn't be serialized in a feedstock recipe run; a couple of workarounds are suggested here: https://github.com/pangeo-forge/eooffshore_ics_ccmp_v02_1_nrt_wind-feedstock/pull/3#discussion_r1010420618
I'd have thought that there'd be a similar issue with BasicAuth. I've replicated it locally (see pangeo_forge_recipes.serialization) and it doesn't seem to be serializable:
In [4]: import inspect
...: from collections.abc import Collection
...: from dataclasses import asdict
...: from enum import Enum
...: from hashlib import sha256
...: from json import dumps
...: from typing import Any, List, Sequence
In [5]: def either_encode_or_hash(obj: Any):
...: """For objects which are not serializable with ``json.dumps``, this function defines
...: type-specific handlers which extract either a serializable value or a hash from the object.
...: :param obj: Any object which is not serializable to ``json``.
...: """
...:
...: if isinstance(obj, Enum): # custom serializer for FileType, CombineOp, etc.
...: return obj.value
...: elif hasattr(obj, "sha256"):
...: return obj.sha256.hex()
...: elif inspect.isfunction(obj):
...: return inspect.getsource(obj)
...: elif isinstance(obj, bytes):
...: return obj.hex()
...: raise TypeError(f"object of type {type(obj).__name__} not serializable")
In [11]: ba = aiohttp.BasicAuth(login="test", password="test")
In [12]: either_encode_or_hash(ba)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [12], in <cell line: 1>()
----> 1 either_encode_or_hash(ba)
Input In [5], in either_encode_or_hash(obj)
13 elif isinstance(obj, bytes):
14 return obj.hex()
---> 15 raise TypeError(f"object of type {type(obj).__name__} not serializable")
TypeError: object of type BasicAuth not serializable
@derekocallaghan yeah, am definitely running into that too! However, this isn't an actual serialization issue but something to do with the hashing (which IMO should probably just be removed?). To unblock myself temporarily while I ramp up on writing more recipes, I've applied the following patch to pangeo-forge-recipes :D
diff --git a/pangeo_forge_recipes/serialization.py b/pangeo_forge_recipes/serialization.py
index c7fb42a..d2c8828 100644
--- a/pangeo_forge_recipes/serialization.py
+++ b/pangeo_forge_recipes/serialization.py
@@ -22,7 +22,8 @@ def either_encode_or_hash(obj: Any):
return inspect.getsource(obj)
elif isinstance(obj, bytes):
return obj.hex()
- raise TypeError(f"object of type {type(obj).__name__} not serializable")
+ return bytes.fromhex("6ac3c336e4094835293a3fed8a4b5fedde1b5e2626d9838fed50693bba00af0e")
+ # raise TypeError(f"object of type {type(obj).__name__} not serializable")
def dict_to_sha256(dictionary: dict) -> bytes:
Hi @yuvipanda, yep, the hashing in serialization.py has raised some issues recently. It may be the case that we can exclude client_kwargs or even fsspec_open_kwargs from the hashing (see below), as I guess they could contain arbitrary values/objects. E.g. this possible workaround:
FilePattern has its own sha256() implementation, and although fsspec_open_kwargs (and client_kwargs) is currently included, perhaps this could be excluded (similar to the pattern format_function and combine dims currently excluded), or particular kwargs (e.g. any aiohttp instances) could be excluded from fsspec_open_kwargs prior to serialization. I guess timeout/credential values aren't necessary for a recipe hash?
https://github.com/pangeo-forge/pangeo-forge-recipes/blob/3441d94a290d296b8f62638df15bb60993e86b1d/pangeo_forge_recipes/patterns.py#L299
# we exclude the format function and combine dims from ``root`` because they determine the
# index:filepath pairs yielded by iterating over ``.items()``. if these pairs are generated in
# a different way in the future, we ultimately don't care.
root = {
"fsspec_open_kwargs": pattern.fsspec_open_kwargs,
"query_string_secrets": pattern.query_string_secrets,
"file_type": pattern.file_type,
"nitems_per_file": {
op.name: op.nitems_per_file # type: ignore
for op in pattern.combine_dims
if op.name in pattern.concat_dims
},
}
@derekocallaghan I think the recipe hash should be an allow_list, including only the specific things it wants to track, rather than excluding specific things. I'm not exactly sure what this hash is actually used for right now, do you know?
There are two separate serialization issues here, though: one is related to Beam serialization, and one is related to hashing to get a hash id for the recipe. They probably both need different solutions as well.
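As an illustration only, an allow-list version of the hash could look something like this. The field names are borrowed from the FilePattern.sha256() snippet quoted below in this thread, but the function itself is hypothetical, not part of pangeo-forge-recipes:

```python
import json
from hashlib import sha256

# Hash only fields that are (a) JSON-serializable and (b) meaningful for
# identifying a recipe. Everything else (credentials, timeouts, aiohttp
# client objects) is simply never fed to the hash, so it can't raise.
HASH_ALLOW_LIST = ("file_type", "nitems_per_file")

def recipe_sha256(recipe_fields: dict) -> bytes:
    """Deterministically hash the allow-listed subset of recipe fields."""
    tracked = {k: recipe_fields[k] for k in HASH_ALLOW_LIST if k in recipe_fields}
    canonical = json.dumps(tracked, sort_keys=True, separators=(",", ":"))
    return sha256(canonical.encode("utf-8")).digest()
```

With this shape, an unserializable client_kwargs value never reaches json.dumps, so two recipes that differ only in credentials hash identically.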
Yeah, previously the hash was created on demand, where BaseRecipe.sha256() was called from BaseRecipe.get_execution_context(), which itself was called from XarrayZarrRecipe.prepare_target(), and when generating a default job name in pangeo_forge_runner.commands.bake.Bake. I couldn't see where it was used apart from that at the time.
Agree that an allow_list is preferable.
With your hash workaround above, does the subsequent Beam-related pickling/serialization work?
@derekocallaghan I think so. I'm running it with local direct runner and was able to generate a full series of one particular variable just for HI!

I've just pushed my latest changes. I'm trying to get a couple steps running for all of the regions and variables.
I'm producing one recipe per variable per region, partly to see if I can get that to work before trying to merge the variables into one dimension. @jbusecke made me realize we can't actually easily combine the three regions into one!
I'm testing this locally the following way:
- Make a local_runner_config.py file with the following:
import pathlib
HERE = pathlib.Path(__file__).parent
c.TargetStorage.fsspec_class = "fsspec.implementations.local.LocalFileSystem"
c.TargetStorage.root_path = f"file://{HERE}/storage/output/{{job_id}}"
c.InputCacheStorage.root_path = f"file://{HERE}/storage/cache"
c.InputCacheStorage.fsspec_class = c.TargetStorage.fsspec_class
c.MetadataCacheStorage.root_path = f"file://{HERE}/storage/metadata/{{job_id}}"
c.MetadataCacheStorage.fsspec_class = c.TargetStorage.fsspec_class
c.Bake.bakery_class = "pangeo_forge_runner.bakery.local.LocalDirectBakery"
- Install pangeo-forge-runner from master.
- Install pangeo-forge-recipes from master, with the patch indicated.
- Run it from this repo with:
pangeo-forge-runner bake --repo . --feedstock-subdir=recipes/daymet -f local_runner_config.py --prune
Currently it fails with the following when trying to run the NA files:
Worker: severity: WARN timestamp { seconds: 1667987210 nanos: 63492059 } message: "Variable tmin of 92123153000 bytes is 184.25 times larger than specified maximum variable array size of 500000000 bytes. Consider re-instantiating recipe with `subset_inputs = {\"time\": 185}`. If `len(ds[\"time\"])` < 185, substitute \"time\" for any name in ds[\"tmin\"].dims with length >= 185 or consider subsetting along multiple dimensions. Setting PANGEO_FORGE_MAX_MEMORY env variable changes the variable array size which will trigger this warning." instruction_id: "bundle_1608" transform_id: "Start|cache_input|Reshuffle_000|prepare_target|Reshuffle_001|store_chunk|Reshuffle_002|finalize_target|Reshuffle_003/store_chunk/Execute" log_location: "/Users/yuvipanda/code/pangeo-forge-recipes/pangeo_forge_recipes/recipes/xarray_zarr.py:621" thread: "Thread-14"
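The suggested subset_inputs value in that warning is just the variable size divided by the configured maximum, rounded up; a quick check with the numbers copied from the log line:

```python
import math

# tmin is reported as 92,123,153,000 bytes against a 500,000,000-byte
# maximum variable array size, hence the "184.25 times larger" figure
# and the suggested subset factor of 185 in the warning above.
var_nbytes = 92_123_153_000
max_nbytes = 500_000_000

ratio = var_nbytes / max_nbytes
subset_factor = math.ceil(ratio)
```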