staged-recipes
C-iTRACE tracer pointing to NetCDF4 files
Now points to netcdf4 files hosted on figshare, hopefully sidestepping the netcdf3 issue
@cisaacstern I seem to be incapable of producing an immaculate pair of files. Is this something I should try to fix or is this a case for your magic wand?
[I rewrote these files as netcdf4 files which should fix our netcdf3 block.]
@jordanplanders in your local development environment, you should be able to `pip install pre-commit` and then, from the root of your clone of the staged-recipes repo, run `pre-commit install`. Then, each time you make a `git commit`, pre-commit should auto-format your files for you. These changes will then need to be added to the commit and the `git commit` call re-run.
For more on this: https://pre-commit.com/
For work that has not been running pre-commit all along, it's sometimes useful to establish a clean baseline with `pre-commit run --all-files`, which will bring all files up to spec with formatting.
@cisaacstern Done! Now with a clean bill of health! Fantastic!
/run citrace_tracer
The test failed, but I'm sure we can find out why!
Pangeo Forge maintainers are working diligently to provide public logs for contributors. That feature is not quite ready yet, however, so please reach out on this thread to a maintainer, and they'll help you diagnose the problem.
@cisaacstern Oy. Failed. Ok, I'm not sure what went wrong, but I'll get out my fine (and coarse) toothed comb and go on a hunt.
@jordanplanders... I'll pull the logs now... (Getting these public by default is one of my top priorities...)
fname = config.file_pattern[input_key]
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/patterns.py", line 219, in __getitem__
fname = self.format_function(**format_function_kwargs)
File "/tmp/tmpzno97m9d/recipes/citrace_tracer/recipe.py", line 33, in make_url
NameError: name 'url_d' is not defined [while running 'Start|cache_input|Reshuffle_000|prepare_target|Reshuffle_001|store_chunk|Reshuffle_002|finalize_target|Reshuffle_003/cache_input/Execute-ptransform-56']
Looks like the issue is the undefined variable `url_d` here. This should show up when running the recipe locally, so maybe worth doing that before we test again on the cloud, in case other such nits are lurking?
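(For reference, a minimal sketch of what such a local test can look like with the `pangeo_forge_recipes` API used in this thread; how `recipe` is imported from recipe.py is an assumption here.)

```python
# Hedged sketch of a local smoke test on a pruned copy of the recipe.
# The import below is hypothetical; adjust to however recipe.py exposes `recipe`.
from recipe import recipe

recipe_pruned = recipe.copy_pruned()        # keep only the first couple of inputs
run_function = recipe_pruned.to_function()  # compile the recipe to a plain callable
run_function()                              # cache, prepare, store, finalize in-process
```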
@cisaacstern I confess that I'm a bit puzzled. I just ran it locally with `url_d` both inside and outside the scope of the function `make_url` for the smallest of the files without trouble. I can update the recipe so that `url_d` is in the `make_url` function scope, though, and see if that helps.
Also, we could test the recipe with just the small file before setting it loose on the full canon.
@jordanplanders, oh sorry, didn't realize that `url_d` was defined outside the function scope.
This is a known issue with deploying these recipes to Google Cloud Dataflow (where we run them in the cloud): variables defined outside function scopes are often (always?) not available within the function. This has to do with how Dataflow serializes the components of the recipe, but the specific mechanisms are a bit beyond my current level of understanding.
Anyway, for a case like this where the variable `url_d` is only used inside a single function, the simplest solution is just to move the assignment of that variable within the function scope. Then this should just work.
(It gets a little stylistically questionable when the same variable is reused in multiple functional scopes, but luckily we don't appear to have that issue here.)
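As an illustration only (the real keys, figshare file ids, and `make_url` signature live in recipe.py; these are placeholders), the fix amounts to something like:

```python
# Placeholder sketch: define the URL lookup table inside make_url so that
# Dataflow workers, which only receive the serialized function, can see it.
# The keys and figshare file ids below are hypothetical, not the real C-iTRACE ones.
def make_url(variable):
    url_d = {  # assigned inside the function scope
        "d18O": "https://figshare.com/ndownloader/files/00000001",
        "SALT": "https://figshare.com/ndownloader/files/00000002",
    }
    return url_d[variable]
```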
@cisaacstern Makes sense. Like putting import statements in the preprocessing function. Great to have the clarification. Would it be better to give it a test run with only two of the files? Maybe the little one and one of the others? Or just go for broke?
No worries about including the full set of variables. The cache from this test will be reused for the production run anyway, so we're just getting ahead on data transfer. I'll re-trigger the test now.
/run citrace_tracer
The test failed, but I'm sure we can find out why!
Pangeo Forge maintainers are working diligently to provide public logs for contributors. That feature is not quite ready yet, however, so please reach out on this thread to a maintainer, and they'll help you diagnose the problem.
Seeing two different errors on the backend now:
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/asyn.py", line 53, in _runner
result[0] = await coro
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/fsspec/implementations/http.py", line 420, in _info
raise FileNotFoundError(url) from exc
FileNotFoundError: https://figshare.com/ndownloader/files/38543969 [while running 'Start|cache_input|Reshuffle_000|prepare_target|Reshuffle_001|store_chunk|Reshuffle_002|finalize_target|Reshuffle_003/cache_input/Execute-ptransform-56']
"
and
File "/srv/conda/envs/notebook/lib/python3.9/site-packages/aiohttp/client_reqrep.py", line 1005, in raise_for_status
raise ClientResponseError(
RuntimeError: aiohttp.client_exceptions.ClientResponseError: 403, message='Forbidden', url=URL('https://figshare.com/ndownloader/files/38530430') [while running 'Start|cache_input|Reshuffle_000|prepare_target|Reshuffle_001|store_chunk|Reshuffle_002|finalize_target|Reshuffle_003/cache_input/Execute-ptransform-56']
"
Do both of the URLs which appear in these two errors definitely exist? And if so, do they permit unauthenticated download?
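One quick way to check (a hedged sketch; assumes `fsspec` is installed, as it is in the recipe environment) is to try reading a few bytes from each URL anonymously:

```python
# Probe the two figshare URLs from the error messages for unauthenticated access.
import fsspec

urls = [
    "https://figshare.com/ndownloader/files/38543969",
    "https://figshare.com/ndownloader/files/38530430",
]

for url in urls:
    try:
        with fsspec.open(url, mode="rb") as f:
            f.read(16)  # a few bytes is enough to confirm the file is reachable
        print(f"OK: {url}")
    except Exception as e:
        print(f"FAILED: {url} -> {type(e).__name__}: {e}")
```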
@cisaacstern I find this mysterious... I ran up against this issue when I ran this snippet:
try:
    with open_chunk(all_chunks[0], config=recipe) as ds:
        display(ds)
except FileNotFoundError as e:
    print(str(e))
which triggered an error like the one you see, but when I run it this way, no problem:
recipe_pruned = recipe.copy_pruned()
run_function = recipe_pruned.to_function()
run_function()  # executing the pruned recipe this way raised no error
@cisaacstern I didn't properly answer your question earlier. If I navigate to each of those links in a browser, a download begins. Any leads on what might be afoot here?
@jordanplanders thanks for the follow up. The PR which had been blocking your original feedstock was merged this week. So it's possible we won't need this workaround after all. I've opened https://github.com/pangeo-forge/C-iTRACE-feedstock/pull/5 to see if the original feedstock is indeed unblocked.
@cisaacstern I came across a version of this issue (I think), which was improved by appending '#mode=bytes' to the url.
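A hedged sketch of that workaround (the file id is taken from the error log above and may not be the right target; it also relies on a netCDF-C build with byte-range support):

```python
# Appending "#mode=bytes" asks the netCDF-C library to read the remote file via
# HTTP byte-range requests instead of treating the URL as an OPeNDAP endpoint.
import xarray as xr

url = "https://figshare.com/ndownloader/files/38543969#mode=bytes"
ds = xr.open_dataset(url)  # requires netCDF-C built with byte-range support
print(ds)
```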
We're really hoping to be able to make either the C-iTRACE output or possibly the full res iTRACE output available for a few events this spring. If the dev cycle isn't in our favor, I'll get creative about temporary workarounds though!