Intake conversion: Cross-contour_transport
Following the discussion in issue #313, we propose converting the recipes to use Intake, given that the Cookbook is no longer supported and the ACCESS-NRI Intake catalog is now available.
A few months ago, @max-anu began working on this transition. This pull request contains the changes @max-anu made to the notebook specified in the title.
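For context, the conversion replaces the old Cookbook loading calls with searches against the ACCESS-NRI Intake catalog. A minimal sketch of the pattern, assuming the experiment name and search terms used in this recipe (the notebook's exact calls may differ):

```python
import intake

# Open the ACCESS-NRI Intake catalog and pick out an experiment
catalog = intake.cat.access_nri
experiment = catalog["01deg_jra55v13_ryf9091"]  # assumed experiment name

# Search for a variable and load the matching data lazily with dask
var_search = experiment.search(variable="ty_trans", frequency="1mon")
dset = var_search.to_dask()
```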
@taimoorsohail just checking: was Along_isobath_average.ipynb pushed to this branch on purpose, or was it supposed to go in #416 only?
I thought I pushed from taimoorsohail:main to Cosima-recipes:main? Either way, no, I wasn't trying to push to this branch. I was reviewing this PR, but there are significant bugs in the Cross-contour_transport code (not related to this PR; they are well documented in other issues), so I think @adele-morrison is ironing out those bugs.
Your local clone must have had some edits from previous things you'd been working on, and then you did something like `git add *` or `git commit -am "message"`, either of which adds all the modifications to the commit.
That's a guess. I'll sort it out -- don't worry! :)
> I thought I pushed from taimoorsohail:main to cosima-recipes:main
I don't understand exactly what you mean. Nobody is allowed to push to cosima-recipes:main; you can only make a PR to request merging changes into cosima-recipes:main.
This PR was opened from a branch of cosima-recipes. So to add commits to that PR, you need to clone the cosima-recipes repository, do `git checkout INTAKE_Cross-contour_transport`, and then push commits to that branch.
Anyway, I fixed it... I'm just trying to make it easier for next time, but I'm not sure if I'm explaining myself properly. [I do admit that it's early a.m., I'm jet-lagged, and coffee shops are closed where I currently am! :)]
I got about halfway through merging the updates from the trunk (PR #423) yesterday, but now can't continue without Gadi access. Whilst doing that, I identified a potential issue with the Intake catalogue entry for the dataset being used: https://github.com/ACCESS-NRI/access-nri-intake-catalog/issues/192
I've completed merging the latest main into this branch, and completed the Intake conversion at the same time (I think). As part of that, I've added a small section on chunking the data; for the data volume in use, I doubt it makes much difference, but I think it's a useful example. @navidcy I'll add you as a reviewer; please let me know if there's anyone else I should add.
Actually, could you request a review from somebody else? There are many COSIMAers who would be willing to contribute.
@adele-morrison are you in a position to be able to review this, and if not, could you suggest someone who is?
I just had a look at this one and it seems like the conversion to Intake is all done, nice work on getting that across the line, @marc-white. From what I can tell, the remaining issues are more general bugs that are being addressed in other PRs and not directly related to the Intake side of things.
charles-turner-1 commented on 2025-05-22T03:22:23Z:
I'm wondering if we want to re-run this in a more recent environment (there's an xarray deprecation warning regarding multi-indexes below).
charles-turner-1 commented on 2025-05-22T03:22:24Z:
Line #2. `sorted(set().union(*cat_subset.df['variable']))`
Assuming I've understood the intent, I think this would be better expressed as `cat_subset.unique().variable`.
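For illustration, both forms list the variables present in the catalog subset; a minimal sketch, reusing `cat_subset` from the notebook (or a search result like the one sketched above):

```python
# Original approach: flatten the per-asset variable lists by hand
variables = sorted(set().union(*cat_subset.df["variable"]))

# Suggested approach: let intake-esm report the unique values per column
variables = sorted(cat_subset.unique().variable)
```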
charles-turner-1 commented on 2025-05-23T01:05:58Z:
Probably don't need this cell at all.
charles-turner-1 commented on 2025-05-22T03:22:25Z:
Line #3. `'hu', # Load this to access the xu_ocean, yu_ocean coordinates`
We can probably search for `xu_ocean`, `yu_ocean` directly now.
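A sketch of that suggestion, on the assumption that the catalog indexes the coordinate variables like any other (which is what the ACCESS-NRI issue linked above concerns):

```python
# Hypothetical: search for the coordinate variables directly instead of
# loading 'hu' just to get at its xu_ocean / yu_ocean coordinates
coord_search = experiment.search(variable=["xu_ocean", "yu_ocean"])
coords = coord_search.to_dask()
```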
charles-turner-1 commented on 2025-05-23T01:05:46Z:
Unnecessary, resolving.
charles-turner-1 commented on 2025-05-22T03:22:25Z:
Line #1. `# For some reason we need a threshold of 1.01 here not just 1. Not sure why...`
Could this be as simple as the fact that it uses `>` and not `>=`? Either that, or I guess floating-point precision?
charles-turner-1 commented on 2025-05-22T03:22:26Z:
These aren't integers here, so presumably it's not the `>` vs `>=` issue above. Maybe we should add a small caveat about how floats work?
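For reference, the kind of caveat being suggested; an illustrative snippet (not the notebook's actual computation) of why strict comparisons against 1 can surprise with floats:

```python
import numpy as np

total = sum([0.1] * 10)        # mathematically exactly 1.0
print(total)                   # 0.9999999999999999 in double precision
print(total >= 1)              # False, despite the exact maths
print(np.isclose(total, 1.0))  # True: tolerance-based comparison
```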
charles-turner-1 commented on 2025-05-22T03:22:27Z:
I think we can probably do this (and the next couple of cells) more cleanly. Excuse the very sloppy pseudocode, which would need numpy-ification as well:

```python
# Pair up the contour coordinates, drop duplicates (order-preserving), then unzip
contours = list(zip(x_contour, y_contour))
contours = list(dict.fromkeys(contours))
x_contour = [c[0] for c in contours]
y_contour = [c[1] for c in contours]
```
charles-turner-1 commented on 2025-05-23T05:34:29Z:
Nope.
charles-turner-1 commented on 2025-05-22T03:22:28Z:
It looks like this label is wrong to me, but TBH it was wrong before the update anyway. I think it should be the point number on the colour axis.
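A minimal sketch of the suggested fix, assuming the figure is a scatter of the contour points coloured by their index (variable names are placeholders):

```python
import numpy as np
import matplotlib.pyplot as plt

scatter = plt.scatter(x_contour, y_contour, c=np.arange(len(x_contour)))
plt.colorbar(scatter, label="Point number along contour")
```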
charles-turner-1 commented on 2025-05-22T03:22:28Z:
Line #3. `darray = var_search.to_dask()`
We should call this `dset` really, since it's a Dataset, not a DataArray.
Then `darray = dset['ty_trans']` is correctly named.
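Spelled out, the suggested rename (a sketch; `var_search` and `ty_trans` are the names from the notebook):

```python
dset = var_search.to_dask()  # intake-esm returns an xarray.Dataset
darray = dset["ty_trans"]    # selecting a variable gives a DataArray
```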
charles-turner-1 commented on 2025-05-22T03:22:29Z:
Line #1. `ty_trans = ty_trans.chunk(chunks={"time": -1, "st_ocean": -1, "yu_ocean": 102, "xt_ocean": 360})`
Bit of an aside, but I think we want `xt_ocean` chunks of 400 for maximum performance here.
Maybe we could add `validate_chunkspec` in here?
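One way to sanity-check that choice; a sketch assuming the variable came from netCDF so its on-disk chunk sizes survive in `.encoding` (the 400 figure is the reviewer's suggestion, not verified here):

```python
# Inspect the on-disk chunk sizes before picking dask chunks
print(ty_trans.encoding.get("chunksizes"))

# The suggested chunking, with xt_ocean bumped from 360 to 400
ty_trans = ty_trans.chunk({"time": -1, "st_ocean": -1, "yu_ocean": 102, "xt_ocean": 400})
```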
charles-turner-1 commented on 2025-05-22T03:22:30Z:
Line #8. `tx_trans = tx_trans.chunk(chunks={"time": -1, "st_ocean": -1, "yt_ocean": 102, "xu_ocean": 360})`
As above regarding 360 => 400 (I think).
charles-turner-1 commented on 2025-05-22T03:22:30Z:
Line #4. `plt.ylabel('Cumulative transport across contour (Sv)'); # Is Sv the correct units? See comment under "Convert from mass transport to volume transport"`
1 Sv = 10^6 m^3/s. Since we have a volume transport from dividing by rho above, we should be all good and can remove this comment.
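For completeness, the conversion chain being described; a sketch with an assumed reference density (the notebook's actual value may differ):

```python
rho_0 = 1035.0                            # reference density in kg m^-3 (assumption)
vol_transport = ty_trans / rho_0          # mass transport (kg/s) -> volume transport (m^3/s)
vol_transport_Sv = vol_transport / 1.0e6  # m^3/s -> Sverdrups; 1 Sv = 10^6 m^3/s
```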
charles-turner-1 commented on 2025-05-22T03:22:31Z:
See my initial comment at the top about using a more recent environment. That deprecation warning might raise an error by now?
Sorry, this turned into quite a thorough review. I'm happy to address my points myself if you've got heaps on your plate, @marc-white.
Something has gone badly and confusingly wrong with the coordinate change. I can't see anything that has obviously caused it, so this might take a little while to get fixed up. It's potentially due to an updated dependency.