
Intake conversion Cross-contour_transport

Open rbeucher opened this issue 1 year ago • 9 comments

Following the discussion in issue #313, we propose converting the recipes to use Intake, given that the Cookbook is no longer supported and the ACCESS-NRI Intake catalog is now available.

A few months ago, @max-anu began working on this transition. This pull request contains the changes @max-anu made to the notebook specified in the title.

rbeucher avatar Jun 06 '24 05:06 rbeucher

Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks.

@taimoorsohail just checking: was Along_isobath_average.ipynb pushed to this branch on purpose, or was it supposed to go in #416 only?

navidcy avatar Jul 09 '24 04:07 navidcy

I thought I pushed from taimoorsohail:main to cosima-recipes:main? Either way, no, I wasn't trying to push to this branch. I was reviewing this PR, but there are significant bugs in the Cross-contour_transport code (not related to this PR - they are well documented in other issues), so I think @adele-morrison is ironing out those bugs.

taimoorsohail avatar Jul 09 '24 04:07 taimoorsohail

Your local clone must have had some edits from previous things you'd been working on, and then you did something like git add * or git commit -am "message", either of which adds all the modifications to the commit.

That's a guess. I'll sort it out -- don't worry! :)

navidcy avatar Jul 09 '24 04:07 navidcy

I thought I pushed from taimoorsohail:main to cosima-recipes:main

I don't understand exactly what you mean. Nobody is allowed to push to cosima-recipes:main; you can only open a PR to request merging changes into cosima-recipes:main.

This PR was opened from a branch of cosima-recipes. So to add commits to that PR you need to clone the cosima-recipes repository, do git checkout INTAKE_Cross-contour_transport, and then push commits to that branch.

Anyway I fixed it... I'm just trying to make it easier for other times but I'm not sure if I'm explaining myself properly. [I do admit that it's early am, I'm jet lagged and coffee shops are closed where I currently am! :)]

navidcy avatar Jul 09 '24 04:07 navidcy

I got about halfway through merging the updates from the trunk (PR #423) yesterday, but now can't continue without Gadi access. Whilst doing that, I identified a potential issue with the Intake catalogue of the dataset being used: https://github.com/ACCESS-NRI/access-nri-intake-catalog/issues/192

marc-white avatar Sep 11 '24 23:09 marc-white

I've completed merging the latest main into this branch, and completed the intake conversion at the same time (I think). As part of that, I've added a small section on chunking the data; for the data volume in use, I doubt it makes a lot of difference, but I think it's a useful example. @navidcy I'll add you as a reviewer; please let me know if there's anyone else I should add.

marc-white avatar Sep 16 '24 06:09 marc-white

Actually could you request review from somebody else? There are many COSIMAers who would be willing to contribute.

navidcy avatar Sep 16 '24 06:09 navidcy

@adele-morrison are you in a position to be able to review this, and if not, could you suggest someone who is?

marc-white avatar Nov 07 '24 04:11 marc-white

I just had a look at this one and it seems like the conversion to Intake is all done, nice work on getting that across the line, @marc-white. From what I can tell, the remaining issues are more general bugs that are being addressed in other PRs and not directly related to the Intake side of things.

rbeucher avatar May 22 '25 02:05 rbeucher


charles-turner-1 commented on 2025-05-22T03:22:23Z ----------------------------------------------------------------

I'm wondering if we want to re-run this in a more recent environment (xarray deprecation warning regarding multi-indexes below)



charles-turner-1 commented on 2025-05-22T03:22:24Z ----------------------------------------------------------------

Line #2.    sorted(set().union(*cat_subset.df['variable']))

Assuming I've understood the intent, I think this would be better expressed as

cat_subset.unique().variable

charles-turner-1 commented on 2025-05-23T01:05:58Z ----------------------------------------------------------------

probably don't need this cell at all.
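(For anyone reading along: the original expression works because each row of cat_subset.df['variable'] holds a list of variable names. A minimal sketch with made-up lists, not the actual catalogue data:)

```python
# Hypothetical per-row variable lists, standing in for cat_subset.df['variable']
rows = [["temp", "salt"], ["salt", "ty_trans"]]

# The pattern from the notebook: union all per-row lists, de-duplicate, sort
all_vars = sorted(set().union(*rows))
print(all_vars)  # ['salt', 'temp', 'ty_trans']
```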


charles-turner-1 commented on 2025-05-22T03:22:25Z ----------------------------------------------------------------

Line #3.        'hu',  # Load this to access the xu_ocean, yu_ocean coordinates

We can probably search for xu_ocean, yu_ocean directly now.


charles-turner-1 commented on 2025-05-23T01:05:46Z ----------------------------------------------------------------

unnecessary, resolving


charles-turner-1 commented on 2025-05-22T03:22:25Z ----------------------------------------------------------------

Line #1.    # For some reason we need a threshold of 1.01 here not just 1. Not sure why...

Could this be as simple as the fact that it uses > and not >=? Either that, or I guess a floating-point precision issue?



charles-turner-1 commented on 2025-05-22T03:22:26Z ----------------------------------------------------------------

Not integers here, so presumably not above. Maybe we should add a small caveat about how floats work?
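For reference, a minimal sketch (plain Python with made-up numbers, not the notebook's actual data) of why an exact threshold of 1 can misbehave with floats:

```python
import math

# Ten lots of 0.1 do not sum to exactly 1.0 in binary floating point
total = sum([0.1] * 10)
print(total)         # 0.9999999999999999
print(total >= 1.0)  # False

# A small tolerance (or math.isclose) sidesteps the off-by-epsilon behaviour,
# which may be why the notebook's threshold of 1.01 works where 1 does not
print(math.isclose(total, 1.0))  # True
```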



charles-turner-1 commented on 2025-05-22T03:22:27Z ----------------------------------------------------------------

I think we can probably do this (and the next couple of cells) more cleanly - something along these lines, e.g.:

contours = list(dict.fromkeys(zip(x_contour, y_contour)))  # de-duplicate, preserving order
x_contour = [c[0] for c in contours]
y_contour = [c[1] for c in contours]

charles-turner-1 commented on 2025-05-23T05:34:29Z ----------------------------------------------------------------

Nope.


charles-turner-1 commented on 2025-05-22T03:22:28Z ----------------------------------------------------------------

This label looks wrong to me, but TBH it was wrong before the update anyway. I think it should be point number on the colour axis.



charles-turner-1 commented on 2025-05-22T03:22:28Z ----------------------------------------------------------------

Line #3.    darray = var_search.to_dask()

We should call this dset really, since it's a Dataset not DataArray.

Then darray = dset['ty_trans'] is correctly named



charles-turner-1 commented on 2025-05-22T03:22:29Z ----------------------------------------------------------------

Line #1.    ty_trans = ty_trans.chunk(chunks={"time": -1, "st_ocean": -1, "yu_ocean": 102, "xt_ocean": 360})

Bit of an aside but I think we want xt_ocean chunks of 400 for max performance here.

Maybe we could add validate_chunkspec in here?



charles-turner-1 commented on 2025-05-22T03:22:30Z ----------------------------------------------------------------

Line #8.    tx_trans = tx_trans.chunk(chunks={"time": -1, "st_ocean": -1, "yt_ocean": 102, "xu_ocean": 360})

As above about 360 => 400 (I think)



charles-turner-1 commented on 2025-05-22T03:22:30Z ----------------------------------------------------------------

Line #4.    plt.ylabel('Cumulative transport across contour (Sv)');  # Is Sv the correct units? See comment under "Convert from mass transport to volume transport"

1 Sv = 10^6 m^3/s. Since we have a volume transport from dividing by rho above, we should be all good and can remove this comment.
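To spell out the unit bookkeeping (with an assumed reference density and a made-up transport figure; the notebook's actual rho may differ):

```python
rho = 1035.0              # assumed reference seawater density, kg m^-3
mass_transport = 2.07e9   # made-up mass transport, kg s^-1

volume_transport = mass_transport / rho  # kg/s divided by kg/m^3 gives m^3/s
transport_sv = volume_transport / 1e6    # 1 Sv = 1e6 m^3/s
print(transport_sv)  # 2.0
```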



charles-turner-1 commented on 2025-05-22T03:22:31Z ----------------------------------------------------------------

Initial comment at top about using a more recent environment. This might raise an error by now?


Sorry, turned into quite a thorough review. I'm happy to address my points myself if you've got heaps on your plate @marc-white.

charles-turner-1 avatar May 22 '25 03:05 charles-turner-1


Something has gone badly & confusingly wrong with the coordinate change. I can't see anything that's obviously caused it, so this might take a little while to get fixed up. It is potentially due to an updated dependency.

charles-turner-1 avatar May 23 '25 03:05 charles-turner-1
