icepyx icon indicating copy to clipboard operation
icepyx copied to clipboard

H5Coro testing and integration

Open zachghiaccio opened this issue 7 months ago • 5 comments

Opening this to start discussion and development on the H5coro integration piece. Based on discussions with Rachel and Eric, we have several options for implementing h5coro into icepyx. However, it might require significant restructuring of the Read module in order to support the h5coro backend and (potentially) Xarray DataTrees.

zachghiaccio avatar Apr 30 '25 18:04 zachghiaccio

Binder :point_left: Launch a binder notebook on this branch for commit 245fcace5ece06332cd8b88e8db8626e6b612c0a

I will automatically update this comment whenever this PR is modified

Binder :point_left: Launch a binder notebook on this branch for commit 83ae9a43953b3904f83165d32802a5419559bc66

Binder :point_left: Launch a binder notebook on this branch for commit 237871a72f3d4a38acd4c595b047757e4fbc0533

Binder :point_left: Launch a binder notebook on this branch for commit 49ab30cce40011fd3b4a6e7ca7f3b91d7010a966

Binder :point_left: Launch a binder notebook on this branch for commit 664d5173fe6923c0522a20e6dbbc41b71ca01dd7

github-actions[bot] avatar Apr 30 '25 18:04 github-actions[bot]

Exciting to see the progress, thanks for posting @zachghiaccio!

One question - does the Quest module have to change in order to use h5coro? I see several modified Quest files, which surprised me because I wouldn't have thought they'd need to change.

I am noticing that there is one commit on this branch from September 2023. Maybe that is the source of the Quest file modifications? (and perhaps also the merge conflicts?)

rwegener2 avatar Apr 30 '25 19:04 rwegener2

Yeah I'm a bit confused on those Quest changes as well, because I don't think there's anything in Quest right now that requires h5coro. The changes in said commit are already implemented into icepyx, so I must have forgotten to pull those changes.

zachghiaccio avatar May 01 '25 18:05 zachghiaccio

Hey everyone! I played around with this a bit this afternoon. @zachghiaccio I removed everything from this commit that was related to quest, since it sounds like the h5coro work isn't related to the quest activitiy.

In terms of h5coro I'm starting to develop a short list of things that need to be done to allow for h5coro to be added to icepyx:

  1. specify engine=h5coro in the xr.open_dataset() calls (@zachghiaccio did this one already)
  2. h5coro doesn't want to read an fsspec object, it wants a string to an s3 filepath (without the s3:// at the beginning)
  3. because of the previous point h5coro also needs the s3 access token to be specified using the credentials argument
  4. implement phony_dims kwarg in h5coro

The first 3 pieces above can be completed for the moment pretty easily (I made these changes in a fast way in the most recent commit). The bigger bump is point no. 4. Icepyx relies on the phony_dims: access kwarg on the xr.open_dataset() function, which isn't implemented in h5coro. I don't understand well what this kwarg does, but it is appears to be related to smushing together datasets of differing dimensions. If you try to read data in icepyx without that kwarg an error is returned:

ValueError: conflicting sizes for dimension 'crossing_time': length 1 on 'cycle_number' and length 15 on {'crossing_time': 'bounding_polygon_lat1'}

While I didn't get all the way to the end of implementing h5coro in icepyx, I did notice the reoccuring theme:

the amount that h5coro can speed up data reading in icepyx is limited by the number of times that icepyx splits up the reads

https://github.com/icesat2py/icepyx/discussions/676 goes into more depth on some profiling to find that reading a single beam of data creates three separate read requests (three groups). As a parallelization library h5coro would provide the most benefit if we could consolidate our reads into one and let h5coro do the work of splitting that up. All interesting points to consider in the broader conversation about what the Read module in icepyx should look like.

rwegener2 avatar May 07 '25 19:05 rwegener2

Sorry for the delayed response, but thanks for looking into all of this @rwegener2 ! I'm juggling two science team meetings today, but I'll finally have a bit of time this afternoon to check out your additions, and maybe look into the phony_dims kwarg, since I want to understand that better as well.

zachghiaccio avatar May 21 '25 14:05 zachghiaccio