xbatcher icon indicating copy to clipboard operation
xbatcher copied to clipboard

Refactor Pytorch dataset

Open jhamman opened this issue 1 year ago • 3 comments

Description of proposed changes

This PR introduces a number of changes to xbatcher's pytorch MapDataset class. These changes came out of the learnings while preparing my AMS talk (slides and blog post coming soon). The changes here allow for more control over how w

  • includes concurrent loading of x/y batches with dask.compute()
  • transforms now are expected to take an xarray object and return a torch.Tensor

jhamman avatar Feb 01 '24 19:02 jhamman

Codecov Report

Attention: Patch coverage is 90.62500% with 3 lines in your changes missing coverage. Please review.

Project coverage is 97.46%. Comparing base (43c9135) to head (e74de9e). Report is 17 commits behind head on main.

Files Patch % Lines
xbatcher/loaders/torch.py 90.62% 1 Missing and 2 partials :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #202      +/-   ##
==========================================
+ Coverage   96.25%   97.46%   +1.21%     
==========================================
  Files           6        6              
  Lines         347      395      +48     
  Branches       82       92      +10     
==========================================
+ Hits          334      385      +51     
+ Misses          8        5       -3     
  Partials        5        5              

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Feb 01 '24 19:02 codecov[bot]

Thanks for contributing these improvements @jhamman! It would be good to hit the test coverage target, so do you mind if I push additional tests as part of the review?

maxrjones avatar Feb 05 '24 17:02 maxrjones

Sorry that I did not have time to review this yet! I am about to head out of office but would be glad to work on this with you after returning on March 5, or would welcome other reviewers in the meantime.

maxrjones avatar Feb 13 '24 21:02 maxrjones

pre-commit.ci autofix

andersy005 avatar Aug 15 '24 17:08 andersy005

@jhamman, i pushed additional tests/fixes to your branch.. i hope you don';t mind :)

@maxrjones, curious to hear your feedback on this before I merge it into main when you get a chance

andersy005 avatar Aug 15 '24 23:08 andersy005

@weiji14 would you have any interest in reviewing this PR? I can if you don't have time or interest, I just expect that your expertise would be particularly beneficial on the PyTorch dataset revisions. It'd also be helpful to know if we need to expect downstream implications for zen3geo.

maxrjones avatar Aug 16 '24 13:08 maxrjones

I'll be travelling most of the next two weeks (about to head to the airport actually), so probably won't get to this until September. If you can do the review @maxrjones, that would be great!

weiji14 avatar Aug 16 '24 20:08 weiji14

I'll be travelling most of the next two weeks (about to head to the airport actually), so probably won't get to this until September. If you can do the review @maxrjones, that would be great!

Sounds good, safe travels! @andersy005 I'll block some time to look at this on Friday.

maxrjones avatar Aug 20 '24 15:08 maxrjones

@andersy005 I'll block some time to look at this on Friday.

I've had some other work come up today so unfortunately won't get a chance to review. I'll send an updated timeline next week, but feel welcome to proceed without me if you prefer.

maxrjones avatar Aug 23 '24 18:08 maxrjones

this can definitely wait. so, no rush/worries @maxrjones

andersy005 avatar Aug 23 '24 19:08 andersy005