jwst icon indicating copy to clipboard operation
jwst copied to clipboard

JP-3690: Switch from ModelContainer to ModelLibrary for image3 pipeline

Open emolter opened this issue 1 year ago • 14 comments

Resolves JP-3690 Resolves JP-3619 Resolves JP-3620 Resolves JP-3621, see JP-3707 for additional work on resample memory usage beyond the scope of this ticket. Resolves JP-3498 Work is under the epic JP-3602.

Fixes #8649 Fixes #8478 Fixes #8479 Fixes #8480 Fixes #8164

This PR is part of a larger effort to improve memory usage throughout the pipeline. Here, the AbstractModelLibrary class in stpipe is subclassed for JWST, and the pipeline steps that form the calwebb_image3 pipeline are updated to use ModelLibrary instead of ModelContainer. This facilitates ensuring that the pipeline step runs the same way whether models are loaded into memory or saved to disk.

When ModelLibrary is used with on_disk=True, memory usage is lower, both for individual pipeline steps and for calwebb_spec3 as a whole. An analysis of memory usage for each step can be found in the linked tickets JP-3619, JP-3620, and JP-3621. An analysis of the memory usage as a whole, when run on a large dataset, can be found in JP-3690.

Checklist for PR authors (skip items if you don't have permissions or they are not applicable)

  • [x] added entry in CHANGES.rst within the relevant release section
  • [x] updated or added relevant tests
  • [x] updated relevant documentation
  • [x] added relevant milestone
  • [x] added relevant label(s)
  • [x] ran regression tests, post a link to the Jenkins job below. How to run regression tests on a PR
  • [ ] All comments are resolved
  • [ ] Make sure the JIRA ticket is resolved properly

emolter avatar Jul 30 '24 19:07 emolter

Codecov Report

Attention: Patch coverage is 89.09396% with 65 lines in your changes missing coverage. Please review.

Project coverage is 60.77%. Comparing base (04217a2) to head (364cd60).

Files with missing lines Patch % Lines
jwst/tweakreg/tweakreg_step.py 69.23% 20 Missing :warning:
jwst/pipeline/calwebb_image3.py 71.05% 11 Missing :warning:
jwst/assign_mtwcs/moving_target_wcs.py 82.35% 6 Missing :warning:
jwst/outlier_detection/spec.py 64.28% 5 Missing :warning:
jwst/resample/resample.py 96.40% 5 Missing :warning:
jwst/assign_mtwcs/assign_mtwcs_step.py 55.55% 4 Missing :warning:
jwst/datamodels/library.py 94.36% 4 Missing :warning:
jwst/outlier_detection/utils.py 96.10% 3 Missing :warning:
jwst/pipeline/calwebb_coron3.py 33.33% 2 Missing :warning:
jwst/pipeline/calwebb_spec3.py 0.00% 2 Missing :warning:
... and 2 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8683      +/-   ##
==========================================
+ Coverage   60.63%   60.77%   +0.13%     
==========================================
  Files         372      373       +1     
  Lines       38375    38596     +221     
==========================================
+ Hits        23270    23458     +188     
- Misses      15105    15138      +33     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Jul 30 '24 19:07 codecov[bot]

Latest round of regression tests here. These unfortunately include 50 failures that are all related to the delivery of new MIRI reference files and other MIRI changes on the INS side. None of the failures from the previous regtest run are still present, so I'm pretty confident this PR is passing all regtests now and is ready for review

emolter avatar Aug 15 '24 15:08 emolter

I've finished viewing and commenting on all the files in this PR. Overall I think the changes look great and the comments are mostly minor. Thanks for taking on this work! I'll follow up with comments but please let me know if there's more I can do to help to move this PR along.

braingram avatar Aug 16 '24 18:08 braingram

I ran a 972 member association (~100GB input data) through calwebb_image3 using this PR (an "on disk" library was used as input and a slightly older commit 26e543627d9ed8e28af6475f0f3fe7f8ab7862cf). The pipeline succeeded and the recorded memory usage (using memray) was as follows: Screenshot 2024-08-20 at 9 38 32 AM I will add this information and a link to the memory profile to the jira ticket. The peak memory usage was 50GB and this is largely due to the context array generated during resample. Importantly for this PR, at no point does the pipeline load all input data into memory.

braingram avatar Aug 20 '24 13:08 braingram

testing just test_nircam_image.py here, because it takes forever on my local machine

emolter avatar Aug 20 '24 17:08 emolter

Most recent regtests for test_nircam_image only are finally passing after fixing file naming issues. The same fix should apply for the three MIRI image3 failures here. I'm going to wait on another full regtest run until after the next round of comments.

emolter avatar Aug 20 '24 22:08 emolter

Thanks again for all your work on this. I went through the files and left a new set of comments. It looks very close to done.

braingram avatar Aug 21 '24 14:08 braingram

New round of regression tests started here after responding to comments from Brett's second review.

edit: file name issues recurred, apparently the small bug setting pool and table names in ModelLibrary was not the cause. Need to put back in the manual setting of model.meta.filename. Starting another round here

emolter avatar Aug 21 '24 20:08 emolter

Another round of regtests after all the nircam image and miri image failures are fixed is here. Fingers crossed that the only failures are the same ones that show up on the nightly runs

I believe that all the regression tests are unrelated

emolter avatar Aug 23 '24 20:08 emolter

Why there is still usage for the ModelContainer? Is ModelLibrary not completely replacing ModelContainer? Is it only for backward compatibility to support ModelContainer inputs?

mcara avatar Aug 25 '24 15:08 mcara

Why there is still usage for the ModelContainer? Is ModelLibrary not completely replacing ModelContainer? Is it only for backward compatibility to support ModelContainer inputs?

We decided to only replace container with library in the calwebb_image3 pipeline for now, where the memory performance gains are most important. There are other complications with replacing this for the spectroscopic modes too, e.g., the SourceModelContainer class that uses ModelContainer as a parent class needs to be replaced also

emolter avatar Aug 26 '24 13:08 emolter

Another round of regression tests have been started here after fixes per Melanie's review.

Edit: lots of failing tests, but they match last night's nightly run

emolter avatar Aug 27 '24 20:08 emolter