ability to pick which source is extracted in calwebb_spec3 step

Open eclake opened this issue 3 years ago • 7 comments

Hi,

I've been running the pipeline on simulated data, and found that moving calwebb_spec2 onto a cluster was beneficial in the case of multiple exposures/dithers/nods. It would be amazing to be able to do the same for calwebb_spec3 - maybe by passing a source ID to the pipeline when it is called, so that instead of processing all sources it processes just one of them.

Just a suggestion, I may edit my local version to try it out.

Cheers, Emma

eclake avatar Jun 14 '22 14:06 eclake

Hi Emma,

I can try to put something together quickly and see if it's any faster. What kind of data are you simulating, and roughly how long is it taking?

tapastro avatar Jun 14 '22 19:06 tapastro

Here's a quick first attempt, just to test viability: https://github.com/tapastro/jwst/tree/spec3_multiprocess

In terms of usage, I've added an alternate pipeline with the name Spec3MultiPipeline (or command line alias calwebb_spec3_multi) - you can run that on an association file with many exposures, each with many sources, and it should create instances of Spec3SinglePipeline, one for each source. If your intended use case was not to have a single python process spawning subprocesses, you could comment out the end of Spec3MultiPipeline so that it only creates the source-specific associations, then make individual runs of, say, strun calwebb_spec3_singlesource {source association filename}; those would be single-threaded and you could spawn one per source.
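The second workflow above (one `strun` run per source-specific association) can be sketched as a small helper that builds one command per association file. This is only an illustration: the `calwebb_spec3_singlesource` alias exists on the branch linked above, not in a released pipeline, and the association file names below are hypothetical placeholders.

```python
import shlex


def single_source_commands(asn_files, alias="calwebb_spec3_singlesource"):
    """Build one strun command per source-specific association file.

    `alias` is the command-line name from the experimental branch; it is
    an assumption that it is installed in the current environment.
    """
    return [["strun", alias, str(asn)] for asn in sorted(asn_files)]


# Hypothetical association file names, one per source.
commands = single_source_commands(
    ["source_00002_asn.json", "source_00001_asn.json"]
)

# Print shell-ready lines, e.g. to paste into a cluster job array.
for cmd in commands:
    print(shlex.join(cmd))
```

Each printed line is an independent single-threaded job, so the list can be handed to whatever scheduler the cluster uses.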

Please let me know if this is helpful, or if any clarification is needed! The logging is a complete mess and could be improved with a bit of work.

tapastro avatar Jun 14 '22 22:06 tapastro

It occurs to me that the calwebb_spec2 pipeline step "extract_2d" already allows for the user to designate which slit/source to extract (rather than extracting all slits/sources) via the "slit_name" parameter. So an alternative approach could be to rerun calwebb_spec2 on the individual exposures that make up the observation and only extract that one source. This would result in "cal" files that contain data for only that source. Then feed those into the normal calwebb_spec3 pipeline for final Stage 3 processing.
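As a rough sketch of this approach, the helper below builds a `strun calwebb_spec2` command that passes `slit_name` to the `extract_2d` step through the `--steps.<step>.<parameter>` override syntax. The parameter name is taken from the comment above and may differ between pipeline versions; the input file names, slit name, and output directory are hypothetical.

```python
import shlex


def spec2_single_slit_command(exposure, slit_name, output_dir):
    """Build a calwebb_spec2 command that extracts only one slit/source.

    Assumes extract_2d accepts a `slit_name` parameter, as described in
    the comment above; check the step documentation for your version.
    """
    return [
        "strun",
        "calwebb_spec2",
        str(exposure),
        f"--steps.extract_2d.slit_name={slit_name}",
        f"--output_dir={output_dir}",
    ]


# Hypothetical exposures making up one observation.
exposures = ["exp1_rate.fits", "exp2_rate.fits", "exp3_rate.fits"]
for exposure in exposures:
    print(shlex.join(spec2_single_slit_command(exposure, "42", "source_42")))
```

Writing each source's "cal" files to its own output directory keeps them apart from the full spec2 outputs, which helps with the file management this approach requires.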

hbushouse avatar Jun 15 '22 13:06 hbushouse

@tapastro - thanks so much, I'll give it a try. I will probably try it the second way you suggested.

@hbushouse - great to know! I like the idea of keeping the spec2 outputs as they are, since multiple dithers and exposures already give me a lot of files to handle, and those themselves needed a cluster to run in a decent amount of time. But what you suggest is certainly doable with careful file management.

I've been working on MOS observations produced by the IPS simulator: a single pointing, 3 nods, 3 dithers, and 3 different sets of objects (one per dither) with lots of overlap between the object lists, 317 objects in total.

  • PRISM/CLEAR - 19 groups, 2 integrations, 12 exposures
  • each of the gratings G140M/F070LP, G235M/F170LP, G395/F290LP - 19 groups, 2 integrations, 3 exposures

I'm using an old version of the pipeline because some changes in 1.5.0 cause problems with my non-standard flat fields (I think). With 1.4.3 (which I tested on previously, and which I know doesn't have the resampling improvements that are apparently slower), calwebb_spec3 took ~3 days to run sequentially on a Mac M1 Pro just for the gratings (PRISM was quicker).

eclake avatar Jun 15 '22 15:06 eclake

Another suggestion may be to try to iron out the bugs in using the most up-to-date pipeline - this merged PR dramatically improved spectral resampling speed. It's currently only available via the current master branch, but it might be worth pursuing if your flat field issue is tractable.

tapastro avatar Jun 15 '22 15:06 tapastro

@tapastro - nice! I'll take a look at that too. So far I haven't figured out the flat issue, so it might not be tractable for these simulations, but it's good to know.

eclake avatar Jun 15 '22 15:06 eclake

Linked to Jira at https://jira.stsci.edu/browse/JP-2915

hbushouse avatar Sep 14 '22 15:09 hbushouse