
[REVIEW]: Eureka!: An End-to-End Pipeline for JWST Time-Series Observations


Submitting author: @taylorbell57 (Taylor Bell)
Repository: https://github.com/kevin218/Eureka
Branch with paper.md (empty if default branch): joss
Version: v0.1
Editor: @dfm
Reviewers: @catrionamurray, @christinahedges, @dfm
Archive: Pending

Status


Status badge code:

HTML: <a href="https://joss.theoj.org/papers/a1cb3f6e4e58c00bc0ab1b471c04812b"><img src="https://joss.theoj.org/papers/a1cb3f6e4e58c00bc0ab1b471c04812b/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/a1cb3f6e4e58c00bc0ab1b471c04812b/status.svg)](https://joss.theoj.org/papers/a1cb3f6e4e58c00bc0ab1b471c04812b)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@catrionamurray & @christinahedges, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. If you have any questions or concerns, please let @dfm know.

Please start on your review when you are able, and be sure to complete it within the next six weeks at the very latest.

Checklists

📝 Checklist for @catrionamurray

📝 Checklist for @christinahedges

📝 Checklist for @dfm

editorialbot · Jun 25 '22

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

editorialbot · Jun 25 '22

Software report:

github.com/AlDanial/cloc v 1.88  T=0.26 s (399.5 files/s, 88641.4 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                          78           2975           8126           8828
reStructuredText                 8            542            364            813
Markdown                         4            170              0            481
YAML                             8              6             11            274
TeX                              1             11              0            168
DOS Batch                        1              8              1             26
Cython                           1              7              0             19
make                             1              4              7              9
TOML                             1              0              0              3
-------------------------------------------------------------------------------
SUM:                           103           3723           8509          10621
-------------------------------------------------------------------------------


gitinspector failed to run statistical information for the repository

editorialbot · Jun 25 '22

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1088/0004-637X/754/2/136 is OK
- 10.1088/0004-637X/768/1/42 is OK
- 10.1088/0004-6256/147/6/161 is OK
- 10.1038/s41550-017-0351-6 is OK
- 10.3847/2041-8213/aabcc8 is OK
- 10.1093/mnras/stab1027 is OK
- 10.1038/nature12888 is OK
- 10.5281/zenodo.1998447 is OK
- 10.1093/mnras/stz2688 is OK

MISSING DOIs

- None

INVALID DOIs

- None

editorialbot · Jun 25 '22

Wordcount for paper.md is 1394

editorialbot · Jun 25 '22

Download article proof · View article proof on GitHub

editorialbot · Jun 25 '22

@catrionamurray, @christinahedges — This is the review thread for the paper. All of our communications will happen here from now on. Thanks again for agreeing to participate!

Please read the "Reviewer instructions & questions" in the first comment above, and generate your checklists by commenting @editorialbot generate my checklist on this issue ASAP. As you go over the submission, please check any items that you feel have been satisfied. There are also links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention openjournals/joss-reviews#4503 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for the review process to be completed within about 4-6 weeks, but please try to make a start ahead of this: JOSS reviews are by their nature iterative, and any early feedback you can provide to the author will be very helpful in meeting this schedule.

dfm · Jun 25 '22

Review checklist for @christinahedges

Conflict of interest

  • [x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

General checks

  • [x] Repository: Is the source code for this software available at https://github.com/kevin218/Eureka?
  • [x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • [x] Contribution and authorship: Has the submitting author (@taylorbell57) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
  • [x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines?

Functionality

  • [ ] Installation: Does installation proceed as outlined in the documentation?
  • [ ] Functionality: Have the functional claims of the software been confirmed?
  • [x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • [x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • [ ] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • [ ] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
  • [ ] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • [ ] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
  • [ ] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • [x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
  • [x] A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
  • [ ] State of the field: Do the authors describe how this software compares to other commonly-used packages?
  • [x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
  • [ ] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

christinahedges · Jun 27 '22

Review checklist for @catrionamurray

Conflict of interest

  • [x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

General checks

  • [x] Repository: Is the source code for this software available at https://github.com/kevin218/Eureka?
  • [x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • [x] Contribution and authorship: Has the submitting author (@taylorbell57) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
  • [x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines?

Functionality

  • [x] Installation: Does installation proceed as outlined in the documentation?
  • [x] Functionality: Have the functional claims of the software been confirmed?
  • [x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • [x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • [x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • [x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
  • [x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • [x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
  • [x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • [x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
  • [x] A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
  • [x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
  • [x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
  • [x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

catrionamurray · Jun 29 '22

@catrionamurray, @christinahedges — A quick reminder to keep this on your radar! Let me know if you have any questions or issues as you go through your review. Thanks!!

dfm · Jul 15 '22

@editorialbot generate pdf

christinahedges · Jul 19 '22

Download article proof · View article proof on GitHub

editorialbot · Jul 19 '22

A question about the conflict of interest policy: I haven't directly worked with any of the authors on the Eureka! paper but I am a coauthor on two papers with Adina Feinstein (and potentially a few more people in the author list). I wasn't sure if this is an issue?

catrionamurray · Jul 23 '22

It's not a problem on our end. I think most potential reviewers will have worked with at least one person on the Eureka! team.

kevin218 · Jul 23 '22

@catrionamurray, @christinahedges — What's your timeline for providing feedback? The first ERS paper will likely be ready within a week and it would be great to use the JOSS reference.

kevin218 · Jul 23 '22

A question about the conflict of interest policy: I haven't directly worked with any of the authors on the Eureka! paper but I am a coauthor on two papers with Adina Feinstein (and potentially a few more people in the author list). I wasn't sure if this is an issue?

Thanks for bringing this up, @catrionamurray! I believe that this potential conflict is sufficiently minor that, as long as you're confident that you can provide a fair review, I'm happy to waive this apparent conflict of interest. If anyone involved feels that this is inappropriate, please feel free to comment here, or contact me directly via email (thanks @kevin218 for your confirmation already!). Otherwise, I think we should proceed with the review as planned. Thanks again!!

dfm · Jul 25 '22

I had an error installing Eureka! I've opened a new issue here: https://github.com/kevin218/Eureka/issues/421

catrionamurray · Jul 25 '22

A question about the conflict of interest policy: I haven't directly worked with any of the authors on the Eureka! paper but I am a coauthor on two papers with Adina Feinstein (and potentially a few more people in the author list). I wasn't sure if this is an issue?

Thanks for bringing this up, @catrionamurray! I believe that this potential conflict is sufficiently minor that, as long as you're confident that you can provide a fair review, I'm happy to waive this apparent conflict of interest. If anyone involved feels that this is inappropriate, please feel free to comment here, or contact me directly via email (thanks @kevin218 for your confirmation already!). Otherwise, I think we should proceed with the review as planned. Thanks again!!

I'm happy I can provide a fair review!

catrionamurray · Jul 25 '22

I found a potential mistake in the Quickstart docs and opened a new issue here: https://github.com/kevin218/Eureka/issues/422

catrionamurray · Jul 25 '22

There are missing references in the text to JWST, HST, etc.; issue here: https://github.com/kevin218/Eureka/issues/423

catrionamurray · Jul 25 '22

Second issue with installation: https://github.com/kevin218/Eureka/issues/415

catrionamurray · Jul 28 '22

Hi all, my review is ongoing as I have been having installation struggles with this tool. I'll keep working on it, but for now I've got this feedback:

Installation

  • I have had some installation issues. Given that some people don’t use conda, I tried to use a different environment/package manager. I had some of the same troubles people seem to be having installing this tool on an M1 Mac. I believe there are some M1-specific installation instructions that are going live on the docs, though these are for installing via conda. I'm figuring out ways to get this to install.

  • The installation instructions point to a GitHub repo that is actively being updated with bug fixes and new functionality. Can the authors highlight which version is their stable version for the purposes of this review?

Documentation

  • Key feedback: The documentation (paper and repo documentation) needs to make clearer that a) some of these functions are wrappers around the jwst package, b) Eureka! has alternatives to jwst package routines (e.g. ramp fitting), and c) why/when users should choose one or the other. A good place to do this might be a docs page explaining “what is Eureka”, which would explain what the package is and what users can expect to achieve with it, with some discussion of these points.

  • There are API docs, but some of the API documentation is missing, for example:

[screenshot of an API entry with a missing docstring]

Several of the docstrings look like this, so it’s not clear what these functions do. If these functions are not part of the stable release for this review, could the authors discuss that here?

  • The API docs could benefit from some description of a demo workflow. As it stands, because the workflow sits behind scripts like run_eureka.py or functions such as calibrateJWST, it’s not immediately clear to the reader what is happening under the hood, or which functions are being called in what order.

  • Inside the documentation there’s not a discussion of the methods behind the functions (which is called out in the JOSS checklist). In the quickstart documentation there is a list of what is happening (e.g. ramp fitting) but not a detailed description of the methods/assumptions made in each step. It may be that this is not where the authors intend to put this information, in which case could they point me towards the right documentation about the methods?

Example Usage

  • There is an example usage in the docs online, and example config files in the demos folder. It could be made clearer how to use this example. The demos folder would benefit from a README to explain how to use it.

  • The authors might want to include the schematic from the paper inside their docs.

Tests

  • There are tests, and they are run automatically with CI. Looking at the tests, they are designed to import the relevant modules, run scripts on test data, and check that the resultant files and figures are produced. These tests do check that the tools run and produce an output, but they don’t seem to check that the output is valid or correct. This meets a minimum bar of verifying that files are produced by the routines, but it does not thoroughly test the routines.
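
To illustrate the distinction, a value-level test would compare the pipeline's numbers against a validated reference rather than only asserting that files exist. A minimal sketch of this pattern, written as a pytest-style test; all names, values, and tolerances are hypothetical and not part of Eureka!'s actual test suite or API:

import numpy as np

def test_output_matches_reference(tmp_path):
    # Stand-in for a pipeline routine that writes its result to disk.
    result = np.linspace(0.99, 1.01, 100)
    out_file = tmp_path / "lightcurve.npy"
    np.save(out_file, result)

    # Existence-only check (what the current tests effectively do):
    assert out_file.exists()

    # Value-level check (what is being suggested): compare against a
    # previously validated reference within a tolerance.
    reference = np.linspace(0.99, 1.01, 100)
    np.testing.assert_allclose(np.load(out_file), reference, rtol=1e-6)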

High Level Feedback

The Eureka! tool is driven by config files and has multiple "stages" that are imported. The tool is run with scripts, as a pipeline. Because of this, the API can be quite hard to read, and to the user the tool can seem to be a bit of a black box. For example, this is a workflow in the tests:

# Stage 2: calibrate the JWST data
s2_meta = s2.calibrateJWST(meta.eventlabel, ecf_path=ecf_path)
# Stage 3: reduce the calibrated data
s3_spec, s3_meta = s3.reduce(meta.eventlabel, ecf_path=ecf_path,
                             s2_meta=s2_meta)
# Stage 4: generate light curves
s4_spec, s4_lc, s4_meta = s4.genlc(meta.eventlabel, ecf_path=ecf_path,
                                   s3_meta=s3_meta)
# Stage 5: fit the light curves
s5.fitlc(meta.eventlabel, ecf_path=ecf_path, s4_meta=s4_meta)

It's not easy for the reader to understand what's happening in these steps. This is a common workflow in astronomy, and I am not advocating that you change any of these design decisions. However, I wanted to highlight this because I believe we need to ensure the reader has some information in the docs about what steps are happening inside these functions, and what methods are being used. This has led to my feedback above.

I'll keep working on this review, but wanted to give some feedback as I go.

christinahedges · Aug 01 '22

A quick response to a couple of your points - I'll try to address some of these today and the rest throughout the week.

  • I have had some installation issues. Given that some people don’t use conda, I tried to use a different environment/package manager. I had some of the same troubles people seem to be having installing this tool on an M1 Mac. I believe there are some M1-specific installation instructions that are going live on the docs, though these are for installing via conda. I'm figuring out ways to get this to install.

Yeah, unfortunately none of our development team has an M1 Mac (Linux and Intel Macs only), so there's nothing we can personally do to test on M1 chips and we're left relying on user-submitted issues, which we've tried to resolve quickly. I think I and others avoided M1 chips for the time being, knowing that a lot of software and packages would have issues with the new processor for a year or two. At this point, there are no known issues with our code itself on M1, but a couple of our dependencies have issues with M1 and there's little we can do ourselves about that. For several of the known installation issues, the problems can be resolved using the conda installation methods, as some of the packages do seem to have been built with M1 support on conda but not on pip.

  • The installation instructions point to a GitHub repo that is actively being updated with bug fixes and new functionality. Can the authors highlight which version is their stable version for the purposes of this review?

Yeah, the code has been under pretty rapid development with the release of the first JWST observations. Things are stabilizing with time though, so let's use the latest release from yesterday (v0.4; https://github.com/kevin218/Eureka/tree/v0.4; https://eurekadocs.readthedocs.io/en/v0.4/). The joss branch is now up-to-date with v0.4 but also has the current paper draft, which is not in the main branch. Note that changes to the paper draft addressing your comments will only be present in the joss branch (which editorialbot uses here to build the paper), while changes for the docs/code issues you've raised will be made on the main branch (https://github.com/kevin218/Eureka; https://eurekadocs.readthedocs.io/en/latest/) and occasionally released as stable versions.

  • There are API docs, but some of the API documentation is missing

Ah yeah, several of these are functions that could potentially be used some day for WFC3 analyses but are not currently used and aren't likely to be anytime soon, so I'll just remove them from the repo shortly.

  • The API docs could benefit from some description of a demo workflow. As it stands, because the workflow sits behind scripts like run_eureka.py or functions such as calibrateJWST, it’s not immediately clear to the reader what is happening under the hood, or which functions are being called in what order.

Agreed, we've discussed making flowcharts for each stage (https://github.com/kevin218/Eureka/issues/113), but we didn't want to start on that until we'd run some real JWST data through the pipeline as we knew things were subject to significant change. Now that things are settling down, making such flowcharts would certainly be beneficial.

  • Inside the documentation there’s not a discussion of the methods behind the functions (which is called out in the JOSS checklist). In the quickstart documentation there is a list of what is happening (e.g. ramp fitting) but not a detailed description of the methods/assumptions made in each step. It may be that this is not where the authors intend to put this information, in which case could they point me towards the right documentation about the methods?

This will take some more thinking on our end - I will get back to you. The flowcharts will help some, but not completely.

  • There is an example usage in the docs online, and example config files in the demos folder. It could be made clearer how to use this example. The demos folder would benefit from a README to explain how to use it.

Would a simple README.md that points users to the quickstart tutorial be adequate for what you're thinking?

  • Authors might want to include the schematic from the paper inside their docs.

Ah, indeed - I guess we forgot to include that in the docs.

  • There are tests, and they are run automatically with CI. Looking at the tests, they are designed to import the relevant modules, run scripts on test data, and check that the resultant files and figures are produced. These tests do check that the tools run and produce an output, but they don’t seem to check that the output is valid or correct. This meets a minimum bar of verifying that files are produced by the routines, but it does not thoroughly test the routines.

Agreed, we've discussed this too (in issue https://github.com/kevin218/Eureka/issues/140 and during video calls) - the hard part is spoofing data for each of the functions for which we know the expected result. We're unlikely to write such tests for Stage 2 (as there's basically no code in there) and writing them for Stage 1 would be very challenging (due to the large file sizes of the input uncal files for even ~3 integrations), but the functions in Stages 3-6 are good candidates for detailed unit testing. As it stands, the already existing tests have been very helpful as they test end-to-end and the code typically crashes somewhere along the way if something is off, but I totally agree that a lack of crashes does not imply the correct result has been produced.
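
As a concrete illustration of the spoofing idea for the Stage 3-6 functions, a unit test could build a tiny synthetic frame whose answer is known by construction; the helper and the box-extraction stand-in below are hypothetical sketches, not Eureka!'s actual API:

import numpy as np

def spoof_spectral_frame(ny=16, nx=32, trace_flux=100.0):
    # Hypothetical spoofed input: zero background plus a single-row
    # trace of uniform, known flux.
    frame = np.zeros((ny, nx))
    frame[ny // 2, :] = trace_flux
    return frame

def test_extraction_recovers_known_flux():
    frame = spoof_spectral_frame()
    row = frame.shape[0] // 2
    # Stand-in for a real extraction routine: sum a box around the trace.
    extracted = frame[row - 2 : row + 3, :].sum(axis=0)
    # Because the input is spoofed, the correct answer is exact, so the
    # test can assert values rather than just that an output exists.
    np.testing.assert_allclose(extracted, 100.0)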

... Because of this, the API can be quite hard to read, and to the user the tool can seem to be a bit of a black box ...

Agreed, and hopefully this will be significantly ameliorated by the flowcharts described above. We've also talked about making it easier for the user to call each of the steps in the stages themselves, to give a better understanding of the flow and procedure (and to make customization easier).

taylorbell57 · Aug 01 '22

Following up on the above, I just made a PR (https://github.com/kevin218/Eureka/pull/442) which adds more flowcharts to the documentation website describing the workflow for Eureka!'s Stages 3 and 4 (visible here for now: https://eurekadocs.readthedocs.io/en/joss_docs/stages.html). I didn't make flowcharts for Stages 5 and 6 as they're not that complicated or deeply nested. I added links to STScI's documentation for Stages 1 and 2, as any edits we currently allow for Stage 1 are fully experimental.

That PR should also resolve the issues you pointed out about undocumented or poorly documented functions in the API.

Still unaddressed are the following comments:

  1. Add unit tests rather than just relying on end-to-end tests.
  2. The demos folder would benefit from a README to explain how to use this example.
     a. Please clarify if a README file with a link to the quickstart guide is all that is needed here.
  3. Inside the documentation there’s not a discussion of the methods behind the functions.
     a. Can you comment on how detailed you are thinking here? With so many functions, this could pretty quickly get out of hand.

Let me know if you feel that any of your other comments have not been addressed now (either with edits in https://github.com/kevin218/Eureka/pull/442 or in the discussion above).

taylorbell57 · Aug 03 '22

Hi @taylorbell57

The new flowcharts look good!

Please clarify if a README file with a link to the quickstart guide is all that is needed here

Yes, a README is what's needed; a brief explanation of what the demos are and a pointer to the quickstart sounds fine.
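
For concreteness, a minimal demos README along these lines might read something like this (wording hypothetical; the link is the project's existing documentation site):

Eureka! demos
=============

This folder contains example configuration files and run scripts that
demonstrate a typical end-to-end Eureka! analysis.

For a walkthrough of how to use these examples, see the Quickstart
tutorial in the documentation: https://eurekadocs.readthedocs.io/en/latest/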

Can you comment on how detailed you are thinking here? With so many functions, this could pretty quickly get out of hand.

I think it would be worth having @dfm weigh in here. Currently, as a user, I don't think there's a place I can go to find out which methods this pipeline is using under the hood. @dfm, to meet the documentation-of-methods requirement here, what should the authors be aiming for?

christinahedges · Aug 03 '22

I think that overall the state of the docs looks good, but I think your point is a good one @christinahedges. In many cases the docstrings do leave something to be desired in terms of narrative discussion of what the method is doing. I wouldn't consider this a showstopper for the JOSS review, but I also think that the user experience could be substantially improved with some expansion of the docstrings for the key user-facing methods, or more narrative structure in the high-level summary pages.

In summary: from my perspective the minimum requirements are certainly satisfied, but I think that some effort in adding detail to a key subset of docstrings would go a long way!

dfm · Aug 03 '22

@catrionamurray, @christinahedges — I wanted to check in here on the status of your reviews. Please let us know where things stand, and what issues remain wrt your checklists! Thanks!!

dfm · Sep 07 '22

I'm on holiday at the moment but I will post an update on Monday when I'm back!

catrionamurray · Sep 07 '22

@editorialbot generate pdf

catrionamurray · Sep 19 '22