joss-reviews
[REVIEW]: pocoMC: A Python package for accelerated Bayesian inference in astronomy and cosmology
Submitting author: @minaskar (Minas Karamanis) Repository: https://github.com/minaskar/pocomc Branch with paper.md (empty if default branch): Version: 0.1.0 Editor: @dfm Reviewers: @kazewong, @marylou-gabrie Archive: Pending
Status
Status badge code:
HTML: <a href="https://joss.theoj.org/papers/e7d10f594f5c8eb682d29dd84aaf71be"><img src="https://joss.theoj.org/papers/e7d10f594f5c8eb682d29dd84aaf71be/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/e7d10f594f5c8eb682d29dd84aaf71be/status.svg)](https://joss.theoj.org/papers/e7d10f594f5c8eb682d29dd84aaf71be)
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@kazewong & @marylou-gabrie, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:
@editorialbot generate my checklist
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. If you have any questions or concerns, please let @dfm know.
✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨
Checklists
Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.
For a list of things I can do to help you, just type:
@editorialbot commands
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
@editorialbot generate pdf
Software report:
github.com/AlDanial/cloc v 1.88  T=0.20 s (723.0 files/s, 229491.3 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
JavaScript                      12           2426           2523           9374
SVG                              6              0             13           9319
HTML                            17           1199            251           5692
CSS                             10            398            138           3169
PO File                         45           1087              0           2444
Python                          20            667           1528           2163
reStructuredText                14            286            146            503
Jupyter Notebook                13              0           2499            445
Markdown                         4             95              0            311
YAML                             4             22             17            132
TeX                              1              8              0             66
DOS Batch                        1              8              1             26
make                             1              4              7              9
-------------------------------------------------------------------------------
SUM:                           148           6200           7123          33653
-------------------------------------------------------------------------------
gitinspector failed to run statistical information for the repository
Wordcount for paper.md is 1254
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- None
MISSING DOIs
- 10.1214/06-ba127 may be a valid DOI for title: Nested sampling for general Bayesian computation
INVALID DOIs
- None
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
@kazewong, @marylou-gabrie — This is the review thread for the paper. All of our communications will happen here from now on. Thanks again for agreeing to participate!
Please read the "Reviewer instructions & questions" in the first comment above, and generate your checklists by commenting @editorialbot generate my checklist on this issue ASAP. As you go over the submission, please check any items that you feel have been satisfied. There are also links to the JOSS reviewer guidelines.
The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention openjournals/joss-reviews#4634 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.
We aim for the review process to be completed within about 4-6 weeks but please try to make a start ahead of this as JOSS reviews are by their nature iterative and any early feedback you may be able to provide to the author will be very helpful in meeting this schedule.
Review checklist for @kazewong
Conflict of interest
- [x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.
Code of Conduct
- [x] I confirm that I read and will adhere to the JOSS code of conduct.
General checks
- [x] Repository: Is the source code for this software available at https://github.com/minaskar/pocomc?
- [x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
- [x] Contribution and authorship: Has the submitting author (@minaskar) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
- [x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines?
Functionality
- [x] Installation: Does installation proceed as outlined in the documentation?
- [x] Functionality: Have the functional claims of the software been confirmed?
- [x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)
Documentation
- [x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
- [x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
- [x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems)?
- [x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
- [x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
- [x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support
Software paper
- [x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
- [x] A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
- [x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
- [x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
- [x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?
Review checklist for @marylou-gabrie
Conflict of interest
- [x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.
Code of Conduct
- [x] I confirm that I read and will adhere to the JOSS code of conduct.
General checks
- [x] Repository: Is the source code for this software available at https://github.com/minaskar/pocomc?
- [x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
- [x] Contribution and authorship: Has the submitting author (@minaskar) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
- [x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines?
Functionality
- [x] Installation: Does installation proceed as outlined in the documentation?
- [x] Functionality: Have the functional claims of the software been confirmed?
- [x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)
Documentation
- [x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
- [x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
- [x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems)?
- [x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
- [x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
- [x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support
Software paper
- [x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
- [x] A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
- [x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
- [x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
- [x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?
@kazewong, @marylou-gabrie — Just a quick check in here to keep this on your radar. Let us know if you run into any issues!
A few questions:
- What does the `scale` quantity logged in the results refer to? I did not find an explanation in the documentation.
- Is there a way to access the trained flow (or flows)? This is definitely for advanced usage, but maybe it would be good to have a way to assess the quality of training of the flow model. Then, one could assess whether it is necessary to play with the training/config parameters to speed up the sampling.
- What is the training loss function for the flow? I might have missed something, but I did not find this info in either the JOSS submission or the accompanying paper.
- Are the authors aware of typical cases in which the method fails? If yes, it would be interesting to add a concise mention of limitations.
Overall comment:
Overall, the documentation is well detailed and the effort to automate the sequential scheme is remarkable. The paper gives a concise overview of the method, directed to a wide audience. I do note that the specifics of the algorithm and many related references are instead in the accompanying paper (submitted to a different venue, I imagine).
Minor:
- The correlation coefficient threshold changes notation between the "Background" section of the docs (CC) and the "Advanced Guide" (gamma).
Questions/comments related to unchecked items
- [ ] Performance/Summary: See this thread https://github.com/minaskar/pocomc/issues/17#issue-1352355283
- [x] Community guidelines: I don't see any comments about seeking support. Per the checklist, I think the authors should add some guidelines for seeking support.
- [x] State of the field/references: I think there are missing references, which @marylou-gabrie has also flagged. For state of the field, specifically related to code, I don't think there is any big issue, but the authors should add a couple of references to previous studies that did not come with open-source code. Once the authors have added the references suggested by @marylou-gabrie, I am happy to check off both boxes.
Thank you both for your comments!
@marylou-gabrie
- The `scale` variable refers to the ratio of the Metropolis proposal scale (in latent space) to the optimal proposal scale $2.38/\sqrt{D}$. It is a metric of the preconditioner's performance (i.e. 1 corresponds to perfect preconditioning). I'll make sure we include an explanation in the docs.
- The trained flow can be accessed using the `Flow` class method of the sampler.
- The flow loss function is the usual forward KL divergence $D_{KL}[p\|q]$.
- Cases in which PMC (and thus `pocoMC`) might not be the best choice are discussed in the Discussion section of the accompanying paper and include high-dimensional targets and/or cases in which the likelihood evaluation is cheap compared to the flow training cost. Of course, one can construct artificial cases in which the multimodality is so strong that the flow is not expressive enough to capture it. Since the code is intended for astronomical/cosmological research, we do not expect any such difficulties.
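As a side note for readers following the thread: when estimated with samples drawn from $p$, the forward KL objective reduces to a negative log-likelihood under $q$, since the entropy of $p$ is a constant with respect to the flow parameters. A minimal sketch of that identity, with a toy Gaussian standing in for the flow (this is illustrative only and does not use pocoMC's actual API):

```python
import numpy as np

def forward_kl_loss(samples, log_q):
    """Monte Carlo estimate of D_KL[p || q] up to the (constant) entropy of p:
    E_p[log p - log q] ≈ -mean(log q(x_i)) + const, so minimizing the negative
    log-likelihood of the samples under q minimizes the forward KL."""
    return -np.mean(log_q(samples))

def gaussian_log_prob(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

# Draws from the "target" p (here a 1D Gaussian for simplicity).
rng = np.random.default_rng(0)
samples = rng.normal(loc=2.0, scale=0.5, size=10_000)

# A poorly matched q before any fitting...
loss_before = forward_kl_loss(samples, lambda x: gaussian_log_prob(x, 0.0, 1.0))

# ...versus the maximum-likelihood fit, which minimizes the forward KL
# within this family.
mu_hat, sigma_hat = samples.mean(), samples.std()
loss_after = forward_kl_loss(samples, lambda x: gaussian_log_prob(x, mu_hat, sigma_hat))
assert loss_after < loss_before
```

A real flow would minimize the same quantity by gradient descent on its parameters rather than in closed form, but the objective is identical.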
@kazewong
- Regarding performance please see the response and our questions in the aforementioned thread.
- Brief instructions regarding users seeking support are available in the CONTRIBUTING.md guide and the main page of the docs.
- We are more than happy to add the missing references.
Thanks @minaskar for your helpful answers.
My only remaining question is how you evaluate the forward KL: which samples (and potential reweighting) do you use to approximate the expectation over $p$ (where $p$ is the successively annealed distribution, if I am not mistaken)?
Please let us know when the missing references have been added to the paper.
After that, all good for me.
I have added comments in the discussion thread in the code repo. Once that is addressed, I will be happy with the submission.
@minaskar — I wanted to check in here since I think we're waiting on your responses to @kazewong, @marylou-gabrie's final small comments. Let us know if anything isn't clear or if you've addressed these issues. Thanks!
@marylou-gabrie We're currently using the samples from the current annealed distribution to train the flow that will serve as the preconditioner for the next annealed distribution. Since the beta spacings are small enough, we found no benefit in reweighting the samples.
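For context on why small beta spacings make reweighting unnecessary: in a standard tempered/SMC scheme, the importance weight of a particle between adjacent targets is $w \propto L(x)^{\beta_{t+1}-\beta_t}$, so as the spacing shrinks the weights become nearly uniform and the effective sample size stays close to the particle count. A hedged sketch with synthetic log-likelihood values (not pocoMC code):

```python
import numpy as np

def effective_sample_size(log_w):
    """Kish effective sample size of a set of (unnormalized) log-weights."""
    w = np.exp(log_w - log_w.max())  # subtract max for numerical stability
    w /= w.sum()
    return 1.0 / np.sum(w ** 2)

# Hypothetical log-likelihood values of the current particle set.
rng = np.random.default_rng(1)
log_L = rng.normal(size=5_000)

# log-weights are (beta_{t+1} - beta_t) * log L(x).
ess_small = effective_sample_size(0.01 * log_L)  # small beta spacing
ess_large = effective_sample_size(1.00 * log_L)  # large beta spacing
assert ess_small > ess_large  # small spacings keep the weights near-uniform
```

With the small spacing the ESS stays within a fraction of a percent of the 5,000 particles, which is consistent with reweighting having a negligible effect in that regime.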
@dfm Thanks for the reminder and apologies for the delayed response, we will address the remaining comments during the next few days.
Hi @kazewong and @marylou-gabrie,
I've added the missing references and fixed the minor issues. Let me know if there's anything else missing.
@kazewong, @marylou-gabrie — Can you both take a look at @minaskar's responses to your feedback and let us know if there are any remaining issues? Thanks!!
Thanks Minas and Dan, that's all good for me.
@kazewong — Any updates from your end? Let us know if there are any remaining issues!
(Aside: @marylou-gabrie it looks like your authorship checkbox got unchecked - was that on purpose?)
All green from me
@kazewong — Great! In that case can you update your checklist above? I'm still seeing some boxes unchecked. Thanks!
@dfm all checked.
@kazewong, @marylou-gabrie — Thanks both for your reviews and all of your suggestions for pocoMC!! I really appreciate the time that you took for this process.
@minaskar — I have a few last checks that I need to do (although I probably can't get to them today), then I'll have some final things that I need from you before acceptance. I'll have an update for you in the next day or two. Thanks for your patience!
@editorialbot check references
@editorialbot generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- None
MISSING DOIs
- 10.1093/mnras/stac2272 may be a valid DOI for title: Accelerating astronomical and cosmological inference with Preconditioned Monte Carlo
- 10.1214/06-ba127 may be a valid DOI for title: Nested sampling for general Bayesian computation
- 10.1073/pnas.2109420119 may be a valid DOI for title: Adaptive Monte Carlo augmented with normalizing flows
- 10.1093/mnras/staa1469 may be a valid DOI for title: Accelerated Bayesian inference using deep learning
INVALID DOIs
- None
@marylou-gabrie @kazewong, thanks a lot for the reviews, and of course, @dfm for taking care of everything.
Regarding the DOIs, I can verify that those four suggestions are correct. Should I add them to the reference list?
@minaskar don't worry about the DOIs: I'll have some larger edits to the bibliography that will include them!