[REVIEW]: AutoRA: Automated Research Assistant for Closed-Loop Empirical Research
Submitting author: @musslick (Sebastian Musslick)
Repository: https://github.com/AutoResearch/autora-paper
Branch with paper.md (empty if default branch): main
Version: v4.0.0
Editor: @jbytecode
Reviewers: @seandamiandevine, @szorowi1
Archive: Pending
Status
Status badge code:
HTML: <a href="https://joss.theoj.org/papers/be6d470033fbe5bd705a49858eb4e21e"><img src="https://joss.theoj.org/papers/be6d470033fbe5bd705a49858eb4e21e/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/be6d470033fbe5bd705a49858eb4e21e/status.svg)](https://joss.theoj.org/papers/be6d470033fbe5bd705a49858eb4e21e)
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@seandamiandevine & @szorowi1, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:
@editorialbot generate my checklist
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. If you have any questions or concerns, please let @jbytecode know.
✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨
Checklists
Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.
For a list of things I can do to help you, just type:
@editorialbot commands
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
@editorialbot generate pdf
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.31222/osf.io/ysv2u is OK
- 10.1016/j.jbef.2017.12.004 is OK
- 10.48550/arXiv.1912.04871 is OK
- 10.48550/arXiv.2006.11287 is OK
- 10.1126/sciadv.aav6971 is OK
- 10.31234/osf.io/c2ytb is OK
MISSING DOIs
- No DOI given, and none found for title: Bayesian machine scientist for model discovery in ...
- No DOI given, and none found for title: An evaluation of experimental sampling strategies ...
- No DOI given, and none found for title: Scikit-learn: Machine learning in python
- No DOI given, and none found for title: A Unified Framework for Deep Symbolic Regression
INVALID DOIs
- None
Software report:
github.com/AlDanial/cloc v 1.90 T=0.01 s (547.6 files/s, 36692.5 lines/s)
| Language | files | blank | comment | code |
|----------|------:|------:|--------:|-----:|
| Markdown | 2 | 33 | 0 | 100 |
| TeX | 1 | 14 | 0 | 90 |
| YAML | 1 | 1 | 5 | 25 |
| SUM | 4 | 48 | 5 | 215 |
Commit count by author:
11 Sebastian Musslick
3 musslick
2 Younes Strittmatter
Paper file info:
📄 Wordcount for paper.md is 1549
✅ The paper includes a Statement of need section
📄 Download article proof · 📄 View article proof on GitHub
@seandamiandevine, @szorowi1 - Dear reviewers, you can start by creating your checklists; each checklist contains several review tasks.
Whenever you complete a task, check off the corresponding checkbox. Since the JOSS review process is interactive, you can interact with the author, the other reviewers, and the editor at any point. You can open issues and pull requests in the target repo; please mention the URL of this page there so we can keep track of what is going on outside this thread.
Please create your tasklist by typing
@editorialbot generate my checklist
Thank you in advance.
@editorialbot remind @szorowi1 in two weeks
Reminder set for @szorowi1 in two weeks
Review checklist for @seandamiandevine
Conflict of interest
- [x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.
Code of Conduct
- [x] I confirm that I read and will adhere to the JOSS code of conduct.
General checks
- [x] Repository: Is the source code for this software available at https://github.com/AutoResearch/autora-paper?
- [x] License: Does the repository contain a plain-text LICENSE or COPYING file with the contents of an OSI approved software license?
- [x] Contribution and authorship: Has the submitting author (@musslick) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
- [x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines?
- [x] Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
- [x] Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
- [x] Human and animal research: If the paper contains original data research on human subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.
Functionality
- [x] Installation: Does installation proceed as outlined in the documentation?
- [x] Functionality: Have the functional claims of the software been confirmed?
- [x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)
Documentation
- [x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
- [x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
- [x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
- [x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
- [x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
- [x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support
Software paper
- [x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
- [x] A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
- [x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
- [x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
- [x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?
Review checklist for @szorowi1
Conflict of interest
- [x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.
Code of Conduct
- [x] I confirm that I read and will adhere to the JOSS code of conduct.
General checks
- [x] Repository: Is the source code for this software available at https://github.com/AutoResearch/autora-paper?
- [x] License: Does the repository contain a plain-text LICENSE or COPYING file with the contents of an OSI approved software license?
- [x] Contribution and authorship: Has the submitting author (@musslick) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
- [x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines?
- [x] Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
- [x] Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
- [x] Human and animal research: If the paper contains original data research on human subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.
Functionality
- [x] Installation: Does installation proceed as outlined in the documentation?
- [x] Functionality: Have the functional claims of the software been confirmed?
- [x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)
Documentation
- [x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
- [x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
- [x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
- [x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
- [x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
- [x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support
Software paper
- [x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
- [x] A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
- [x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
- [x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
- [x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?
:wave: @szorowi1, please update us on how your review is going (this is an automated reminder).
Hi @jbytecode, hope you've been well! I'm working my way through the review. I was wondering if I could request some guidance for establishing functionality. The AutoRA library is quite extensive, distributed across 30+ python packages (though some are quite small, composed of only a few functions/classes). What would you consider to be sufficient for demonstrating functionality (e.g., working through the tutorials/examples in the docs, applying the software to a novel personal use case, etc.)? Thank you!
@musslick - Could you please provide guidance and help our reviewer with the question above?
@szorowi1 - Any critiques/suggestions/corrections/thoughts are welcome. Following the checklist items is generally enough.
@jbytecode Sure thing!
@szorowi1 (also tagging @seandamiandevine ) Thanks for checking about this. We discussed with the development team what might be a good functionality test for AutoRA. We reached consensus that the Tutorials and the two examples Equation Discovery and Online Closed-Loop Discovery would capture most of the core functionality of AutoRA. So evaluating those might be most appropriate for a functionality test. Note that all of the tutorials (except the online closed-loop discovery) should be executable via Google Colab. Please let us know if you run into any issues or have any other questions---and thanks for all your work!
Thanks @musslick for the direction. It was very helpful in guiding functionality tests.
Checklist-related comments
- Installation worked well on my local machine. I tested it with both pip and conda; both executed without issue and key functionality was maintained. Unsurprisingly, functionality also worked well within Google Colab.
- The tutorials are clear and all functionality works well on my end.
- The section on using Prolific for the Online Closed-Loop Discovery example was not yet complete when I reviewed it. I would have liked to test this, as I am more familiar with Prolific testing than Firebase. I was, however, able to follow the instructions for creating a test environment in Firebase without any problems. If the authors can update the section on Prolific integration, I'd be happy to test it out.
- I encountered errors running the Equation Discovery tutorial on Google Colab. In step 6 of the Polynomial Regressor section, I was unable to run the `present_results()` function, as `conditions_test` and `observations_test` were not the same size: `ValueError: x and y must be the same size`. This occurred for both the 2D and 3D visualizations (a minimal illustration of this kind of mismatch is sketched below). A similar error occurred in the Bayesian Machine Scientist section, so I was unable to fit the example BMS model. Since this example notebook has a lot of useful tools for navigating and using AutoRA, the authors may want to patch these errors up for new users.
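For reference, a minimal sketch of the kind of mismatch that appears to trigger this matplotlib error; the variable names mirror the tutorial, but the plotting call is only my assumption about what `present_results()` does internally:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical reproduction: matplotlib's scatter() raises
# "ValueError: x and y must be the same size" whenever the two arrays differ in length.
conditions_test = np.linspace(0, 1, 20)                   # e.g. 20 test conditions
observations_test = np.random.default_rng(0).random(15)   # but only 15 observations

plt.scatter(conditions_test, observations_test)  # -> ValueError: x and y must be the same size
```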
General comments
- The tutorials and documentation made running synthetic experiments simple and clear. However, after playing with AutoRA on and off for about 2 weeks, I still find myself a little unclear on how I could use some of these functions without knowing the data-generating ("ground truth") function. In other words, I'm still a little unsure how I could use AutoRA to 1) collect new empirical data (see Prolific comment above), 2) generate candidate models given empirical experimental data, and 3) iteratively update my model predictions with new data. A whole new tutorial on this may not be feasible at this stage, but perhaps a toy example (with Prolific?) could be a useful illustration for new users (like me!).
- This is a really cool project that feels like it could be very useful to researchers as it develops. Thanks for thinking of me!
@musslick - Could you please update your status and inform us how is your study going? Do we have any improvements in light of our reviewer's suggestions?
Apologies to all for the delay, it's been a hectic few weeks!
Let me start by saying congrats to @musslick and co-authors/collaborators! This is a really impressive framework and it's obvious how much careful attention, thought, and effort went into developing it. Kudos!
I've now had a chance to work through the documentation, tutorials, and examples. The installation went fine, the code works as expected, and the Docs/API are robust. To echo @seandamiandevine, I also ran into a number of errors when running through the Equation Discovery tutorial in Colab, having to do with data shape mismatches. When running the Experimentalist.ipynb tutorial notebook in Colab, I also ran into the following error early on:
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-2-508cdcdb2e51> in <cell line: 4>()
2 from sklearn.linear_model import LinearRegression
3 from autora.variable import DV, IV, ValueType, VariableCollection
----> 4 from autora.experimentalist.sampler.falsification import falsification_sample, falsification_score_sample
ModuleNotFoundError: No module named 'autora.experimentalist.sampler'
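For context, a quick, hypothetical diagnostic (not part of the tutorial) to list which experimentalist submodules are actually installed in the Colab environment; an absent `sampler` entry would explain the `ModuleNotFoundError` above:

```python
import pkgutil
import autora.experimentalist as experimentalist

# Print the experimentalist submodules available in this environment;
# if 'sampler' is missing, the corresponding optional AutoRA package is
# presumably not installed in Colab, or the module has been renamed/moved.
print([module.name for module in pkgutil.iter_modules(experimentalist.__path__)])
```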
I agree with @seandamiandevine it would be good to make sure the Example notebooks run all the way through for new users.
Some general thoughts, none of which should necessarily preclude acceptance or publication:
- Depending on who the intended "core audience" is, I think more elaborate examples might be helpful. If one of the main intended audiences is cognitive scientists/psychologists, then an additional example might be using AutoRA to identify experiments best suited for discriminating between competing cognitive models (e.g., as in Cavagnaro et al. 2013 or Cavagnaro et al. 2016). [As an aside, it would be interesting to hear how you think AutoRA relates to adaptive design optimization more broadly.]
- Relatedly, one experimentalist type not yet covered (I don't think?) is choosing trials to minimize posterior uncertainty over model parameters (e.g., Ahn et al. 2020); a generic sketch of this idea is given after this list.
- Final thought: Oh what I would have given to have had something like this during my PhD 🙂
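To make the uncertainty-driven idea concrete, here is a generic sketch (not AutoRA's actual interface; posterior uncertainty over parameters is approximated here by disagreement across an ensemble of fitted models, and `models` is a placeholder for any objects exposing a scikit-learn-style `predict`):

```python
import numpy as np

def uncertainty_sample(candidate_conditions, models, n=5):
    """Pick the n candidate conditions where an ensemble of fitted models disagrees most."""
    # Stack predictions into shape (n_models, n_candidates).
    predictions = np.stack([m.predict(candidate_conditions) for m in models])
    # Use the spread across the ensemble as a proxy for predictive uncertainty.
    disagreement = predictions.std(axis=0)
    # Return the most uncertain conditions as the next experiment batch.
    return candidate_conditions[np.argsort(disagreement)[-n:]]
```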
Dear @seandamiandevine and @szorowi1,
Thank you both so much for investing the time and effort into this review and for providing such thorough and constructive feedback. We really appreciate that!
I discussed your feedback with the team, and we agree that the documentation does not yet provide sufficient (and complete) information about how to use the closed-loop functionality of AutoRA for real-world experiments. Adding respective examples to the documentation would be beneficial, especially for researchers interested in behavioral experiments.
We propose to do the following:
1. Fix the errors in the Equation Discovery tutorial and the Experimentalist.ipynb notebook.
2. Include the following two end-to-end examples for closed-loop experimentation with AutoRA (using both Prolific and Firebase):
   - 2.1 Mathematical model discovery for a psychophysics experiment
   - 2.2 Computational (reinforcement learning) model discovery for a one-armed bandit experiment
Once we have implemented and internally vetted those tutorials, we would love to get your feedback on them. That said, we would also understand if you've had enough of AutoRA already and/or don't have the time ;)
As a quick fix, we have already expanded the Closed-Loop Online Experiment Example to include a description of how to combine AutoRA with Prolific (to address @seandamiandevine's initial point).
In addition, to follow up on @szorowi1's general thoughts, we aim to include two additional examples for closed-loop experimentation (also using Prolific and Firebase). We may not be able to get them implemented over the course of the review process, but we wanted to hear your thoughts on whether these could be a useful target for our next development milestone:
2.3 Drift-diffusion model comparison for a random-dot kinematogram (RDK) experiment using Bayesian optimal experimental design (specifically, minimizing posterior uncertainty)
2.4 Experiment parameter tuning for a task-switching experiment (to illustrate how AutoRA can be used for automated design optimization, e.g., to enhance a desired behavioral effect, such as task-switch costs)
Finally, to address @szorowi1's question: we think that AutoRA could be used for design optimization (we could illustrate this in Example 2.4). However, it is not (yet) capable of adapting the experiment on the fly, i.e., within a single experiment session. Rather, it can help optimize a design after collecting data from a set of experiments and then propose a new set of experiments.
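To illustrate the batch-wise character of that loop, here is a rough, generic sketch (pseudocode-level Python, not AutoRA's actual API; `run_experiment`, `fit_model`, and `propose_conditions` are placeholder callables supplied by the user):

```python
def closed_loop(initial_conditions, run_experiment, fit_model, propose_conditions, cycles=3):
    """Batch-wise loop: the design is only updated between experiments, never within a session."""
    conditions = initial_conditions
    data, model = [], None
    for _ in range(cycles):
        data += run_experiment(conditions)            # collect a full batch of observations
        model = fit_model(data)                       # fit candidate models to all data so far
        conditions = propose_conditions(model, data)  # propose the next batch of experiments
    return model
```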
Please let us know what you think!
I think the plan above sounds great! I would definitely be happy to review 2.1/2.2 when they are ready. I also agree that 2.3/2.4 are fantastic tutorial examples but will require more time to develop. So, @musslick, will you let us know when 2.1/2.2 are ready and we can go from there?
That sounds great, @szorowi1 ! Thank you for being willing to take a second look. We will ping you both once 2.1 and 2.2 are ready for your review!
@musslick - May I request an update please? Thank you in advance.
@jbytecode Thanks for checking in. We created two new tutorials in response to the reviewers but are still in the process of incorporating them into the doc, and doing some last validation checks. We should have them up in two weeks!
@musslick - May I request an update, please? Sorry if I am bothering you. Thank you in advance.
@musslick - May I request an update, please? Sorry if I am bothering you. Thank you in advance.
Thanks for checking in. We are aiming to have the new release up by end of next week, and will ping you!
@musslick - Thank you for the status update. Good luck with your edits.
Dear @jbytecode @szorowi1 @seandamiandevine
Thank you so much for your patience with this revision. Your feedback has been incredibly valuable, helping us uncover multiple inconsistencies in the AutoRA documentation and concomitant opportunities for improvement. This led us to address some deeper issues in the core code base, which is why the revision process took a bit longer than anticipated.
Following your suggestions, we ended up re-structuring the tutorials into Basic Tutorials and Use Case Tutorials. The latter demonstrate practical applications of AutoRA in real-world scenarios. The most relevant changes include:
- Fixing the Theorist Tutorial, which can now be found in the Basic Tutorials section.
- Fixing the Experimentalist Tutorial, which can now also be found in the Basic Tutorials section.
- Adding an (end-to-end) Use Case Tutorial for a Closed-Loop Psychophysics Study.
- Adding an (end-to-end) Use Case Tutorial for a Closed-Loop One-Armed Bandit Study.
We invite you to have a look at these revised sections.
To ease your review: running the revised tutorials shouldn't require any local installation. You should be able to execute both Basic Tutorials in Google Colab. In addition, you should be able to run both Use Case Tutorials within GitHub Codespaces (we provide guidelines in the respective tutorials).
Thank you again for your valuable time and effort. We look forward to your feedback!
@musslick - Thank you for the update.
@seandamiandevine, @szorowi1 - Dear reviewers, could you please update your reviews? Thank you in advance.
@jbytecode @musslick
Thank you for the revisions! After rerunning the tutorial, everything runs well for me. I also double-checked the local install and that also works for me.
The new Use Case examples are excellent and clearly address my initial concerns. They also serve as a good introduction to Firebase for researchers who are new to server-side development (if it can be called that).
Overall, I find that my comments have been addressed and I'm happy to recommend publication in the current form. @jbytecode, please let me know if I need to do anything else at this stage.
Thanks again for thinking of me as a reviewer and congratulations to @musslick and the team for their contribution!
Dear reviewers @seandamiandevine, @szorowi1
You still have unchecked review items in your checklists. Could you please finalize your reviews by checking them off?
Thank you in advance.
@seandamiandevine - Thank you for your review and the recommendation of acceptance.