[REVIEW]: basilisk: a Bioconductor package for managing Python environments
Submitting author: @LTLA (Aaron Lun) Repository: https://github.com/LTLA/basilisk Branch with paper.md (empty if default branch): submission Version: 1.9.11 Editor: @Nikoleta-v3 Reviewers: @jsun, @gtonkinhill Archive: 10.5281/zenodo.7212841
Status
Status badge code:
HTML: <a href="https://joss.theoj.org/papers/cb99a113aa5f8df9e722fed04de91c57"><img src="https://joss.theoj.org/papers/cb99a113aa5f8df9e722fed04de91c57/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/cb99a113aa5f8df9e722fed04de91c57/status.svg)](https://joss.theoj.org/papers/cb99a113aa5f8df9e722fed04de91c57)
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@jsun & @gtonkinhill, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:
@editorialbot generate my checklist
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @Nikoleta-v3 know.
✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨
Checklists
📝 Checklist for @jsun
Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.
For a list of things I can do to help you, just type:
@editorialbot commands
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
@editorialbot generate pdf
Software report:
github.com/AlDanial/cloc v 1.88 T=0.04 s (781.6 files/s, 42427.9 lines/s)
-------------------------------------------------------------------------------
Language files blank comment code
-------------------------------------------------------------------------------
R 22 167 563 556
Markdown 2 34 0 149
TeX 1 6 0 47
Rmd 1 50 126 13
Python 2 2 0 12
Bourne Shell 4 4 0 8
-------------------------------------------------------------------------------
SUM: 32 263 689 785
-------------------------------------------------------------------------------
gitinspector failed to run statistical information for the repository
Wordcount for paper.md is 1250
Failed to discover a valid open source license
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.1101/2022.04.21.488824 is OK
MISSING DOIs
- Errored finding suggestions for "Harry Potter and the Chamber of Secrets", please try later
- Errored finding suggestions for "Conda", please try later
- Errored finding suggestions for "reticulate: Interface to 'Python'", please try later
- Errored finding suggestions for "snifter: R wrapper for the python openTSNE library", please try later
- Errored finding suggestions for "Restore Python 2 compatibility; deprecate Python 2...", please try later
- Errored finding suggestions for "Change path to Python binary", please try later
INVALID DOIs
- None
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
👋🏼 @LTLA @jsun @gtonkinhill this is the review thread for the paper. All of our communications will happen here from now on.
As a reviewer, the first step is to create a checklist for your review by entering
@editorialbot generate my checklist
as the top of a new comment in this thread.
These checklists contain the JOSS requirements ✅ As you go over the submission, please check any items that you feel have been satisfied. The first comment in this thread also contains links to the JOSS reviewer guidelines.
The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention https://github.com/openjournals/joss-reviews/issues/4742 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.
We aim for reviews to be completed within about 2-4 weeks. Please let me know if any of you require some more time. We can also use EditorialBot (our bot) to set automatic reminders if you know you'll be away for a known period of time.
Please feel free to ping me (@Nikoleta-v3) if you have any questions/concerns. 👋🏻
Review checklist for @jsun
Conflict of interest
- [x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.
Code of Conduct
- [x] I confirm that I read and will adhere to the JOSS code of conduct.
General checks
- [x] Repository: Is the source code for this software available at https://github.com/LTLA/basilisk?
- [x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
- [x] Contribution and authorship: Has the submitting author (@LTLA) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
- [x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines
- [x] Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
- [x] Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
- [x] Human and animal research: If the paper contains original data research on human subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.
Functionality
- [x] Installation: Does installation proceed as outlined in the documentation?
- [x] Functionality: Have the functional claims of the software been confirmed?
- [x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)
Documentation
- [x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
- [x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
- [x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
- [x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
- [x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
- [x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support
Software paper
- [x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
- [x] A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
- [x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
- [x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
- [x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?
Review checklist for @gtonkinhill
Conflict of interest
- [x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.
Code of Conduct
- [x] I confirm that I read and will adhere to the JOSS code of conduct.
General checks
- [x] Repository: Is the source code for this software available at https://github.com/LTLA/basilisk?
- [x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
- [x] Contribution and authorship: Has the submitting author (@LTLA) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
- [x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines
- [x] Data sharing: If the paper contains original data, data are accessible to the reviewers. If the paper contains no original data, please check this item.
- [x] Reproducibility: If the paper contains original results, results are entirely reproducible by reviewers. If the paper contains no original results, please check this item.
- [x] Human and animal research: If the paper contains original data research on human subjects or animals, does it comply with JOSS's human participants research policy and/or animal research policy? If the paper contains no such data, please check this item.
Functionality
- [x] Installation: Does installation proceed as outlined in the documentation?
- [x] Functionality: Have the functional claims of the software been confirmed?
- [x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)
Documentation
- [x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
- [x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
- [x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
- [x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
- [x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
- [x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support
Software paper
- [x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
- [x] A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
- [x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
- [x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
- [x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?
Hi @LTLA, I finished my review. The manuscript and vignette are well written; following them, I could easily try out the package. I installed it and ran some examples, and experienced no issues when using this package. Although this package is intended for use by developers, general users such as data analysts may also be interested, because many biological analyses alternate between R and Python, and sometimes require multiple versions of Python. From the viewpoint of a non-developer user, I would suggest that the author add some easy-to-understand sample code to the vignette. Here are my suggestions.
(1) Although the author mentioned the following items
- ... will lazily install conda and the required environments if they are not already present. ... (vignette 3.4)
- ... It is possible to direct basilisk to use an existing Miniconda or Anaconda instance ... (vignette 4.2)
I suggest the author merge them into one section. For example, add an Installation section that covers the following items:
- basilisk can use existing conda or can install conda automatically
- caveats about environment variables (e.g., `BASILISK_USE_SYSTEM_DIR`) that should be known/set before using basilisk's main functions.
(2) Most code in the vignette is illustrative. Is it possible to add some executable code to the vignette? For example:
- Add the code from the manuscript to the vignette; I found that if I prepare `input_matrix` with random numbers, I can execute the code written in the manuscript.
- Add an example that just imports `sklearn.datasets`, loads the iris dataset, and returns it to an R variable.
(3) Due to license issues or personal preference, users may want to use other Python virtual environments, such as pyenv, instead of conda. Vignette section 4.3 mentions that basilisk can support other virtual environments. Are pyenv, venv, etc. included? Is it possible to add an example of how to use this feature?
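[Editor's note] For concreteness, the requested iris example might look something like the sketch below. This is hypothetical code, not taken from the package: the environment name, client package name, and version pins are placeholders, and the exact `BasiliskEnvironment()`/`basiliskRun()` signatures should be checked against the basilisk documentation.

```r
library(basilisk)

# Hypothetical environment definition; the pins are placeholders.
iris_env <- BasiliskEnvironment("iris_env", pkgname = "mypkg",
    packages = c("python=3.8", "scikit-learn=1.1.1"))

# Run a Python function inside the managed environment and return
# plain R objects to the caller.
iris_r <- basiliskRun(env = iris_env, fun = function() {
    datasets <- reticulate::import("sklearn.datasets")
    out <- datasets$load_iris()
    data.frame(out$data)  # convert to an R object before returning
})
```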
Thanks @jsun.
(1) Your comment prompted me to restructure the vignette to highlight the different pieces of information for different target audiences. Specifically, I made one section explicitly for package developers, and one section explicitly for end users. Admittedly this isn't close to what you suggested, but hear me out...
The (former) Section 3.4 was intended for package developers, and I didn't want to mention the environment variables at this point, because that would give the impression that package developers were expected to set environment variables... which they shouldn't, because that is the privilege/responsibility of the end user, depending on the configuration of their R installation. Similarly, I didn't want to give end users the impression that they had to read through 3.4 and understand the intricacies of `basiliskRun()` when all they want to do is tweak some of the directory locations.
So, by marking things up more explicitly, it should hopefully be easier to understand who should do what. I suppose that, technically, I should distinguish between end users and R administrators, e.g., for installations on shared HPCs, though maybe the new vignette is clear enough as it is.
(2) Done. Added a truncated PCA from `sklearn.decomposition` and some stuff from `scipy.stats`.
Note that the example in the vignette is written for an end user and is a little different from the example in the manuscript (written for package developers). The set-up is currently a bit more involved; I will try to streamline it in a dedicated user-only function.
(3) Whoops. I think that part of the vignette must have been a legacy from the earliest versions of basilisk, when I allowed users to switch between `venv` and conda. IIRC it became increasingly difficult to reconcile as - I think - `venv` didn't allow use of different Python versions, and the Python package versions weren't exactly 1:1 between PyPI and conda. (Also some issues with Windows, but I forget the details.) In the end, I just gave up on `venv` support and switched to conda for all systems.

In theory, it would be possible to modify a subset of the basilisk machinery to work with `venv`, though it would probably require a particularly skilled user to do it, as we'd be dependent on their Python and virtual environments being correctly configured. I don't think I could require client package developers to support this, as there are too many unknowns when the Python version is not pinned. If such users exist, I might consider generalizing `basiliskRun()` to work with `venv`.
These changes will land with basilisk 1.9.7, provided it passes through BioC's build system in a few days. Also added the explicit license text to accompany the mention in the `DESCRIPTION`.
Thanks, @LTLA, sounds good! By the way, I checked the vignette hosted on the Bioconductor website last time; this time I downloaded the vignette source from the GitHub main branch.
- For (1), I understand your position, as the main targets of basilisk are software developers;
- for (2), I can check the sample code for `sklearn`, thanks (this is a great package, I'll use it in my research soon);
- and for (3), I see, I understand the situation.
By the way, it seems that I cannot build the vignette on my macOS machine, although I've already installed the latest basilisk and basilisk.utils. But this is not a critical issue; I'll finish my review soon (^^).
Hi, @Nikoleta-v3, I finalized my review.
Hi @LTLA,
I think this is a really useful package that significantly reduces the pain of combining R and Python code. The paper is well written and I was able to successfully install and run the package. I have a few suggestions that are mostly nice-to-haves rather than required changes.
- It would be nice to include a function to automatically set up a new package to use basilisk, similar to the `Rcpp.package.skeleton` function.
- From the paper and the vignette, it was unclear to me how conda is installed and whether there is an easy method for updating the conda version. In my quick test run, it appeared to be using v4.8.3.
- I was also wondering whether it would be possible to use the `mamba` solver instead of the default conda one. This can really speed things up for environments with a lot of dependencies.
- I think it might be nice to include an example of selecting the Python version for an environment in the example package and in the vignette. In fact, it might be preferable to suggest that users do this for every environment to ensure reproducibility. When testing the example package in `/inst/`, my installation defaulted to using Python 3.7, which led to version clashes between the Python packages used in the example. Conda's error messages are not particularly informative, so it took me a second to realise that it was the Python version causing the issue.
- You mention that the conda environments use a large amount of disk space. They can also generate a very large number of files, which I have run into issues with when IT departments implement unnecessary restrictions. It would be great if there were a function to automatically clean up environments, either by deleting them all or by deleting those associated with a particular package.
Thanks @gtonkinhill.
1. An interesting idea. It would probably help anyone trying to build a basilisk client from scratch. I'd need to think about it, though; at least a few of the existing clients come from the position of having an existing R package that interfaces with Python, and they only added basilisk integration as part of the BioC review process. In those cases, a more useful function would be one that, e.g., `basiliskize()`s an existing package by injecting the relevant bits and pieces into it. I'll have to see how much demand there is for that; maybe it would make the BioC review process easier.
2. The conda installation happens automatically the first time that `basiliskRun()` is called in any client package, so the user doesn't have to type anything special to get it - I added a note about this in the manuscript. There are a few exceptions, though:
- For a system installation, the conda installation is created during basilisk's own installation.
- If `BASILISK_EXTERNAL_CONDA` is set, it is just used directly, without basilisk installing its own copy.
I also added a `BASILISK_MINICONDA_VERSION` environment variable that allows users to specify a different conda version, if they must. There are a few caveats, though, which are mentioned in the updated vignette.
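[Editor's note] For illustration, setting these variables in the shell before starting R might look like the following. The variable names come from this thread, but the particular version string is a made-up example; check the basilisk documentation for the accepted values.

```shell
# Pin the Miniconda installer version that basilisk will lazily install.
# "py38_4.12.0" is a hypothetical example value, not a recommendation.
export BASILISK_MINICONDA_VERSION=py38_4.12.0

# Alternatively, point basilisk at an existing conda installation
# instead of letting it install its own copy:
# export BASILISK_EXTERNAL_CONDA=/opt/miniconda3

echo "pinned miniconda version: $BASILISK_MINICONDA_VERSION"
```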
3. I suppose it is possible, though I think I would prefer to wait for mamba to become conda's default solver. Apparently this will happen at some point (see here), so it should be fairly robust at that point, and I wouldn't have to take responsibility for any differences in behavior with the default.
4. Added some notes about this to the vignette and documentation.
It's worth noting that, actually, the Python version is explicitly pinned by `setupBasiliskEnv` to whatever was supplied with the conda installer, if no `python=` package string is detected in `packages=`. This is done as a fallback in case client package developers forget to pin their Python versions. In your case, I'm guessing you used the BioC release version of basilisk, which pins Python to 3.7; but the `inst` example on GitHub uses the BioC devel version, which is pinned to 3.8, hence the issues.
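[Editor's note] For readers following along, explicitly pinning the interpreter in the environment definition might look like this. It is a hypothetical sketch: the environment name, client package name, and version pins are placeholders.

```r
library(basilisk)

# Pinning python= alongside the package versions avoids relying on
# setupBasiliskEnv's fallback to the installer's Python version.
my_env <- BasiliskEnvironment(
    envname = "analysis_env",   # placeholder name
    pkgname = "mypkg",          # placeholder client package
    packages = c(
        "python=3.8",           # explicit interpreter pin
        "scikit-learn=1.1.1",
        "pandas=1.4.3"
    )
)
```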
5. Added some notes to the vignette about `basilisk.utils::clearExternalDir()`, which does exactly this.
@editorialbot generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
Great, thanks @LTLA. @Nikoleta-v3, I have finished my review and recommend acceptance!
@jsun and @gtonkinhill thank you both very much for your time and for your reviews! 🙏🏻 🎉
@LTLA my apologies for the delay on my part. I am going to look over the reviews and the submission one last time this week and let you know if there are any more issues you have to address.
@LTLA thank you for your patience! I have no further comments. The software is very useful and well developed, and the manuscript is very well written.
The package is aimed at developers and thus it's not very beginner-friendly, but this is not an issue.
At this point could you please:
- [ ] (if necessary) Make a new tagged release of your software, and list the version tag of the archived version here. Or let me know if I should just use the release `1.9.2`.
- [ ] Archive the reviewed software in Zenodo or a similar service (e.g., figshare, an institutional repository)
- [ ] Check the archival deposit (e.g., in Zenodo) has the correct metadata. This includes the title (should match the paper title) and author list (make sure the list is correct and people who only made a small fix are not on it). You may also add the authors' ORCID.
- [ ] Please list the DOI of the archived version here.
I can then move forward with accepting the submission.
Thanks @Nikoleta-v3:
Make a new tagged release of your software, and list the version tag of the archived version here.
Done, 1.9.11 (https://github.com/LTLA/basilisk/releases/tag/1.9.11).
Archive the reviewed software in Zenodo or a similar service (e.g., figshare, an institutional repository)
Done, https://zenodo.org/record/7212841#.Y0uwA9LMLkw
DOI: 10.5281/zenodo.7212841
@editorialbot generate pdf
@editorialbot check references
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.1101/2022.04.21.488824 is OK
MISSING DOIs
- None
INVALID DOIs
- None
@editorialbot set 10.5281/zenodo.7212841 as archive
Done! Archive is now 10.5281/zenodo.7212841
@editorialbot set 1.9.11 as version
Done! version is now 1.9.11
@editorialbot recommend-accept