joss-reviews
[PRE REVIEW]: strucscan: A lightweight Python-based framework for high-throughput material simulation
Submitting author: @thohamm (Thomas Hammerschmidt) Repository: https://github.com/ICAMS/strucscan Branch with paper.md (empty if default branch): Version: 1.0 Editor: @ppxasjsm Reviewers: Pending Managing EiC: Kevin M. Moerman
Status
Status badge code:
HTML: <a href="https://joss.theoj.org/papers/cf152ba42d55db682d1ac29f951bcfe1"><img src="https://joss.theoj.org/papers/cf152ba42d55db682d1ac29f951bcfe1/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/cf152ba42d55db682d1ac29f951bcfe1/status.svg)](https://joss.theoj.org/papers/cf152ba42d55db682d1ac29f951bcfe1)
Author instructions
Thanks for submitting your paper to JOSS @thohamm. Currently, there isn't a JOSS editor assigned to your paper.
@thohamm if you have any suggestions for potential reviewers then please mention them here in this thread (without tagging them with an @). In addition, this list of people who have already agreed to review for JOSS may be suitable for this submission (please start at the bottom of the list).
Editor instructions
The JOSS submission bot @editorialbot is here to help you find and assign reviewers and start the main review. To find out what @editorialbot can do for you type:
@editorialbot commands
Hello human, I'm @editorialbot, a robot that can help you with some common editorial tasks.
For a list of things I can do to help you, just type:
@editorialbot commands
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
@editorialbot generate pdf
Software report:
github.com/AlDanial/cloc v 1.88  T=0.08 s (967.0 files/s, 123169.0 lines/s)
--------------------------------------------------------------------------------
Language                 files          blank        comment           code
--------------------------------------------------------------------------------
Python                      25            902           1981           3338
Markdown                    13             95              0            403
TeX                          1             13              0            137
Jupyter Notebook             4              0           1763            121
reStructuredText             8             51             81             80
YAML                         8             20            135             68
Bourne Again Shell          14             25            137             57
make                         1              4              6              9
--------------------------------------------------------------------------------
SUM:                        74           1110           4103           4213
--------------------------------------------------------------------------------
gitinspector failed to run statistical information for the repository
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.1016/0927-0256(96)00008-0 is OK
- 10.1103/PhysRevB.59.1758 is OK
- 10.1103/physrevb.54.11169 is OK
- 10.1016/j.cpc.2021.108171 is OK
- 10.1016/j.commatsci.2021.110731 is OK
- 10.1016/j.commatsci.2018.07.043 is OK
- 10.1016/j.commatsci.2017.07.030 is OK
- 10.1002/cpe.3505 is OK
- 10.1109/CCGRID.2001.923173 is OK
- 10.1088/1361-648x/aa680e is OK
- 10.1088/1367-2630/15/11/115016 is OK
- 10.1038/s41597-020-00638-4 is OK
MISSING DOIs
- 10.1007/10968987_3 may be a valid DOI for title: SLURM: Simple Linux Utility for Resource Management
INVALID DOIs
- doi.org/10.1016/j.commatsci.2012.10.028 is INVALID because of 'doi.org/' prefix
Wordcount for paper.md is 1447
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
Missing and invalid DOI fixed.
@editorialbot generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
@editorialbot check references
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.1016/0927-0256(96)00008-0 is OK
- 10.1103/PhysRevB.59.1758 is OK
- 10.1103/physrevb.54.11169 is OK
- 10.1016/j.commatsci.2021.110731 is OK
- 10.1016/j.commatsci.2018.07.043 is OK
- 10.1016/j.commatsci.2017.07.030 is OK
- 10.1002/cpe.3505 is OK
- 10.1109/CCGRID.2001.923173 is OK
- 10.1007/10968987_3 is OK
- 10.1088/1361-648x/aa680e is OK
- 10.1088/1367-2630/15/11/115016 is OK
- 10.1038/s41597-020-00638-4 is OK
- 10.1016/j.commatsci.2012.10.028 is OK
MISSING DOIs
- None
INVALID DOIs
- None
@thohamm I have had a quick look at your paper, can you check the following:
- [ ] Please add city and country to your affiliation, do not use acronyms for countries.
@editorialbot invite @jedbrown as editor
Invitation to edit this submission sent!
Affiliation fixed.
@editorialbot generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
:wave: Hi @thohamm. A few notes from my initial scan
- I see there are some "TODO" marks in the document.
- Can you please share with us a little about the provenance of this project? I see it was a new project importing source from elsewhere just prior to submission.
- It looks like there is intended to be Sphinx documentation, but I don't see it linked. Does that exist?
- Those docs say that pytest is used, but it looks like the code doesn't use pytest.
- Are there unit tests and can they be made to run with continuous integration (such as GitHub Actions)?
- If the output is tabular, why not put it into a Pandas dataframe for easy plotting and statistics?
- How does this package relate to generic workflow management tools with support for batch computing (e.g., Snakemake, Parsl, Swift, libensemble)?
- To what extent does this package provide machine- or human-auditable provenance?
- To what extent does this package convey research, versus serve as a client to research software (VASP, etc.)? Would it be in scope for strucscan to compute summary statistics, make common figures, and/or do quality control to make the research more efficient or reliable?
👋 @thohamm - we do need a response from you - if we don't hear back in another week, we'll mark this submission as withdrawn.
Thank you for your comments. We are working on them.
@danielskatz, @jedbrown - Thank you for your detailed comments.
We reply to them point by point below. We would be willing to include corresponding remarks in the paper if you find this helpful.
- Thank you for your comment; the TODO marks have been fixed.
- The development of this Python code started about 6 months ago on the basis of a 10-year-old collection of shell scripts. After several iterations of restructuring the program flow and data handling, we have now arrived at a converged version that is already in practical use in our group. We didn't see much benefit in sharing the older versions and therefore decided to set up a new git repository for this project.
- The Sphinx documentation has now been prepared and is available at https://strucscan.readthedocs.io/.
- We prepared tests using pytest but are still struggling to execute them due to write-permission problems with GitHub Actions. For the moment we have removed the corresponding remark from the documentation in the git repository and will include it again as soon as the tests are up and running.
- Some of the examples are planned to be used as unit tests as soon as the issues with GitHub Actions are resolved (cf. point 4).
- The data structure and connectivity of the output depend on the particular set of calculations and are therefore not well suited for a pandas dataframe with an a priori layout. We instead store dictionaries from which the user can selectively collect data to construct dedicated dataframes. In our experience this is more convenient and flexible.
- In contrast to generic tools for batch computing, strucscan is dedicated to high-throughput simulations and provides a more sophisticated framework than a generic workflow manager. In particular, it offers generalized interfaces to different simulation software packages, interfaces to schedulers including monitoring tools, a transparent storage structure for convenient post-processing, and measures for data connectivity.
- The provenance of the strucscan code itself is given by the GitHub versioning, which starts at the first converged version of the software (cf. point 2). The provenance of individual calculations with strucscan is realized by storing all input and output files in a folder tree, in addition to collecting the central results in dictionaries. Storing the full data set as provenance of individual calculations gives the user full flexibility to adapt to external metadata schemas, e.g., those of third-party databases.
- The goal of strucscan is to support the user in conducting high-throughput research, in particular coping with the challenges of starting, monitoring, collecting and analyzing very large numbers (thousands to tens of thousands) of calculations. strucscan aims to provide the basis for exactly the purposes you mention, i.e., summary statistics, common figures and quality control: a framework for generating large sets of quality-checked data stored in a transparent data structure. On this basis the user can then apply strucscan's built-in post-processing or further tools for data visualization and analytics.
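To illustrate the dictionary-based storage mentioned above: nested result dictionaries can be flattened into rows and handed to pandas with a few lines of user code. This is a minimal sketch with a hypothetical data layout; the keys and values are illustrative assumptions, not strucscan's actual schema.

```python
# Sketch: flatten nested per-calculation result dictionaries into flat
# rows that can be passed to pandas.DataFrame(rows) for plotting and
# statistics. The layout below is a hypothetical example.

def flatten(d, prefix=""):
    """Recursively flatten a nested dict into {'outer.inner': value} form."""
    flat = {}
    for key, value in d.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat

# Hypothetical result dictionaries for two calculations.
results = {
    "Fe_bcc": {"energy": -8.31, "cell": {"a": 2.83}},
    "Fe_fcc": {"energy": -8.21, "cell": {"a": 3.45}},
}

rows = [{"structure": name, **flatten(res)} for name, res in results.items()]
# rows is now a list of flat dicts, suitable for pandas.DataFrame(rows)
```

Because each calculation may carry a different set of keys, building rows on demand like this keeps the layout flexible, which is the rationale given above for not fixing an a priori dataframe schema.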
@editorialbot generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
@jedbrown – do these comments satisfy your concerns, and if so, would you be able to edit this submission for JOSS?
@jedbrown :wave: :point_up:
@jedbrown Can you check those comments?
I'm sorry to be slow catching up after our family got covid (via 2yo's childcare). @thohamm, thanks for your updates and comprehensive response. I think we still need a scope check with other editors in light of it being nominally general purpose (thus user is responsible for app-specific reliability checks/statistics/plotting), but the examples and known uses thus far are for VASP. The main question is to what extent people outside your research group would use this tool for their research or would disseminate research via this tool (say, as a platform for CS research).
Regarding packaging, it looks like you're missing pyyaml. This is in a fresh virtualenv:
$ pip install .
Processing /home/jed/joss/strucscan
Preparing metadata (setup.py) ... done
[...]
Successfully installed strucscan-0.post0.dev68
$ strucscan --help
Traceback (most recent call last):
File "/home/jed/joss/strucscan/VENV/bin/strucscan", line 5, in <module>
from strucscan.cli import main
File "/home/jed/joss/strucscan/VENV/lib/python3.10/site-packages/strucscan/__init__.py", line 1, in <module>
from strucscan.__init__ import *
File "/home/jed/joss/strucscan/VENV/lib/python3.10/site-packages/strucscan/__init__.py", line 7, in <module>
from strucscan.utils import *
File "/home/jed/joss/strucscan/VENV/lib/python3.10/site-packages/strucscan/utils.py", line 2, in <module>
import yaml
ModuleNotFoundError: No module named 'yaml'
Installing pyyaml fixes this, but `strucscan --help` still doesn't work. (I know it doesn't claim to, but this is a normal expectation.)
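For reference, the usual way to avoid this class of problem is to declare runtime dependencies in `setup.py` so that `pip install .` pulls them in automatically. This is a hypothetical configuration fragment: the entry point and package names are assumptions inferred from the traceback above, not the actual strucscan setup.

```python
# Hypothetical setup.py fragment (not the real strucscan configuration):
# declaring pyyaml in install_requires makes `pip install .` install it
# automatically, and the console_scripts entry point wires up the
# `strucscan` command seen in the traceback above.
from setuptools import setup, find_packages

setup(
    name="strucscan",
    packages=find_packages(),
    install_requires=[
        "pyyaml",  # the missing dependency observed above
    ],
    entry_points={
        "console_scripts": ["strucscan = strucscan.cli:main"],
    },
)
```

With this in place, a fresh `pip install .` followed by `strucscan --help` should not raise `ModuleNotFoundError: No module named 'yaml'`.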
@editorialbot query scope
Submission flagged for editorial review.
@editorialbot assign me as editor
Assigned! @ppxasjsm is now the editor