joss-reviews
[PRE REVIEW]: strucscan: A lightweight Python-based framework for high-throughput material simulation
Submitting author: @thohamm (Thomas Hammerschmidt) Repository: https://github.com/ICAMS/strucscan Branch with paper.md (empty if default branch): Version: 1.0 Editor: @ppxasjsm Reviewers: Pending Managing EiC: Kevin M. Moerman
Status
Status badge code:
HTML: <a href="https://joss.theoj.org/papers/cf152ba42d55db682d1ac29f951bcfe1"><img src="https://joss.theoj.org/papers/cf152ba42d55db682d1ac29f951bcfe1/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/cf152ba42d55db682d1ac29f951bcfe1/status.svg)](https://joss.theoj.org/papers/cf152ba42d55db682d1ac29f951bcfe1)
Author instructions
Thanks for submitting your paper to JOSS @thohamm. Currently, there isn't a JOSS editor assigned to your paper.
@thohamm if you have any suggestions for potential reviewers then please mention them here in this thread (without tagging them with an @). In addition, this list of people who have already agreed to review for JOSS may be suitable for this submission (please start at the bottom of the list).
Editor instructions
The JOSS submission bot @editorialbot is here to help you find and assign reviewers and start the main review. To find out what @editorialbot can do for you type:
@editorialbot commands
Hello human, I'm @editorialbot, a robot that can help you with some common editorial tasks.
For a list of things I can do to help you, just type:
@editorialbot commands
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
@editorialbot generate pdf
Software report:
github.com/AlDanial/cloc v 1.88  T=0.08 s (967.0 files/s, 123169.0 lines/s)
--------------------------------------------------------------------------------
Language                 files          blank        comment           code
--------------------------------------------------------------------------------
Python                      25            902           1981           3338
Markdown                    13             95              0            403
TeX                          1             13              0            137
Jupyter Notebook             4              0           1763            121
reStructuredText             8             51             81             80
YAML                         8             20            135             68
Bourne Again Shell          14             25            137             57
make                         1              4              6              9
--------------------------------------------------------------------------------
SUM:                        74           1110           4103           4213
--------------------------------------------------------------------------------
gitinspector failed to run statistical information for the repository
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.1016/0927-0256(96)00008-0 is OK
- 10.1103/PhysRevB.59.1758 is OK
- 10.1103/physrevb.54.11169 is OK
- 10.1016/j.cpc.2021.108171 is OK
- 10.1016/j.commatsci.2021.110731 is OK
- 10.1016/j.commatsci.2018.07.043 is OK
- 10.1016/j.commatsci.2017.07.030 is OK
- 10.1002/cpe.3505 is OK
- 10.1109/CCGRID.2001.923173 is OK
- 10.1088/1361-648x/aa680e is OK
- 10.1088/1367-2630/15/11/115016 is OK
- 10.1038/s41597-020-00638-4 is OK
MISSING DOIs
- 10.1007/10968987_3 may be a valid DOI for title: SLURM: Simple Linux Utility for Resource Management
INVALID DOIs
- doi.org/10.1016/j.commatsci.2012.10.028 is INVALID because of 'doi.org/' prefix
Wordcount for paper.md is 1447
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
Missing and invalid DOI fixed.
@editorialbot generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
@editorialbot check references
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.1016/0927-0256(96)00008-0 is OK
- 10.1103/PhysRevB.59.1758 is OK
- 10.1103/physrevb.54.11169 is OK
- 10.1016/j.commatsci.2021.110731 is OK
- 10.1016/j.commatsci.2018.07.043 is OK
- 10.1016/j.commatsci.2017.07.030 is OK
- 10.1002/cpe.3505 is OK
- 10.1109/CCGRID.2001.923173 is OK
- 10.1007/10968987_3 is OK
- 10.1088/1361-648x/aa680e is OK
- 10.1088/1367-2630/15/11/115016 is OK
- 10.1038/s41597-020-00638-4 is OK
- 10.1016/j.commatsci.2012.10.028 is OK
MISSING DOIs
- None
INVALID DOIs
- None
@thohamm I have had a quick look at your paper, can you check the following:
- [ ] Please add city and country to your affiliation, do not use acronyms for countries.
@editorialbot invite @jedbrown as editor
Invitation to edit this submission sent!
Affiliation fixed.
@editorialbot generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
:wave: Hi @thohamm. A few notes from my initial scan
- I see there are some "TODO" marks in the document.
- Can you please share with us a little about the provenance of this project? I see it was a new project importing source from elsewhere just prior to submission.
- It looks like there is intended to be Sphinx documentation, but I don't see it linked. Does that exist?
- Those docs say that pytest is used, but it looks like the code doesn't use pytest.
- Are there unit tests and can they be made to run with continuous integration (such as GitHub Actions)?
- If the output is tabular, why not put it into a Pandas dataframe for easy plotting and statistics?
- How does this package relate to generic workflow management tools with support for batch computing (e.g., Snakemake, Parsl, Swift, libensemble)?
- To what extent does this package provide machine- or human-auditable provenance?
- To what extent does this package convey research, versus serve as a client to research software (VASP, etc.)? Would it be in scope for strucscan to compute summary statistics, make common figures, and/or do quality control to make the research more efficient or reliable?
👋 @thohamm - we do need a response from you - if we don't hear back in another week, we'll mark this submission as withdrawn.
Thank you for your comments. We are working on them.
@danielskatz, @jedbrown - Thank you for your detailed comments.
We reply to them point by point below. We would be willing to include corresponding remarks in the paper if you find this helpful.
- Thank you for your comment; the TODO marks have been fixed.
- The development of this Python code started about 6 months ago on the basis of a 10-year-old collection of shell scripts. After several iterations of restructuring the program flow and data handling, we have now arrived at a converged version that is already in practical use in our group. We didn't see much benefit in sharing the older versions and therefore decided to set up a new git repository for this project.
- The Sphinx documentation has now been prepared and is available at https://strucscan.readthedocs.io/.
- We prepared tests using pytest but are still struggling to execute them due to write-permission problems with GitHub Actions. For the moment we have removed the corresponding remark from the documentation in the git repository and will include it again as soon as the tests are up and running.
- Some of the examples are planned to be used as unit tests as soon as the issues with GitHub Actions are resolved (cf. point 4).
- The data structure and connectivity of the output depend on the particular set of calculations and are therefore not well suited for a pandas dataframe with an a priori layout. We instead store dictionaries from which the user can selectively collect data to construct dedicated dataframes. In our experience this is more convenient and flexible.
- In contrast to generic tools for batch computing, strucscan is dedicated to high-throughput simulations and provides a more sophisticated framework than a generic workflow manager. In particular, it offers generalized interfaces to different simulation software packages, interfaces to schedulers including monitoring tools, a transparent storage structure for convenient post-processing, and measures for data connectivity.
- The provenance of the strucscan code itself is given by the GitHub versioning, which starts at the first converged version of the software (cf. point 2). The provenance of individual calculations with strucscan is realized by storing all input and output files in a folder tree, in addition to collecting the central results in dictionaries. Storing the full data set as provenance of individual calculations gives the user full flexibility to adapt to external metadata schemas, e.g., those of third-party databases.
- The goal of strucscan is to support the user in conducting high-throughput research, in particular coping with the challenges of starting, monitoring, collecting and analyzing very large numbers (thousands to tens of thousands) of calculations. strucscan aims to provide the basis for exactly the purposes you mention, i.e., summary statistics, common figures and quality control: a framework for generating large sets of quality-checked data stored in a transparent data structure. On this basis the user can then apply strucscan's built-in post-processing or further tools for data visualization and analytics.
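To illustrate the dictionary-based storage mentioned above: nested result dictionaries can be flattened into rows and handed to pandas with a few lines of user code. This is a minimal sketch with a hypothetical data layout; the keys and values are illustrative assumptions, not strucscan's actual schema.

```python
# Sketch: flatten nested per-calculation result dictionaries into flat
# rows that can be passed to pandas.DataFrame(rows) for plotting and
# statistics. The layout below is a hypothetical example.

def flatten(d, prefix=""):
    """Recursively flatten a nested dict into {'outer.inner': value} form."""
    flat = {}
    for key, value in d.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat

# Hypothetical result dictionaries for two calculations.
results = {
    "Fe_bcc": {"energy": -8.31, "cell": {"a": 2.83}},
    "Fe_fcc": {"energy": -8.21, "cell": {"a": 3.45}},
}

rows = [{"structure": name, **flatten(res)} for name, res in results.items()]
# rows is now a list of flat dicts, suitable for pandas.DataFrame(rows)
```

Because each calculation may carry a different set of keys, building rows on demand like this keeps the layout flexible, which is the rationale given above for not fixing an a priori dataframe schema.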
@editorialbot generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
@jedbrown – do these comments satisfy your concerns, and if so, would you be able to edit this submission for JOSS?
@jedbrown :wave: :point_up:
@jedbrown Can you check those comments?
I'm sorry to be slow catching up after our family got covid (via 2yo's childcare). @thohamm, thanks for your updates and comprehensive response. I think we still need a scope check with other editors in light of it being nominally general purpose (thus user is responsible for app-specific reliability checks/statistics/plotting), but the examples and known uses thus far are for VASP. The main question is to what extent people outside your research group would use this tool for their research or would disseminate research via this tool (say, as a platform for CS research).
Regarding packaging, it looks like you're missing pyyaml. This is in a fresh virtualenv:
$ pip install .
Processing /home/jed/joss/strucscan
Preparing metadata (setup.py) ... done
[...]
Successfully installed strucscan-0.post0.dev68
$ strucscan --help
Traceback (most recent call last):
File "/home/jed/joss/strucscan/VENV/bin/strucscan", line 5, in <module>
from strucscan.cli import main
File "/home/jed/joss/strucscan/VENV/lib/python3.10/site-packages/strucscan/__init__.py", line 1, in <module>
from strucscan.__init__ import *
File "/home/jed/joss/strucscan/VENV/lib/python3.10/site-packages/strucscan/__init__.py", line 7, in <module>
from strucscan.utils import *
File "/home/jed/joss/strucscan/VENV/lib/python3.10/site-packages/strucscan/utils.py", line 2, in <module>
import yaml
ModuleNotFoundError: No module named 'yaml'
Installing pyyaml fixes this, but `strucscan --help` still doesn't work. (I know it doesn't claim to, but this is a normal expectation.)
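For reference, the usual way to avoid this class of problem is to declare runtime dependencies in `setup.py` so that `pip install .` pulls them in automatically. This is a hypothetical configuration fragment: the entry point and package names are assumptions inferred from the traceback above, not the actual strucscan setup.

```python
# Hypothetical setup.py fragment (not the real strucscan configuration):
# declaring pyyaml in install_requires makes `pip install .` install it
# automatically, and the console_scripts entry point wires up the
# `strucscan` command seen in the traceback above.
from setuptools import setup, find_packages

setup(
    name="strucscan",
    packages=find_packages(),
    install_requires=[
        "pyyaml",  # the missing dependency observed above
    ],
    entry_points={
        "console_scripts": ["strucscan = strucscan.cli:main"],
    },
)
```

With this in place, a fresh `pip install .` followed by `strucscan --help` should not raise `ModuleNotFoundError: No module named 'yaml'`.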
@editorialbot query scope
Submission flagged for editorial review.
@editorialbot assign me as editor
Assigned! @ppxasjsm is now the editor