
PIVA Submission


Submitting Author: Wojciech Radoslaw Pudelko (@pudeIko)
All current maintainers: Wojciech Radoslaw Pudelko (@pudeIko)
Package Name: PIVA
One-Line Description of Package: Visualization and analysis toolkit for experimental data from Angle-Resolved Photoemission Spectroscopy (ARPES)
Repository Link: https://github.com/pudeIko/piva
Version submitted: v2.3.2
EiC: @coatless
Editor: @crhea93
Reviewer 1: @jsdodge
Reviewer 2: @eigenbrot
Archive: TBD
JOSS DOI: TBD
Version accepted: TBD
Date accepted (month/day/year): TBD


Code of Conduct & Commitment to Maintain Package

Description

  • Include a brief paragraph describing what your package does:

PIVA (Photoemission Interface for Visualization and Analysis) is a GUI application designed for the interactive and intuitive exploration of large, image-like datasets. While it accommodates the visualization of any multidimensional data, its features are specifically optimized for researchers conducting Angle-Resolved Photoemission Spectroscopy (ARPES) experiments. In addition to numerous image processing tools and the ability to apply technique-specific corrections, PIVA includes an expanding library of functions and methods for detailed fitting and advanced spectral analysis.

Scope

  • Please indicate which category or categories. Check out our package scope page to learn more about our scope. (If you are unsure of which category you fit, we suggest you make a pre-submission inquiry):

    • [ ] Data retrieval
    • [x] Data extraction
    • [ ] Data processing/munging
    • [ ] Data deposition
    • [ ] Data validation and testing
    • [x] Data visualization[^1]
    • [ ] Workflow automation
    • [ ] Citation management and bibliometrics
    • [ ] Scientific software wrappers
    • [ ] Database interoperability

Domain Specific

  • [ ] Geospatial
  • [ ] Education

Community Partnerships

If your package is associated with an existing community, please check below:

[^1]: Please fill out a pre-submission inquiry before submitting a data visualization package.

  • For all submissions, explain how and why the package falls under the categories you indicated above. In your explanation, please address the following points (briefly, 1-2 sentences for each):

Data extraction: Within the ARPES community, it is common for each beamline and lab to use its own file formats and conventions, which means one often needs a custom script to get everything into a common format. To handle these discrepancies, PIVA comes with a data_loaders module that converts such files into a standardized Dataset object. The current version includes specific Dataloader classes implemented for numerous light sources and beamlines around the world.
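As a rough sketch only (the import path and file format here are assumptions for illustration, not the actual PIVA API), a custom loader amounts to a small translation layer:

import numpy as np
from piva.data_loaders import Dataloader, Dataset  # module path assumed

class MyBeamlineLoader(Dataloader):
    """Hypothetical loader for one facility's file format."""

    name = 'MyBeamline'

    def load_data(self, filename: str) -> Dataset:
        raw = np.load(filename)  # stand-in for a format-specific reader
        # map the facility's conventions onto the standardized container
        return Dataset(data=raw['data'], xscale=raw['x'],
                       yscale=raw['y'], zscale=raw['z'])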

Data visualization: The package enables efficient and intuitive exploration of large, image-like datasets. It includes specialized interactive viewers designed to handle 2D, 3D, and 4D datasets, depending on the experimental mode or conditions under which they were collected.

  • Who is the target audience and what are scientific applications of this package?

Experimental physicists conducting ARPES measurements. The package provides a comprehensive framework addressing most of the experimenter's needs, including data extraction, inspection, validation, and detailed analysis.

  • Are there other Python packages that accomplish the same thing? If so, how does yours differ?

Regarding software tailored for ARPES, two notable packages are ARPES Python Tools and PyARPES. However, they differ significantly from PIVA.

The visualization module in the former is limited to generating static plots and lacks any interactive features.

The latter is focused on post-processing and detailed analysis of the spectra, and is different in the following respects:

  • Interactive exploration and browsing of data is either restricted to 2D data or confined to the Jupyter environment, which significantly reduces efficiency and makes working with multiple datasets simultaneously difficult.
  • Viewers designed for 4D datasets are not implemented.
  • PIVA's data_loaders module contains a richer library of data-loading scripts for different light sources around the world.

Furthermore, PyARPES has not been maintained for several years.

  • If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted:

#223 (@SimonMolinsky)

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

  • [x] does not violate the Terms of Service of any service it interacts with.
  • [x] uses an OSI approved license.
  • [x] contains a README with instructions for installing the development version.
  • [x] includes documentation with examples for all functions.
  • [x] contains a tutorial with examples of its essential functions and uses.
  • [x] has a test suite.
  • [x] has continuous integration setup, such as GitHub Actions, CircleCI, and/or others.

Publication Options

JOSS Checks
  • [x] The package has an obvious research application according to JOSS's definition in their submission requirements. Be aware that completing the pyOpenSci review process does not guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
  • [x] The package is not a "minor utility" as defined by JOSS's submission requirements: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
  • [x] The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
  • [x] The package is deposited in a long-term repository with the DOI: 10.5281/zenodo.14599024

Note: JOSS accepts our review as theirs. You will NOT need to go through another full review. JOSS will only review your paper.md file. Be sure to link to this pyOpenSci issue when a JOSS issue is opened for your package. Also be sure to tell the JOSS editor that this is a pyOpenSci reviewed package once you reach this step.

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PRs rather than submitting a denser, text-based review. It will also allow you to demonstrate addressing the issue via PR links.

  • [x] Yes I am OK with reviewers submitting requested changes as issues to my repo. Reviewers will then link to the issues in their submitted review.

Confirm each of the following by checking the box.

  • [x] I have read the author guide.
  • [x] I expect to maintain this package for at least 2 years and can help find a replacement for the maintainer (team) if needed.

Please fill out our survey

P.S. Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

The editor template can be found here.

The review template can be found here.

pudeIko avatar Jan 04 '25 14:01 pudeIko

Editor in Chief checks

Hi there! Thank you for submitting your package for pyOpenSci review. Below are the basic checks that your package needs to pass to begin our review. If some of these are missing, we will ask you to work on them before the review process begins.

Please check our Python packaging guide for more information on the elements below.

  • [x] Installation The package can be installed from a community repository such as PyPI (preferred), and/or a community channel on conda (e.g. conda-forge, bioconda).
    • [x] The package imports properly into a standard Python environment (import package).
  • [x] Fit The package meets criteria for fit and overlap.
  • [x] Documentation The package has sufficient online documentation to allow us to evaluate package function and scope without installing the package. This includes:
    • [x] User-facing documentation that overviews how to install and start using the package.
    • [x] Short tutorials that help a user understand how to use the package and what it can do for them.
    • [x] API documentation (documentation for your code's functions, classes, methods and attributes): this includes clearly written docstrings with variables defined using a standard docstring format.
  • [x] Core GitHub repository Files
    • [x] README The package has a README.md file with clear explanation of what the package does, instructions on how to install it, and a link to development instructions.
    • [x] Contributing File The package has a CONTRIBUTING.md file that details how to install and contribute to the package.
    • [x] Code of Conduct The package has a CODE_OF_CONDUCT.md file.
    • [x] License The package has an OSI approved license. NOTE: We prefer that you have development instructions in your documentation too.
  • [x] Issue Submission Documentation All of the information is filled out in the YAML header of the issue (located at the top of the issue template).
  • [x] Automated tests Package has a testing suite and is tested via a Continuous Integration service.
  • [x] Repository The repository link resolves correctly.
  • [x] Package overlap The package doesn't entirely overlap with the functionality of other packages that have already been submitted to pyOpenSci.
  • [x] Archive (JOSS only, may be post-review): The repository DOI resolves correctly.
  • [ ] Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

  • [x] Initial onboarding survey was filled out We appreciate each maintainer of the package filling out this survey individually. :raised_hands: Thank you authors in advance for setting aside five to ten minutes to do this. It truly helps our organization. :raised_hands:


Editor comments

This is a very well done package; the documentation is nearly perfect, with a detailed description of each GUI element and the ability to specify a custom data set. The only issue is the lack of a short walkthrough from data ingestion to modeling using a sample data set.

In addition, it would be highly beneficial to show a fully working custom data loader for a given ARPES file that has a non-standard format. The present example is too sparse.

https://piva.readthedocs.io/en/latest/dataloaders.html#writing-a-custom-dataloader

Please also fix the parameter labels under the API documentation for DataBrowser.add_viewer_to_linked_list()

https://piva.readthedocs.io/en/latest/modules/data_browser.html#data_browser.DataBrowser.add_viewer_to_linked_list

coatless avatar Jan 21 '25 07:01 coatless

Hi @coatless,

Thanks for the great feedback and initial remarks! I've implemented the requested changes:

The only issue is the lack of short walk through from data ingestion to modeling using a sample data set.

I've added a new section to the documentation with an example covering data extraction and basic analysis. I hope this aligns with what you had in mind.

In addition, it would be highly beneficial to show a fully working custom data loader for a given ARPES file that has a non-standard format. The present example is too sparse. https://piva.readthedocs.io/en/latest/dataloaders.html#writing-a-custom-dataloader

I've prepared a simulated data file and updated the CustomDataloader class so users can test proper configuration and have a more robust template for their code.

Please also fix the parameter labels under the API documentation for DataBrowser.add_viewer_to_linked_list() https://piva.readthedocs.io/en/latest/modules/data_browser.html#data_browser.DataBrowser.add_viewer_to_linked_list

Done!

I have a technical question. I've committed all changes to the main branch and bumped the version number. Is this the correct approach, or would it be better to keep the changes in a separate branch during the review?

pudeIko avatar Jan 23 '25 16:01 pudeIko

@pudeIko Thank you for the fast turn around!

Putting everything into main and tagging a release is okay, as this is part of intake. I'll update the starting version number in the ticket to match the initial EiC intake.

coatless avatar Jan 23 '25 19:01 coatless

Hi @coatless,

Hope you're well! I’m just touching base on the review of the software package. I totally understand how things can get busy—just keen to know how it’s going or if there’s anything you need from my side to keep things rolling.

pudeIko avatar Mar 06 '25 07:03 pudeIko

@pudeIko it's on my end at the moment. We're working on finding a new editor for the submission.

coatless avatar Mar 06 '25 09:03 coatless

@pudeIko Thank you for your patience! We've secured an editor to further move the review along.

We're happy to announce that @crhea93 will be the editor for your submission.

As the next step, we'll be working toward getting reviewers assigned. This step is detailed here:

https://www.pyopensci.org/software-peer-review/how-to/author-guide.html#the-review-begins

I'll let @crhea93 introduce himself 🙂

coatless avatar Mar 30 '25 01:03 coatless

@pudeIko Good morning! Just to introduce myself briefly -- I am a Ph.D. scientist at the Dragonfly Telescope working on photometric and spectroscopic analysis. I'm very excited to help you through the process of a successful submission!

As @coatless mentioned, the first step is to procure two reviewers -- I will be working on this diligently this week.

Please let me know if you have any questions or need any assistance throughout this process.

crhea93 avatar Mar 31 '25 12:03 crhea93

Editor response to review:


Editor comments

:wave: Hi @jsdodge and @eigenbrot! Thank you for volunteering to review for pyOpenSci! I look forward to a successful review!

The first thing you will do is fill out this pre-review survey. Once completed, please move on to the review itself following the review template outlined below.

Please fill out our pre-review survey

Before beginning your review, please fill out our pre-review survey. This helps us improve all aspects of our review and better understand our community. No personal data will be shared from this survey - it will only be used in an aggregated format by our Executive Director to improve our processes and programs.

  • [x] reviewer 1 survey completed.
  • [x] reviewer 2 survey completed.

Please let me know when you have completed your survey so that I can check it off here. You can either let me know in a tagged comment below or in a separate message on Slack or via email.

The following resources will help you complete your review:

  1. Here is the reviewers guide. This guide contains all of the steps and information needed to complete your review.
  2. Here is the review template that you will need to fill out and submit here as a comment, once your review is complete.

Please get in touch with any questions or concerns! Your review is due: April 22nd, 2025

Reviewers: jsdodge, eigenbrot Due date: 04/22/2025

crhea93 avatar Apr 02 '25 13:04 crhea93

Hi all! 👋

That's great news, and thank you @crhea93, @jsdodge, and @eigenbrot for joining!

I'm excited to see the ball rolling and look forward to working with you on the review! 🚀

pudeIko avatar Apr 02 '25 16:04 pudeIko

Hi @crhea93, thanks, I've completed the review.

Nice to meet you, @pudeIko, and @eigenbrot! I look forward to working with you on this review.

jsdodge avatar Apr 02 '25 19:04 jsdodge

@jsdodge @eigenbrot

Fantastic! Thank you two for being so prompt.

The next step is to work your way through the package following the review template. When it is complete, please post your filled in template here as a comment.

crhea93 avatar Apr 02 '25 19:04 crhea93

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • [X] As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • [x] A statement of need clearly stating problems the software is designed to solve and its target audience in README.
  • [x] Installation instructions: for the development version of the package and any non-standard dependencies in README.
  • [x] Vignette(s) demonstrating major functionality that runs successfully locally.
  • [x] Function Documentation: for all user-facing functions.
  • [x] Examples for all user-facing functions.
  • [x] Community guidelines including contribution guidelines in the README or CONTRIBUTING.
  • [x] Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a pyproject.toml file or elsewhere.

Readme file requirements

The package meets the readme requirements below:

  • [X] Package has a README.md file in the root directory.

The README should include, from top to bottom:

  • [X] The package name
  • [X] Badges for:
    • [X] Continuous integration and test coverage,
    • [X] Docs building (if you have a documentation website),
    • [X] A repostatus.org badge,
    • [X] Python versions supported,
    • [X] Current package version (on PyPI / Conda).

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be wider than high. (Note that a badge for pyOpenSci peer-review will be provided upon acceptance.)

  • [X] Short description of package goals.
  • [X] Package installation instructions
  • [X] Any additional setup required to use the package (authentication tokens, etc.)
  • [x] Descriptive links to all vignettes. If the package is small, there may only be a need for one vignette which could be placed in the README.md file.
    • [x] Brief demonstration of package usage (as it makes sense - links to vignettes could also suffice here if package description is clear)
  • [x] Link to your documentation website.
  • [x] If applicable, how the package compares to other similar packages and/or how it relates to other packages in the scientific ecosystem.
  • [ ] Citation information

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider whether:

  • [x] Package documentation is clear and easy to find and use.
  • [x] The need for the package is clear
  • [x] All functions have documentation and associated examples for use
  • [x] The package is easy to install

Functionality

  • [x] Installation: Installation succeeds as documented.
  • [x] Functionality: Any functional claims of the software been confirmed.
  • [x] Performance: Any performance claims of the software been confirmed.
  • [x] Automated tests:
    • [x] All tests pass on the reviewer's local machine for the package version submitted by the author. Ideally this should be a tagged version making it easy for reviewers to install.
    • [x] Tests cover essential functions of the package and a reasonable range of inputs and conditions.
  • [x] Continuous Integration: Has continuous integration setup (We suggest using Github actions but any CI platform is acceptable for review)
  • [x] Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines. A few notable highlights to look at:
    • [x] Package supports modern versions of Python and not End of life versions.
    • [x] Code format is standard throughout package and follows PEP 8 guidelines (CI tests for linting pass)

For packages also submitting to JOSS

  • [x] The package has an obvious research application according to JOSS's definition in their submission requirements.

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

  • [x] A short summary describing the high-level functionality of the software
  • [x] Authors: A list of authors with their affiliations
  • [x] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [x] References: With DOIs for all those that have one (e.g. papers, datasets, software).

Final approval (post-review)

  • [x] The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 8–10


Review Comments

Summary

This is a very nice package, @pudeIko, congratulations! I don't do ARPES myself but I can see the value that this will have to the community.

I'm very sorry this took me so long. I went through the documentation and took notes on things that I found confusing or didn't work for me, which I am including below. I think most of these are fairly minor.

Overall, the documentation is clear and I was able to follow most sections without difficulty, although I was unable to use the Plot Tool or open JupyterLab from within the application. I think it would be helpful to have one or more examples that show how a typical analysis might proceed, so that a new user can see how the different GUI elements interact.

README
  • This should include a short description that compares Piva to other packages in use by the ARPES community. It can be similar to what you already have in the JOSS paper.
Installation
  • I get the following warning when running piva:

    piva/main.py:1: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.

Tests
  • Running pytest returned a message that it passed 4 tests and generated 21 warnings, which is different from the message given in the documentation.
Data browser
  • Documentation uses CTRL-O, macOS uses CMD-O or ⌘-O (I don't think this is a big deal, so feel free to ignore it if you think fixing it makes the documentation harder to read).
Data viewers
  • The viewer axes are not labeled, nor are the values for the sliders in the various tabs. Would it be a problem to do this? It might help new users.
  • In the image of the 2D DataViewer shown in the documentation, panel (d) has a different item order than in the actual application. The items are different in the 3D and 4D DataViewers, but their documentation tables point to the list in the 2D DataViewer table.
  • In the 2D DataViewer, selecting "Do it", then "Reset" changes the button name to "Do curvature".
  • The 3D DataViewer documentation images show "Orientate" while the application shows "Orient".
  • Selecting the "open in PIT" button in the 2D DataViewer does not issue a warning, but in the 3D DataViewer panel it does. It opens a box with the warning, "Note: PIT is a third-party package and may not work properly with Python versions above 3.8."
  • The table in the 3D DataViewer documentation does not match the panels shown in the figure.
  • Selecting the "start JL session" from the "File" tab does not start JupyterLab in my installation. It does open a system file browser window, but it's not clear to me what I should select to open from this window. Files with .ipynb extensions are visible but I can not select them for opening.
  • The "File" tab for the 2D DataViewer has an annotation "create experimental logbook" but the table lists "open experimental logbook" in the "Functionality" column (the description says it creates it, though).
Fitters
  • MDC Fitter returns error when saving results:

    Traceback (most recent call last):
      File "/Users/jsd/Library/CloudStorage/OneDrive-SimonFraserUniversity(1sfu)/projects/pyopensci/piva/piva/fitters.py", line 1057, in save_fit_results
        for result in self.fit_results:
    TypeError: 'NoneType' object is not iterable

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/Users/jsd/Library/CloudStorage/OneDrive-SimonFraserUniversity(1sfu)/projects/pyopensci/piva/piva/fitters.py", line 1063, in save_fit_results
        for entry in self.fit_results:
    TypeError: 'NoneType' object is not iterable

    • It's not obvious where the fit results are stored
Plot tool
  • It is unclear to me how to use the Plot Tool.

jsdodge avatar Apr 19 '25 06:04 jsdodge

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • [x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • [x] A statement of need clearly stating problems the software is designed to solve and its target audience in README.
  • [x] Installation instructions: for the development version of the package and any non-standard dependencies in README.
  • [x] Vignette(s) demonstrating major functionality that runs successfully locally.
  • [x] Function Documentation: for all user-facing functions.
  • [x] Examples for all user-facing functions.
  • [x] Community guidelines including contribution guidelines in the README or CONTRIBUTING.
  • [x] Metadata including author(s), author e-mail(s), a url, and any other relevant metadata e.g., in a pyproject.toml file or elsewhere.

Readme file requirements

The package meets the readme requirements below:

  • [x] Package has a README.md file in the root directory.

The README should include, from top to bottom:

  • [x] The package name
  • [x] Badges for:
    • [x] Continuous integration and test coverage,
    • [x] Docs building (if you have a documentation website),
    • [x] A repostatus.org badge,
    • [x] Python versions supported,
    • [x] Current package version (on PyPI / Conda).

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be wider than high. (Note that a badge for pyOpenSci peer-review will be provided upon acceptance.)

  • [x] Short description of package goals.
  • [x] Package installation instructions
  • [x] Any additional setup required to use the package (authentication tokens, etc.)
  • [x] Descriptive links to all vignettes. If the package is small, there may only be a need for one vignette which could be placed in the README.md file.
    • [x] Brief demonstration of package usage (as it makes sense - links to vignettes could also suffice here if package description is clear)
  • [x] Link to your documentation website.
  • [x] If applicable, how the package compares to other similar packages and/or how it relates to other packages in the scientific ecosystem.
  • [ ] Citation information
    • Currently says "TBD"

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider whether:

  • [x] Package documentation is clear and easy to find and use.
  • [x] The need for the package is clear
  • [x] All functions have documentation and associated examples for use
  • [x] The package is easy to install

Functionality

  • [x] Installation: Installation succeeds as documented.
  • [x] Functionality: Any functional claims of the software been confirmed.
  • [x] Performance: Any performance claims of the software been confirmed.
  • [x] Automated tests:
    • [x] All tests pass on the reviewer's local machine for the package version submitted by the author. Ideally this should be a tagged version making it easy for reviewers to install.
    • [ ] Tests cover essential functions of the package and a reasonable range of inputs and conditions.
      • See below for a more detailed description
  • [x] Continuous Integration: Has continuous integration setup (We suggest using Github actions but any CI platform is acceptable for review)
  • [x] Packaging guidelines: The package conforms to the pyOpenSci packaging guidelines. A few notable highlights to look at:
    • [x] Package supports modern versions of Python and not End of life versions.
    • [ ] Code format is standard throughout package and follows PEP 8 guidelines (CI tests for linting pass)
      • It appears that the CI linting step is failing, but being ignored

For packages also submitting to JOSS

  • [x] The package has an obvious research application according to JOSS's definition in their submission requirements.

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

  • [x] A short summary describing the high-level functionality of the software
  • [x] Authors: A list of authors with their affiliations
  • [x] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [x] References: With DOIs for all those that have one (e.g. papers, datasets, software).

Final approval (post-review)

  • [ ] The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 7


Review Comments

Summary

The PIVA package is an incredibly well realized toolkit for analyzing multidimensional data, specifically data from ARPES experiments. I don't personally work with ARPES data, but after reading the thorough documentation and playing with PIVA for a few hours I felt I had gained a good understanding of the common data viewing and analysis techniques of that field.

PIVA is undoubtedly a very useful tool for analyzing complex data, and a great addition to the pyOpenSci community. PIVA's representation of, and interaction with, 3D data is so natural and intuitive that I wrote my own toy DataLoader subclass to view 3D FITS data (a common format in astronomy) and it works great! I love the way binning is represented in the Data Viewer.

The biggest weakness from a user-facing experience is a gap in the documentation regarding the final steps of implementing custom data loaders and GUI plugins. This could be easily remedied with more detailed documentation and examples.

The biggest weakness from a packaging perspective is how the tests are implemented and the test coverage in general.

Specific Comments
Custom Loaders/Plugins
  1. The construction of either a Dataloader subclass or a custom QT widget is generally well described, but how they are imported into PIVA through either DataloaderImporter or PluginImporter is less clear. Examples of these *Importer classes are provided in external files, but I would really like to see a detailed description of the parameters and methods of these classes. They are not mentioned in the otherwise-complete API documentation. A suggestion would be to write abstract bases of these *Importer classes that clearly define the interface.

  2. The fact that a Dataloader subclass must contain a load_data() method is a good indication that Dataloader should probably be an abstract base class.

Test System
  1. Per the documentation, PIVA tests are run by calling the test modules directly, which starts a pytest session. This felt weird to me; I think most people are used to simply running pytest in the project root, which does work for PIVA. Running pytest is the method for testing described in CONTRIBUTING.md.

  2. pytest is not listed in the package requirements table. There is also no "test" pip extra.

  3. Test coverage for the GUI functionality seems high, but the rest of the package could use more tests. In particular, there don't seem to be any tests of the working_procedures module.

  4. Despite the presence of a "tox.ini" file, simply running tox locally fails. (I suspect tox is only used for CI builds, so maybe this doesn't matter)

Use of Older Frameworks

There are a few places in the code where older methods and frameworks could be replaced with more modern alternatives that would improve the maintainability and robustness of the package. In particular:

  1. The Dataset class inherits from python's Namespace class, which allows properties to be accessed with a nice ds.property syntax. I would highly recommend moving to using either python dataclasses or pydantic BaseModels, both of which are specifically designed for this type of interaction. Pydantic's BaseModel also offers a lot of useful type-validation functionality that may be particularly useful because the internals of the Dataset are tightly constrained to how that class is used by other code.

  2. In a few places (e.g., Notebooks created by the "touch" button in the Data Viewer) file paths are concatenated with a + operator. At the very least I would recommend using os.path.join, or (even better) pathlib.Path objects; see the sketch after this list.
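To make the suggestion concrete, a minimal comparison (standard library only; the paths are hypothetical):

import os
from pathlib import Path

base, name = '/tmp/piva', 'notebook.ipynb'  # hypothetical locations

nb = base + '/' + name         # fragile: separators and platforms vary
nb = os.path.join(base, name)  # better: handles separators portably
nb = Path(base) / name         # best: a rich, typed path object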

Documentation

Overall the documentation is extremely good. My comments are minor:

  1. PIVA's pypi page does not contain a "Documentation" link to the readthedocs site. This is a shame because the documentation is so good!

  2. I think the "What is piva" section would be more impactful if the multiple strengths/features of piva were given their own subheadings.

  3. The "Implemented Dataloaders" table is great, and would be even more great if the individual data loaders were links to the respective class in the API documentation.

  4. On the "Data Ingestion and Analysis" page it would be very useful to have an example dataset (the same one used in the example code) so a user can follow along directly. I was eventually able to run the notebook with one of the "test/data" datasets; perhaps elevating one or more of these datasets to an "examples" folder would make their utility more obvious.

Misc
  • Data Viewer line plots show nothing if NaN values are present. The images still render correctly.
  • astropy is listed as a dependency, but I can't find its usage anywhere
  • Related, the use of astropy.units might be helpful. It's also possible the entire constants module could be replaced with astropy.constants.
  • When Saving data to a pickle (via the "save" button in the Data Viewer) it would be nice to be able to specify the save directory.
  • I found db as the script name to be a little confusing. I kept expecting it to be piva.
  • I was expecting the "touch" and "start JL session" buttons to populate a notebook with correct paths for the dataset loaded into the Data Viewer. Instead it looks like they just copied a generic template to the given directory.
  • It looks like ruff errors are being ignored in CI
  • In a tiled window manager, content in windows gets truncated if the window is small enough
  • Any opened Jupyter sessions (created with the "start JL session" button) persist after closing all Piva windows.
  • Installation was painless with multiple methods, even those not described in the documentation. FWIW, PIVA appears to work correctly (and the tests pass) for all python versions between 3.10 and 3.13.
  • When clicking the "open in PIT" button the resulting PIT window displays a terminal with the following error in python 3.11 (but not 3.10):
Cell In[1], line 1
----> 1 get_ipython().run_line_magic('pylab', 'qt')

File ~/micromamba/envs/piva11/lib/python3.11/site-packages/IPython/core/interactiveshell.py:2481, in InteractiveShell.run_line_magic(self, magic_name, line, _stack_depth)
   2479     kwargs['local_ns'] = self.get_local_scope(stack_depth)
   2480 with self.builtin_trap:
-> 2481     result = fn(*args, **kwargs)
   2483 # The code below prevents the output from being displayed
   2484 # when using magics with decorator @output_can_be_silenced
   2485 # when the last Python token in the expression is a ';'.
   2486 if getattr(fn, magic.MAGIC_OUTPUT_CAN_BE_SILENCED, False):

File ~/micromamba/envs/piva11/lib/python3.11/site-packages/IPython/core/magics/pylab.py:159, in PylabMagics.pylab(self, line)
    155 else:
    156     # invert no-import flag
    157     import_all = not args.no_import_all
--> 159 gui, backend, clobbered = self.shell.enable_pylab(args.gui, import_all=import_all)
    160 self._show_matplotlib_backend(args.gui, backend)
    161 print(
    162     "%pylab is deprecated, use %matplotlib inline and import the required libraries."
    163 )

File ~/micromamba/envs/piva11/lib/python3.11/site-packages/ipykernel/inprocess/ipkernel.py:200, in InProcessInteractiveShell.enable_pylab(self, gui, import_all, welcome_message)
    198 if not gui:
    199     gui = self.kernel.gui
--> 200 return super().enable_pylab(gui, import_all, welcome_message)

TypeError: InteractiveShell.enable_pylab() takes from 1 to 3 positional arguments but 4 were given

eigenbrot avatar Apr 21 '25 15:04 eigenbrot

Hi @crhea93, I see that the reviews are coming in, and I wanted to check what the next steps are. Is it okay for me to start working on them, or should I wait for an official sign-off?

pudeIko avatar Apr 28 '25 08:04 pudeIko

@pudeIko My apologies for not being on top of this! I was on a family vacation for the last two weeks :)

Yes, you can go ahead and start addressing these comments!

crhea93 avatar Apr 28 '25 12:04 crhea93

Hi all! 👋

Once again, thank you very much for your time, as well as for your valuable and positive reviews!

Over the past few weeks, I’ve been working through the received comments, and I’m happy to say that I’ve addressed the vast majority of them. I believe the package is now ready for another round of review. The only outstanding point is the remark regarding the relatively low test coverage, which I’ll be actively working on in the coming days.

@eigenbrot, below I’ve provided point-by-point responses to your comments. In some cases, I’d like to clarify certain points or explain my reasoning and verify if my understanding is correct. Please let me know if anything remains unclear.

@jsdodge, I noticed that in your review, not all checkboxes were marked as fulfilled. Hopefully, the recent updates resolve those issues, but if anything is still missing or needs correction, I’d kindly ask you to provide a brief description so I can address it accordingly.

Looking forward to your feedback!


Custom Loaders/Plugins

  1. The construction of either a Dataloader subclass or a custom QT widget is generally well described, but how they are imported into PIVA through either DataloaderImporter or PluginImporter is less clear. Examples of these *Importer classes are provided in external files, but I would really like to see a detailed description of the parameters and methods of these classes. They are not mentioned in the otherwise-complete API documentation. A suggestion would be to write abstract bases of these *Importer classes that clearly define the interface.

I was concerned that using abstract base classes (ABCs) might cause issues when DataloaderImporter or PluginImporter are expected to support multiple custom add-ons. Therefore, I chose a slightly different approach than using ABCs directly. I updated the example to indicate an import method that should not be overridden and is simply called with the implemented CustomDataloader as an argument. I also added more detailed in-code documentation and comments, and emphasized in the online documentation that using the provided template is recommended as a tested and reliable solution. Similarly, for custom widgets, I extended the documentation and added comments in the example code to clarify the exact steps required for correct implementation.

  2. The fact that a Dataloader subclass must contain a load_data() method is a good indication that Dataloader should probably be an abstract base class.

Absolutely! I have updated the implementation accordingly, along with the corresponding documentation.
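For reference, a minimal sketch of what such an abstract base could look like (simplified, not the actual PIVA source):

from abc import ABC, abstractmethod

class Dataloader(ABC):
    """Abstract base for beamline-specific loaders."""

    name: str = 'Base'

    @abstractmethod
    def load_data(self, filename: str) -> 'Dataset':
        """Read a source-specific file and return a standardized Dataset."""
        ...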

Test System

  1. Per the documentation, PIVA tests are run by calling the test modules directly, which starts a pytest session. This felt weird to me; I think most people are used to simply running pytest in the project root, which does work for PIVA. Running pytest is the method for testing described in CONTRIBUTING.md.

The idea was to allow users to run individual test modules separately, which I thought might be preferable in some cases. Nevertheless, I’ve updated the documentation to present this only as an alternative option, while recommending the standard approach of running pytest from the project root.

  2. pytest is not listed in the package requirements table. There is also no "test" pip extra.

Both pytest and tox have now been added as optional dependencies.

  3. Test coverage for the GUI functionality seems high, but the rest of the package could use more tests. In particular, there don't seem to be any tests of the working_procedures module.
  4. Despite the presence of a "tox.ini" file, simply running tox locally fails. (I suspect tox is only used for CI builds, so maybe this doesn't matter)

I don’t encounter any issues when running tox locally. Could you please share the error message you’re seeing? That would help me identify and resolve the problem more effectively.

Use of Older Frameworks

There are few places in the code where older methods and frameworks could be replaced with more modern alternatives that would improve the maintainability and robustness of the package. In particular:

  1. The Dataset class inherits from python's Namespace class, which allows for properties to be accessing with a nice ds.property syntax. I would highly recommend moving to using either python dataclasses or pydantic BaseModels, both of which are specifically designed for this type of interaction. Pydantic's BaseModel also offers a lot of useful type-validation functionality that may be particularly useful because the internals of the Dataset are tightly constrained to how that class is used by other code.

Changed! Please note that this change breaks compatibility with previously pickled/saved files. Also, argparse is still listed as a package requirement. Since it's no longer essential, I created an issue to review remaining occurrences and simplify the code accordingly.

  2. In a few places (e.g., Notebooks created by the "touch" button in the Data Viewer) file paths are concatenated with a + operator. At the very least I would recommend using os.path.join, or (even better) pathlib.Path objects.

All occurrences should now be corrected. I was already using os.path in most places, so I stayed with it for consistency.

Documentation

Overall the documentation is extremely good. My comments are minor:

  1. PIVA's pypi page does not contain a "Documentation" link to the readthedocs site. This is a shame because the documentation is so good!

The link to the documentation has been added to the pyproject.toml file and will appear on the PyPI page after the next release.

  1. I think the "What is piva" section would be more impactful if the multiple strengths/features of piva were given their own subheadings.

Done!

  1. The "Implemented Dataloaders" table is great, and would be even more great if the individual data loaders were links to the respective class in the API documentation.

Done!

  1. On the "Data Ingestion and Analysis" page it would be very useful to have an example dataset (the same one used in the example code) so a user can follow along directly. I was eventually able to run the notebook with one of the "test/data" datasets; perhaps elevating one or more of these datasets to an "examples" folder would make their utility more obvious.

The example dataset has been uploaded and is now referenced in the documentation.

Misc

  • Data Viewer line plots show nothing if NaN values are present. The images still render correctly.

Is this something you’d like me to address?

  • astropy is listed as a dependency, but I can't find its usage anywhere
  • Related, the use of astropy.units might be helpful. It's also possible the entire constants module could be replaced with astropy.constants.

I decided to remove astropy from the list of dependencies. However, I added a suggestion issue to consider migrating to astropy.constants in the future.

  • When Saving data to a pickle (via the "save" button in the Data Viewer) it would be nice to be able to specify the save directory.

Implemented.

  • I found db as the script name to be a little confusing. I kept expecting it to be piva.

Changed! I used to defend it, but many people pointed it out — so I finally gave in. 😄

  • I was expecting the "touch" and "start JL session" buttons to populate a notebook with correct paths for the dataset loaded into the Data Viewer. Instead it looks like they just copied a generic template to the given directory.

I originally tried to implement it that way, but modifying specific lines in notebooks was error-prone and often led to files that couldn’t be executed in Jupyter. So I opted for the more stable template-based approach.

  • It looks like ruff errors are being ignored in CI

I resolved the existing ruff errors and flagged it with --exit-zero. This seemed like a good compromise — all issues are reported without blocking CI runs.

  • In tiled window manager, content in windows gets truncated if the window is small enough

I'm not a user of such environments myself, so I'm unsure how to approach this issue. I’m open to suggestions or contributions here.

  • Any opened Jupyter sessions (created with the "start JL session" button) persist after closing all Piva windows.

I’ve implemented a method to stop all opened servers when PIVA is closed.

  • Installation was painless with multiple methods, even those not described in the documentation. FWIW, PIVA appears to work correctly (and the tests pass) for all python versions between 3.10 and 3.13.
  • When clicking the "open in PIT" button the resulting PIT window displays a terminal with the following error in python 3.11 (but not 3.10):

PIT is a separate package developed by a colleague. While I find it incredibly useful — especially for viewing data along arbitrary directions — I’m not responsible for its development or compatibility. I’ve added an info window to make users aware of this.

pudeIko avatar Jun 07 '25 20:06 pudeIko

Hi all 👋

Just a quick update — I’ve increased the test coverage to 81%, and the package is now officially ready for the second round of review. 🚀

I’m looking forward to your feedback!

pudeIko avatar Jun 10 '25 10:06 pudeIko

@pudeIko thank you very much for addressing each of the comments in detail! @eigenbrot @jsdodge At your convenience please review the changes and let @pudeIko know if there are any subsequent changes you'd like to see!

crhea93 avatar Jun 10 '25 10:06 crhea93

@pudeIko, thanks for your updates. I will be looking over the revised package in the next week or so.

eigenbrot avatar Jun 10 '25 14:06 eigenbrot

Package Review 2

General

Thanks, @pudeIko, for your changes; I can tell a lot of hard work went into these updates! My new comments are below. Anything from before that I don't mention is something that I have no additional comments on; thank you for addressing and/or clarifying those issues.

*Importer Classes

The additional documentation and examples for the *Importer classes really help a lot. Nice work! Regarding your comment about making DataloaderImporter an ABC, I understand what you're saying about multiple custom loaders and I think a solution could be something like this (built on your example)

from abc import ABC, abstractmethod
from typing import final

# DataBrowser and Dataloader are PIVA's own classes; their import paths are
# omitted here, as in the original suggestion.

class DataloaderImporterBase(ABC):
    """
    Base Dataloader importer for custom **Dataloaders** written for PIVA.

    Add any custom `Dataloader` subclasses to the `custom_loader_list` property.
    """

    def __init__(self, data_browser: DataBrowser) -> None:
        """
        Initialize DataloaderImporter and export custom written loaders to the
        record of available **Dataloaders**.

        :param data_browser: `DataBrowser` of the current session
        """

        # assign a reference to the DataBrowser
        self.db = data_browser
        self.db.dp_dl_picker.insertSeparator(self.db.dp_dl_picker.count())

        # use this method to add each loader to the DataBrowser's record!
        for loader in self.custom_loader_list:
            self.add_dataloader(loader)

    @property
    @abstractmethod
    def custom_loader_list(self) -> list[Dataloader]:
        """The list of custom `Dataloaders` to add to PIVA."""
        pass

    @final
    def add_dataloader(self, loader: Dataloader) -> None:
        """
        Method for exporting an implemented custom Dataloader to the
        DataBrowser's list of available dataloaders.

        Should not be overridden!

        :param loader: the implemented CustomDataloader that needs to be
               imported
        """

        dl_label = '{}'.format(loader.name)

        self.db.dp_dl_picker.addItem(dl_label)
        self.db.add_dataloader_to_record(dl_label, loader)

        print('\t', dl_label)

and then a user's "piva_dataloader_importer.py" could look like

from piva.somewhere import DataloaderImporterBase
from .my_loaders import Loader1, Loader2

class MyGreatImporter(DataloaderImporterBase):
    @property
    def custom_loader_list(self) -> list[Dataloader]:
        return [Loader1, Loader2]

As I see it, the primary advantage of this method would be that DataloaderImporterBase would be a first-class object in the repo and therefore included in the excellent API documentation. A similar treatment could be given to the PluginImporter, as well.

To be clear, this is just a suggestion; I think the additions to the documentation and examples are sufficient to call this issue resolved.

Tests

Thank you for adding test coverage, specifically of the working_procedures module. Thank you also for adding more detail to the online documentation about running tests. On this note, I think CONTRIBUTING.md may now be a bit out of date; perhaps the "Install Dependencies" step should have pip install -e .[test] instead of using the requirements file?

I still have two concerns about the tests:

  1. It looks like the viewer, jupyter, and working procedures tests are only run on ubuntu-latest with python 3.10 in the github CI. Is there a reason to skip other OS's/versions?
  2. I can't run the tests for the working_procedures module on my computer. When I try to run them they take up 100% of a CPU for about 20 minutes before the process is killed by my kernel for using up all (~50 GB) of my RAM. On GitHub I see that these tests run in ~15s, so I'm not sure what's going on here.

Related, I no longer get an error when running tox, but the tox config only seems to run the working_procedures test. Furthermore, it looks like tox isn't used in GitHub Actions. If this is true, it might be better to just remove tox altogether.

New Dataset class

Thank you for using BaseModel as the base for the Dataset class. I just have 2 very minor notes on that implementation:

  1. The required attributes data, xscale, yscale, and zscale would be better defined with no default (right now they default to None). This would cause pydantic to raise a validation error if they are not provided.
  2. IIRC the type-hint SOME_TYPE | None has been preferred over Optional[SOME_TYPE] for a while (since python 3.10, maybe?).

e.g.

class Dataset(BaseModel):
    ...
    
    data: np.ndarray
    xscale: np.ndarray
    yscale: np.ndarray
    zscale: np.ndarray
    ekin: np.ndarray | None = None
    ...

CI Linting Step

I'm not sure I see the point of using --exit-zero with ruff in the linting CI step. This feels barely different than simply ignoring the step; in both cases any linting errors would be ignored in an automated run and the onus is on a human to manually fix things. In other words, I'd argue that the only way to ensure correct code formatting is to have the CI break if the formatting is bad. I don't know how stringent PyOpenSci is about this requirement, so I'll leave it up to @crhea93 for the final say on whether this would block approval.

If you decide to not ignore ruff errors in the CI then I would suggest looking into using ruff's pre-commit hook to automatically fix any formatting errors every time a commit is made.

Misc

  • The "Citing" section of README.md still says "TBD".
  • The requirements.txt and the "Dependencies" section of the online docs are out of date w.r.t. the removal of astropy.
  • Regarding NaN values causing the line plots to disappear: I'm fine with whatever you decide. Maybe in the field of ARPES data NaNs are uncommon. They occur often in data I look at (astronomy). If it was something simple like plotting (x[~np.isnan(y)], y[~np.isnan(y)]) instead of just (x, y) then it may be worth it. Either way is fine with me.
  • The online docs still make reference to the command name db on the "Data Browser" page and in the note box regarding wayland on the "Installation" page.
  • I'm having an issue with the new "Au-test_spectrum.p" file. If I open it in a notebook as suggested on the "Data Ingestion and Analysis" page then the dataset object has no .data attribute. If I try to open the same file with piva itself I get the following error in the console window:
Traceback (most recent call last):
  File "~/piva/piva/data_browser.py", line 118, in launch_piva
    self.open_dv(fname)
  File "~/piva/piva/data_browser.py", line 194, in open_dv
    raise e
  File "~/piva/piva/data_browser.py", line 182, in open_dv
    if data_set.xscale.size == 1:
AttributeError: 'NoneType' object has no attribute 'size'
  • Nice work with the new Jupyter shutdown capabilities!

eigenbrot avatar Jun 13 '25 21:06 eigenbrot

Thank you for this, @eigenbrot! We encourage following standard lint practices but understand that it isn't always feasible, so this would not block approval :)

@pudeIko If you have the bandwidth you can look at the suggestions from @eigenbrot in his comment. That being said, don't worry too much about this!

crhea93 avatar Jun 13 '25 23:06 crhea93

Thanks again for the detailed remarks, @eigenbrot, and for pointing out the outdated parts of the documentation!

As before, here's my point-by-point response, and I'm looking forward to your feedback!


Regarding your comment about making DataloaderImporter an ABC, I understand what you're saying about multiple custom loaders and I think a solution could be something like this (built on your example)

from abc import ABC, abstractmethod


class DataloaderImporterBase(ABC):
    """
    Base Dataloader importer for custom **Dataloaders** written for PIVA.
    
    Add any custom `Dataloader` subclasses to the `custom_loader_list` property.
    """

I really liked the approach and gave it a try, but I encountered the following error:

Traceback (most recent call last):
  File "/Users/kaskadermike/opt/anaconda3/envs/piva/bin/db", line 33, in <module>
    sys.exit(load_entry_point('piva', 'console_scripts', 'db')())
  File "/Users/kaskadermike/git/piva/piva/main.py", line 18, in db
    DataBrowser()
  File "/Users/kaskadermike/git/piva/piva/data_browser.py", line 70, in __init__
    self.load_custom_data_loaders()
  File "/Users/kaskadermike/git/piva/piva/data_browser.py", line 850, in load_custom_data_loaders
    self.dli = getattr(dli, "DataloaderImporterBase")(self)
TypeError: Can't instantiate abstract class DataloaderImporterBase with abstract method custom_loader_list

I believe this is similar to the issue with the Dataset objects (related to initialization and validation, described below). Since the DataBrowser searches for a specific DataloaderImporter module and executes code from it, switching to an ABC would require reworking the entire importing mechanism. So, if the previous changes resolve the issue, I'd prefer to stick with the current approach.

Tests

Thank you for adding test coverage, specifically of the working_procedures module.

Thank you also for adding more detail to the online documentation about running tests. On this note, I think CONTRIBUTING.md may now be a bit out of date; perhaps the "Install Dependencies" step should have pip install -e .[test] instead of using the requirements file?

Done!

I still have two concerns about the tests:

  1. It looks like the viewer, jupyter, and working procedures tests are only run on ubuntu-latest with python 3.10 in the github CI. Is there a reason to skip other OS's/versions?

You're right — the tests for the viewer, Jupyter, and working procedures currently run only on ubuntu-latest with Python 3.10 in the GitHub CI. The main reason for this is that I had significant trouble getting them to work on other OSes, I believe due to the use of qtbot and GUI components. After quite a bit of research, I managed to find a working setup for Ubuntu, but on macOS and Windows the tests consistently fail during viewers_test.

That said, I have extended the coverage to include the remaining Python versions — just limited to Ubuntu for now.
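
Just for reference, the same restriction could also be encoded in the test suite itself; here is a minimal sketch using a pytest skip marker (not what PIVA currently does; the limitation lives in the CI matrix):

import sys

import pytest

# Skip qtbot/GUI-based tests outside Linux, where they are known to fail.
pytestmark = pytest.mark.skipif(
    sys.platform != "linux",
    reason="GUI (qtbot) tests only run reliably on Linux",
)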

As for the tests that do pass across all platforms (the dataloaders and working_procedures tests): I run the former everywhere because it effectively verifies that the package and its core entry points install and initialize correctly. The latter I chose to exclude purely to save CI resources: it takes about 10 minutes to complete on macOS, and given GitHub's time multipliers it could easily consume all of my available free CI minutes.

  2. I can't run the tests for the working_procedures module on my computer. When I try to run them they take up 100% of a CPU for about 20 minutes before the process is killed by my kernel for using up all (~50 GB) of my RAM. On GitHub I see that these tests run in ~15s, so I'm not sure what's going on here.

I encountered the same problem and traced it back to a few of the fitting routines. The issue seemed to be caused by the size and complexity of the data being fitted. I resolved it by downsampling the dataset used in the tests, which allowed them to run successfully both locally and in GitHub Actions.

That said, I'm not entirely sure why it still fails to complete on your machine, but my guess is that the problem lies somewhere in those fitting routines.
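
To illustrate, the fix boils down to fitting a decimated copy of the data. Here's a self-contained sketch, with a local stand-in for wp.two_lorentzians and made-up numbers:

import numpy as np
from scipy.optimize import curve_fit


def two_lorentzians(x, a1, x1, w1, a2, x2, w2):
    # Sum of two Lorentzian peaks, standing in for wp.two_lorentzians.
    return (a1 / (1 + ((x - x1) / w1) ** 2)
            + a2 / (1 + ((x - x2) / w2) ** 2))


# Synthetic full-resolution spectrum in place of the real test data.
peak_x = np.linspace(-1.0, 1.0, 100_000)
peak_y = two_lorentzians(peak_x, 1.0, -0.2, 0.05, 1.0, 0.2, 0.05)

# Fit every 10th point instead of the full-resolution array; this is what
# brought the runtime and memory use back down in the tests.
step = 10  # hypothetical decimation factor
p0 = [1.0, -0.1, 0.1, 1.0, 0.1, 0.1]  # rough initial guess
popt, pcov = curve_fit(two_lorentzians, peak_x[::step], peak_y[::step], p0=p0)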

Related, I no longer get an error when running tox, but the tox config only seems to run the working_procedures test. Furthermore, it looks like tox isn't used in Github actions. If this is true, it might be better to just remove tox altogether.

My mistake! It was only running working_procedures_test because that's what I had tested last. Anyway, following your advice, I've now added tox.ini to .gitignore.

New Dataset class

Thank you for using BaseModel as the base for the Dataset class. I just have 2 very minor notes on that implementation:

  1. The required attributes data, xscale, yscale, and zscale would be better defined with no default (right now they default to None). This would cause pydantic to raise a validation error if they are not provided.

Done! However, since these attributes are populated later, I was initially running into a ValidationError. To handle this, I'm now using Dataset.model_construct() during initialization and validating the instance upon return. This should effectively achieve the same goal. I’ve also updated the custom examples accordingly.
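
In code, the construct-then-validate pattern looks roughly like this (a trimmed-down sketch, not the actual Dataset definition):

import numpy as np
from pydantic import BaseModel, ConfigDict


class Dataset(BaseModel):
    # numpy arrays are not standard pydantic types
    model_config = ConfigDict(arbitrary_types_allowed=True)

    data: np.ndarray
    xscale: np.ndarray


# model_construct() skips validation, so the loader can start from an
# empty instance and populate attributes while reading the file...
ds = Dataset.model_construct()
ds.data = np.zeros((10, 10))
ds.xscale = np.arange(10)

# ...and the fully populated instance is validated before being returned.
ds = Dataset.model_validate(ds.model_dump())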

  2. IIRC the type-hint SOME_TYPE | None has been preferred over Optional[SOME_TYPE] for a while (since Python 3.10). e.g.

class Dataset(BaseModel):
    ...
    data: np.ndarray
    xscale: np.ndarray
    yscale: np.ndarray
    zscale: np.ndarray
    ekin: np.ndarray | None = None
    ...

Done!

CI Linting Step

I'm not sure I see the point of using --exit-zero with ruff in the linting CI step. This feels barely different from simply ignoring the step; in both cases any linting errors would be ignored in an automated run and the onus is on a human to manually fix things. In other words, I'd argue that the only way to ensure correct code formatting is to have the CI break if the formatting is bad. I don't know how stringent PyOpenSci is about this requirement, so I'll leave it up to @crhea93 for the final say on whether this would block approval. If you decide to not ignore ruff errors in the CI then I would suggest looking into using ruff's pre-commit hook to automatically fix any formatting errors every time a commit is made.

I completely agree — it felt the same to me, but I came across it as a commonly suggested workaround. The issue was that, for some reason, ruff format was throwing errors in CI, even though everything worked fine locally. That said, the number of corrections was small, so I applied them manually based on the GitHub output, and now it runs cleanly!

Misc

• The "Citing" section of README.md still says "TBD".

I’ve left it as "TBD" for now, as I’m planning to update it with the full reference once the JOSS publication is out.

• The requirements.txt and the "Dependencies" section of the online docs are out of date w.r.t. the removal of astropy.

Done!

• Regarding NaN values causing the line plots to disappear: I'm fine with whatever you decide. Maybe in the field of ARPES data NaNs are uncommon. They occur often in data I look at (astronomy). If it was something simple like plotting (x[~np.isnan(y)], y[~np.isnan(y)]) instead of just (x, y) then it may be worth it. Either way is fine with me.

I hadn’t really considered NaNs before — which probably means they’re quite rare in my case 😄. In ARPES, they typically only occur when an entire scan fails. I implemented your suggestion by filtering out the NaNs in the viewers, and everything seems to work fine with my data and during testing. Let me know if it improves the behavior on your end. I applied the change only to the viewers; since the fitters are very ARPES-specific, I left those unchanged.
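
Concretely, the change follows your suggested filtering; a small sketch (the helper name is mine):

import numpy as np


def drop_nans(x: np.ndarray, y: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    # Keep only the samples where y is finite, so line plots stay connected.
    mask = ~np.isnan(y)
    return x[mask], y[mask]


# e.g. plot(*drop_nans(x, y)) instead of plot(x, y)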

• The online docs still make reference to the command name db on the "Data Browser" page and in the note box regarding wayland on the "Installation" page.

Done!

• I'm having an issue with the new "Au-test_spectrum.p" file. If I open it in a notebook as suggested on the "Data Ingestion and Analysis" page then the dataset object has no .data attribute. If I try to open the same file with piva itself I get the following error in the console window:

Traceback (most recent call last):
  File "~/piva/piva/data_browser.py", line 118, in launch_piva
    self.open_dv(fname)
  File "~/piva/piva/data_browser.py", line 194, in open_dv
    raise e
  File "~/piva/piva/data_browser.py", line 182, in open_dv
    if data_set.xscale.size == 1:
AttributeError: 'NoneType' object has no attribute 'size'

Yes, that issue was caused by a conflict with older files saved using the deprecated Dataset(Namespace) class. It should be resolved now!

• Nice work with the new Jupyter shutdown capabilities!

Thanks!

pudeIko avatar Jun 20 '25 09:06 pudeIko

Thanks, @pudeIko. I probably won't be able to get to this until early next week, but, judging from your responses, it sounds like my next look will be pretty quick!

eigenbrot avatar Jun 20 '25 14:06 eigenbrot

I will also complete my review next week. Sorry for the delay.

jsdodge avatar Jun 20 '25 22:06 jsdodge

Thanks @pudeIko, @eigenbrot , and @jsdodge for all the great work on this! It's coming along nicely :)

crhea93 avatar Jun 23 '25 14:06 crhea93

Package Review 3

Thanks again, @pudeIko! These new updates are great and I think I'm ready to approve this review. I have a few minor comments that you can implement if you want; nothing that will block acceptance.

  • I see your point as to why the ABC doesn't work in DataloaderImporter and agree that moving in that direction would require a full rewrite of the import mechanisms. Thanks for indulging me, anyway!
  • Thank you for extending the test matrix to include more python versions.
  • I hear what you're saying about the length of the working_procedures test. I poked into this a bit and (on my machine, at least) the popt, cov = curve_fit(wp.two_lorentzians, peak_x, peak_y) line is what is taking all of the time and RAM. If I comment out that line then all the tests run in a second or two. I don't know the details of these tests so I'm not sure if that's useful or not. I think at this point I'm fine with the situation where the tests at least run on the GitHub CI. You might consider leaving a note somewhere that the working_procedures tests are known to hang on some systems.
  • I see the issue you ran into with the Dataset base model, and I think the .model_construct() solution is great. I didn't even know about that method!
  • Related to the above: given that you are essentially instantiating an empty Dataset and then filling in the relevant attributes, you might consider setting the validate_assignment option to True in the model config of Dataset (see the sketch after this list). This will ensure that all attributes are always the correct type.
  • I'm glad you were able to get the linting sorted. Having an automatic linter is the path to formatting bliss :grin:
  • Thanks for fixing the NaN value issue in the viewers. My data look great now :grinning:
  • Thanks for fixing the issue with the "Au-test_spectrum.p" test file. It now loads correctly and I can replicate the results shown on the "Data Ingestion and Analysis" page.
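
For reference, the validate_assignment suggestion corresponds to a model config along these lines (a sketch with a trimmed-down field set, not your actual Dataset definition):

import numpy as np
from pydantic import BaseModel, ConfigDict


class Dataset(BaseModel):
    model_config = ConfigDict(
        arbitrary_types_allowed=True,  # allow numpy array field types
        validate_assignment=True,      # re-validate on every assignment
    )

    data: np.ndarray


ds = Dataset.model_construct()
ds.data = np.zeros((10, 10))  # OK: type-checked on assignment
# ds.data = "oops"            # would raise a pydantic ValidationError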

@crhea93, unless @pudeIko has any other comments/changes, I'm ready to sign off on this review. How do I do that?

eigenbrot avatar Jun 24 '25 22:06 eigenbrot

Thanks a lot @eigenbrot for your prompt response!

Answering two of your comments:

  • Since it's just this one problematic function, I decided to comment it out for now — it's not particularly relevant and doesn't significantly impact test coverage.
  • I experimented with validate_assignment=True, but ran into errors again. However, I think I found a good workaround by adding a method to the base class:
class Dataloader(ABC):

    ...

    def validate_at_return(self, filename: str):
        """
        Validate that the Dataset was correctly populated with data and add
        original file information to the data provenance record.

        :param filename: absolute path to the file
        :return: loaded dataset with available metadata
        """

        if self.raster:
            for xi in range(self.scan.shape[0]):
                for yi in range(self.scan.shape[1]):
                    dsi = self.scan[xi, yi]
                    dsi.add_org_file_entry(filename, self.name)
                    dsi.validate_assignment = True
                    self.scan[xi, yi] = Dataset.model_validate(dsi.model_dump())
            return self.scan
        else:
            self.ds.add_org_file_entry(filename, self.name)
            self.ds.validate_assignment = True
            return Dataset.model_validate(self.ds.model_dump())

Then I adjusted all dataloaders to use this method when returning:

class DataloaderPickle(Dataloader):

    def load_data(self, filename: str, metadata: bool = False) -> Dataset:
        ...

        return self.validate_at_return(filename)

I'm really happy to hear that the package is ready for acceptance, and thank you so much for all the remarks and suggestions! I truly appreciate the improvements they led to!

pudeIko avatar Jun 25 '25 11:06 pudeIko

@eigenbrot Thank you very much for this hard work (and thank you @pudeIko for the detailed responses). I'll make a note that you (@eigenbrot) accept the package as ready for publication in the PyOpenSci ecosystem. At this point, you have no more duties as a reviewer :D

Once @jsdodge wraps up his review (this is not me trying to rush you!!), I'll begin the procedure of a formal acceptance :)

crhea93 avatar Jun 26 '25 17:06 crhea93

Hi All,

Just wanted to kindly check in and ask about the next steps in the review process, hoping the submission doesn't fall off the radar :) Please let me know if anything is needed from my side!

pudeIko avatar Jul 26 '25 07:07 pudeIko

Hi @pudeIko , I submitted my review yesterday and listed several (mostly minor) issues. Apologies for the delay with that. Please ping me when you've responded to my review and I'll respond more promptly going forward.

jsdodge avatar Jul 26 '25 14:07 jsdodge