
[REVIEW]: Ipyannotator: the infinitely hackable annotation framework

Open editorialbot opened this issue 2 years ago • 38 comments

Submitting author: @itepifanio (Ítalo Epifânio)
Repository: https://github.com/palaimon/ipyannotator/
Branch with paper.md (empty if default branch): joss
Version: 0.8.1
Editor: @danielskatz
Reviewers: @csadorf, @matthewfeickert
Archive: Pending

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/5306e204788fd469104d77321223d7a1"><img src="https://joss.theoj.org/papers/5306e204788fd469104d77321223d7a1/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/5306e204788fd469104d77321223d7a1/status.svg)](https://joss.theoj.org/papers/5306e204788fd469104d77321223d7a1)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@csadorf & @matthewfeickert, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @danielskatz know.

Please start on your review when you are able, and be sure to complete your review within the next six weeks, at the very latest.

Checklists

📝 Checklist for @matthewfeickert

📝 Checklist for @csadorf

editorialbot avatar Jun 15 '22 13:06 editorialbot

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

editorialbot avatar Jun 15 '22 13:06 editorialbot

Software report:

github.com/AlDanial/cloc v 1.88  T=0.03 s (151.1 files/s, 20619.4 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
SVG                              2              0              0            286
Markdown                         1             50              0            105
TeX                              1              9              0             96
-------------------------------------------------------------------------------
SUM:                             4             59              0            487
-------------------------------------------------------------------------------


gitinspector failed to run statistical information for the repository

editorialbot avatar Jun 15 '22 13:06 editorialbot

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1145/3359141 is OK
- 10.1145/3359141 is OK
- 10.1109/icip42928.2021.9506683 is OK
- 10.1145/3343031.3350535 is OK
- 10.1007/s11548-018-1864-x is OK

MISSING DOIs

- None

INVALID DOIs

- None

editorialbot avatar Jun 15 '22 13:06 editorialbot

Wordcount for paper.md is 1790

editorialbot avatar Jun 15 '22 13:06 editorialbot

Failed to discover a valid open source license

editorialbot avatar Jun 15 '22 13:06 editorialbot

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

editorialbot avatar Jun 15 '22 13:06 editorialbot

@csadorf and @matthewfeickert - Thanks for agreeing to review this submission. This is the review thread for the paper. All of our communications will happen here from now on.

As you can see above, you each should use the command @editorialbot generate my checklist to create your review checklist. @editorialbot commands need to be the first thing in a new comment.

As you go over the submission, please check any items that you feel have been satisfied. There are also links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention openjournals/joss-reviews#4480 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for reviews to be completed within about 2-4 weeks. Please let me know if either of you require some more time. We can also use editorialbot (our bot) to set automatic reminders if you know you'll be away for a known period of time.

Please feel free to ping me (@danielskatz) if you have any questions/concerns.

danielskatz avatar Jun 15 '22 13:06 danielskatz

Review checklist for @matthewfeickert

Conflict of interest

  • [x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

General checks

  • [x] Repository: Is the source code for this software available at https://github.com/palaimon/ipyannotator/?

    • Yes. :+1:
  • [x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?

  • [x] Contribution and authorship: Has the submitting author (@itepifanio) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

    • Though the repository itself offers little evidence of contribution
    $ git log --oneline | wc -l
    36
    $ git log --author="[email protected]" --oneline | wc -l
    2
    

    the authors have indicated in the pre-review (https://github.com/openjournals/joss-reviews/issues/4318#issuecomment-1109745642) that they

    use a private GitLab repository and release all changes at once on Github, that's why we have just a few commits.

    and that, as indicated by their CHANGELOG, @itepifanio is a core developer (https://github.com/openjournals/joss-reviews/issues/4480#issuecomment-1179321217). I am willing to accept this as evidence that @itepifanio, as well as Oleksandr Pysarenko and Immanuel Bayer, have contributed substantially. :+1:

  • [x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines

JOSS asks for a minimum of 3 months of work equivalent effort for the software to be considered. The project definitely fulfills this requirement.

Functionality

  • [x] Installation: Does installation proceed as outlined in the documentation?

    • Yes. After the nice work done in https://github.com/palaimon/ipyannotator/pull/40, it will be possible to install ipyannotator easily in Python 3.8+ environments in future releases, following the instructions given. :+1:
  • [x] Functionality: Have the functional claims of the software been confirmed?

    • The docs don't make too many claims of functionality other than

    The library contains some pre-defined annotators that can be used out of the box, but it also can be extend and customized according to the users needs.

    As no specific additional claims are made, perhaps the functionality is best assessed by running all of the tutorials available. As far as I can tell, all the examples in the tutorials are viable and the software works as described.

  • [x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

    • No specific performance claims are made.

Documentation

  • [x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?

    • Only briefly in the README. While this could be made more explicit, the addition of the Design Decisions page to the docs in https://github.com/palaimon/ipyannotator/pull/43 improves things. Once this has been added to the docs navigation so that it is publicly findable (https://github.com/palaimon/ipyannotator/pull/43#pullrequestreview-1081465836), this should be resolved. This is now resolved in https://github.com/palaimon/ipyannotator/pull/45.
  • [x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.

    • There isn't an explicit list in the README, but that isn't necessarily something that I would want to check all the time as a reviewer/user, and it has been made clear that this is an nbdev project, so I think that with the addition of the work in https://github.com/palaimon/ipyannotator/pull/40 this is now well covered, with dependencies that reflect viable lower-bound requirements without being severely over-restrictive. :+1:
  • [x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).

    • Yes. The tutorials cover this well.
  • [x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?

    • The API reference is very short. As I mentioned in the pre-review (https://github.com/openjournals/joss-reviews/issues/4318#issuecomment-1111185346), robust API documentation is essential. The API reference would benefit substantially from a more detailed discussion of the methods.

    An API reference should be sufficient such that a user understands how to use the API from the documentation alone. PR https://github.com/palaimon/ipyannotator/pull/43 does provide some additional information to the API, but things are still very brief. The authors have made it clear that they would prefer to adhere to nbdev cultural norms of API documentation through examples and not through robust API reference documentation, so I am willing to accept this to avoid delaying publication on this area.

  • [x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?

    • There are automated tests, and PR https://github.com/palaimon/ipyannotator/pull/40 improves on them by testing across all supported Python versions. The reported test coverage is 90%, which is rather low, but I will consider this acceptable for publication. I would strongly suggest bringing the coverage up as part of future work.
  • [x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • [x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?

    • Yes. The paper is still a bit long at 5 pages before references, but PRs https://github.com/palaimon/ipyannotator/pull/42 and https://github.com/palaimon/ipyannotator/pull/43 have improved it by moving components into the documentation, to the extent that I am satisfied with the summary level of the paper.
  • [x] A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?

    • Yes. It would be beneficial to emphasize who exactly within the machine learning community the target audience is, though.
  • [x] State of the field: Do the authors describe how this software compares to other commonly-used packages?

    • The paper notes that

    (Lahtinen et al., 2021) developed an annotator as a browser plugin, (Dutta & Zisserman, 2019) built an annotator as a program that runs on the browser, and (Bernal et al., 2019) developed a desktop application for domain-specific medical image annotation. The previous annotation tools are restricted to fixed data types and a specific type of annotation.

    It would be useful if their code was linked to as well. If it is not open source code, then that should be stated.

  • [x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?

    • The paper is well written.
  • [x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

    • Yes.

matthewfeickert avatar Jun 15 '22 15:06 matthewfeickert

Review checklist for @csadorf

Conflict of interest

  • [x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

General checks

  • [x] Repository: Is the source code for this software available at https://github.com/palaimon/ipyannotator/?
  • [x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • [x] Contribution and authorship: Has the submitting author (@itepifanio) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
  • [x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines

Functionality

  • [x] Installation: Does installation proceed as outlined in the documentation?
  • [x] Functionality: Have the functional claims of the software been confirmed?
  • [x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • [x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • [x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • [x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
  • [x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • [x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
  • [x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • [x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
  • [x] A statement of need: Does the paper have a section titled 'Statement of need' that clearly states what problems the software is designed to solve, who the target audience is, and its relation to other work?
  • [x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
  • [x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
  • [x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

csadorf avatar Jun 16 '22 08:06 csadorf

👋 @csadorf and @matthewfeickert - thanks for generating your checklists. Is there anything blocking you from starting to check off items? (and I know it's just a week since we started, so this is really a friendly question)

danielskatz avatar Jun 23 '22 14:06 danielskatz

👋 @csadorf and @matthewfeickert - thanks for generating your checklists. Is there anything blocking you from starting to check off items? (and I know it's just a week since we started, so this is really a friendly question)

No, just need to find some time for this. I expect to be able to make some progress early next week. Thank you for the reminder.

csadorf avatar Jun 23 '22 15:06 csadorf

@danielskatz We have just been informed that an ipyannotator introduction talk, which @itepifanio had submitted to Python Nordeste, has been accepted.

It would be great if we could incorporate feedback from this review before this talk. Please let me know if we can do something to get the review started.

edit: fix typo, in cooperate -> incorporate

ibayer avatar Jul 05 '22 07:07 ibayer

I'm sorry, I don't understand what "if we could in cooperate feedback from this review" means.

Also, the review is started, though the reviewers (@matthewfeickert and @csadorf) haven't yet indicated progress in their checklists.

danielskatz avatar Jul 05 '22 12:07 danielskatz

I'm sorry, I don't understand what "if we could in cooperate feedback from this review" means.

Ah, sorry, I didn't mean to be cryptic. All I wanted to express is that we are starting to prepare the talk and its material. So any feedback the reviewers provide will not only help us improve the library, but will also influence the talk, if we get it early enough.

Also, ideally we would like to present the library in the exact same version as it is described in the JOSS article, since both will be permanent reference points. :)

edit: fix typo, in cooperate -> incorporate

ibayer avatar Jul 05 '22 12:07 ibayer

Ah, I think you meant to "incorporate feedback". Ok, but we need to wait for the reviewers...

danielskatz avatar Jul 05 '22 12:07 danielskatz

The paper describes the Ipyannotator software, which is aimed at researchers who want to efficiently annotate images and still frames to train a supervised machine learning model for image and video classification.

The paper is written very well with only minor language errors and clearly outlines the architecture and design philosophy of the presented software.

There are, however, a few issues that I would like to see addressed.

The significance of contributions by the first author to the source code base cannot be verified. Assuming that @AlexJoz is Oleksandr Pysarenko, it seems that the second author has made the most contributions to the package's source code.

One of the key claims is that the software is "infinitely hackable". I was not able to fully understand or verify this claim within the time that I spent on the review and testing the software. According to the paper, the framework's flexibility stems from the fact that it runs in Jupyter notebooks. Since this attribute constitutes the unique value proposition, I would suggest elaborating further on this point or providing a simple example of what exactly is meant by that and what the benefits are.

The code is written in a literate programming style, which is why the majority of the source code is in the form of Jupyter notebooks (~7,500 LOC), resulting in about 5,000 LOC of auto-generated Python code, and it appears to be of overall sufficient quality. I therefore have no doubts about the significance of the scholarly contribution.

The installation instructions are clear, albeit the list of dependencies presented in the README and the docs home page (same source) uses the caret symbol (^), which is non-standard within the Python ecosystem, to indicate the range of supported Python versions. I would recommend sticking to a dependency specification that complies with the specifiers documented in PEP 440.
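For illustration only (hypothetical values, assuming installation from PyPI; the exact constraint in the README may differ), the same kind of version range expressed both ways:

    # caret-style range (Poetry/npm convention):   ^3.8
    # equivalent PEP 440 specifier:                >=3.8,<4.0
    $ pip install "ipyannotator>=0.8.1,<0.9"       # pip itself only accepts PEP 440 specifiers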

While the installation flow is overall straightforward, it is not immediately clear which steps a user should take right after installation. A few sentences instructing the user on what exactly to do next (run an example, go through the tutorial, or simply use the software) are missing and currently have to be deduced.

As stated before, documentation is generated from Jupyter notebooks, which means that it is quite extensive; however, it also appears to be a bit scattered and of widely varying quality. The API documentation in particular is severely lacking. A clear statement of need, as found in the paper, is also missing from the documentation.

As a final note, it appears that the repository contains automated tests, but there are no instructions on how to run those.

csadorf avatar Jul 08 '22 16:07 csadorf

Thanks @csadorf!

If you can say anything more about "The paper is written very well with only minor language errors", please do, either in a comment or in a PR to the paper.md file. If not, I will also proofread the paper later in the process.

danielskatz avatar Jul 08 '22 16:07 danielskatz

Thanks for the feedback!

The significance of contributions by the first author to the source code base cannot be verified.

We have been following a Private Development, Public Releases workflow [0]. The downside of this approach is that neither the number of commits (https://github.com/openjournals/joss-reviews/issues/4318#issuecomment-1110934810) nor individual contributions can be deduced from the git history.

All authors have contributed code, but @itepifanio has written by far the majority. Maybe the CHANGELOG.md can be used to support my assurance.

We'll address all other points in detail later.

[0] https://www.braintreepayments.com/blog/our-git-workflow/

ibayer avatar Jul 08 '22 20:07 ibayer

@editorialbot generate pdf

itepifanio avatar Jul 12 '22 19:07 itepifanio

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

editorialbot avatar Jul 12 '22 19:07 editorialbot

@csadorf thanks for your feedback. I believe that our last modifications address your suggestions.

You can find a PR with the JOSS paper changes and another with the documentation enhancements. The following changes were made:

  • We added a new section to the paper describing why Ipyannotator is an "infinitely hackable" library
  • The list of dependencies was removed from the README. The information was necessary in the past, but now all dependencies can be found in pyproject.toml
  • The follow-up instructions for the install step were added to the README and to the docs
  • A better introduction (statement of need) was added to the docs
  • The API documentation was fully elaborated, adding more resources, describing decisions and linking it with the tutorials
  • A new section describing how to run the library tests was added to the docs

Please let us know if you have any other concerns

itepifanio avatar Jul 12 '22 19:07 itepifanio

@csadorf - does this ☝️ satisfy your concerns?

danielskatz avatar Jul 13 '22 17:07 danielskatz

👋 @matthewfeickert - I know you've been and are busy with conferences, but just wanted to remind you about this again...

danielskatz avatar Jul 13 '22 17:07 danielskatz

@csadorf thanks for your feedback. I believe that our last modifications address your suggestions.

You can find a PR with the JOSS paper changes and another with the documentation enhancements. The following changes were made:

Thank you for addressing my concerns.

  • We added a new section to the paper describing why Ipyannotator is an "infinitely hackable" library

I think the edited description provides a much better justification for this claim; however, the fact that ipyannotator is implemented in Python and executed via Jupyter notebooks provides barely any justification on its own, since the same could be said of literally any other open-source software. While I understand that this is meant as hyperbole, I think it is important to point out very concretely which elements specifically make this software significantly more customizable than similar open-source frameworks.

I think both the paper and the docs provide sufficient explanation now; the only suggestion I would make is to remove any language that appears to imply that (Python + OSS) is sufficient grounds for the claim. Specifically, I have an issue with this sentence:

Since Ipyannotator runs on top of Jupyter notebooks and due to the dynamic nature of the Python language, extra features and customization can be easily developed by the data science team, creating an "infinitely hackable annotation framework".

  • The list of dependencies was removed from the README. The information was necessary in the past, but now all dependencies can be found in pyproject.toml

  • The follow-up instructions for the install step were added to the README and to the docs

  • A better introduction (statement of need) was added to the docs

  • The API documentation was fully elaborated, adding more resources, describing decisions and linking it with the tutorials

I still find it a bit hard to navigate the docs, but it looks like the most important elements are now present.

  • A new section describing how to run the library tests was added to the docs

I am glad to see that the concern was addressed; however, there is some significant room for improvement here.

It is usually good practice to define a test environment explicitly. The docs only state that tests require pytest and ipytest, without any version information or instructions on how to create the environment. That's odd, because it appears that the test environment is already well defined as part of the pyproject.toml file?

Further, the instructions on how to run the tests remain weirdly vague as well. Looking at the CI workflows, it seems that the tests are run via poetry? Is this recommended or required? It is not entirely clear to me why the instructions remain so vague when, at the same time, the tests appear to be run rigorously on each commit.

Please let us know if you have any other concerns

I don't have any concerns other than those mentioned above.

csadorf avatar Jul 14 '22 16:07 csadorf

@csadorf thanks for the suggestions.

We tried to make the "infinitely hackable" claim clearer on #30.

#31 tries to improve the documentation of the tests.

Is this recommended or required?

We're running the tests via poetry only because it's easier on the CI (it guarantees that the environment is active); it's not a requirement or a recommendation.
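For example, something along these lines should work from a clone of the repository (a sketch only, not the authoritative instructions; those are in the updated docs):

    # via poetry, mirroring the CI setup:
    $ poetry install
    $ poetry run pytest

    # or in any environment that already has the test dependencies:
    $ pip install pytest ipytest
    $ pytest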

itepifanio avatar Jul 18 '22 13:07 itepifanio

👋 @matthewfeickert - I know you've been busy with conferences, but just wanted to remind you about this again... hopefully this is now a better time

danielskatz avatar Jul 18 '22 13:07 danielskatz

@csadorf thanks for the suggestions.

We tried to make the "infinitely hackable" claim clearer on #30.

I think it is sufficient now.

#31 tries to improve the documentation of the tests.

Unfortunately the instructions remain ambiguous. I have provided a detailed explanation in this issue.

csadorf avatar Jul 19 '22 09:07 csadorf

@csadorf https://github.com/palaimon/ipyannotator/pull/34 has been merged now. Does this now complete your last open checkbox? Please let us know if you have further requests before we get your final review approval.

ibayer avatar Jul 25 '22 13:07 ibayer

@csadorf palaimon/ipyannotator#34 has been merged now. Does this now complete your last open checkbox? Please let us know if you have further requests before we get your final review approval.

All good from my side.

csadorf avatar Jul 25 '22 13:07 csadorf

👋 @matthewfeickert - when do you think you will have a chance to work on this?

danielskatz avatar Jul 28 '22 08:07 danielskatz