joss-reviews
joss-reviews copied to clipboard
[REVIEW]: specieshindex: How scientifically popular is a species?
Submitting author: @jesstytam (Jessica Tam)
Repository: https://github.com/jessicatytam/specieshindex
Branch with paper.md (empty if default branch):
Version: 0.3.1
Editor: @Bisaloo
Reviewers: @kamapu, @tom-gu, @DrMattG
Archive: Pending
:warning: JOSS reduced service mode :warning:
Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.
Status
Status badge code:
HTML: <a href="https://joss.theoj.org/papers/401f3e0de4da2e338c5903d8a96a910b"><img src="https://joss.theoj.org/papers/401f3e0de4da2e338c5903d8a96a910b/status.svg"></a>
Markdown: [](https://joss.theoj.org/papers/401f3e0de4da2e338c5903d8a96a910b)
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@kamapu & @tom-gu & @DrMattG, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:
- Make sure you're logged in to your GitHub account
- Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @Bisaloo know.
✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨
Review checklist for @kamapu
✨ Important: Please do not use the Convert to issue functionality when working through this checklist, instead, please open any new issues associated with your review in the software repository associated with the submission. ✨
Conflict of interest
- [x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.
Code of Conduct
- [x] I confirm that I read and will adhere to the JOSS code of conduct.
General checks
- [x] Repository: Is the source code for this software available at the repository url?
- [x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
- [x] Contribution and authorship: Has the submitting author (@jessicatytam) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
- [x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines?
Functionality
- [x] #3902
- [ ] Functionality: Have the functional claims of the software been confirmed?
- [ ] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)
Documentation
- [ ] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
- [ ] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
- [ ] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
- [ ] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
- [ ] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
- [ ] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support
Software paper
- [ ] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
- [ ] A statement of need: Does the paper have a section titled 'Statement of Need' that clearly states what problems the software is designed to solve and who the target audience is?
- [ ] State of the field: Do the authors describe how this software compares to other commonly-used packages?
- [ ] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
- [ ] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?
Review checklist for @tom-gu
✨ Important: Please do not use the Convert to issue functionality when working through this checklist, instead, please open any new issues associated with your review in the software repository associated with the submission. ✨
Conflict of interest
- [x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.
Code of Conduct
- [x] I confirm that I read and will adhere to the JOSS code of conduct.
General checks
- [x] Repository: Is the source code for this software available at the repository url?
- [x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
- [x] Contribution and authorship: Has the submitting author (@jessicatytam) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
- [x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines?
Functionality
- [x] Installation: Does installation proceed as outlined in the documentation?
- [x] Functionality: Have the functional claims of the software been confirmed?
- [x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)
Documentation
- [x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
- [x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
- [x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
- [x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
- [x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
- [x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support
Software paper
- [x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
- [x] A statement of need: Does the paper have a section titled 'Statement of Need' that clearly states what problems the software is designed to solve and who the target audience is?
- [x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
- [x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
- [x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?
Review checklist for @DrMattG
✨ Important: Please do not use the Convert to issue functionality when working through this checklist, instead, please open any new issues associated with your review in the software repository associated with the submission. ✨
Conflict of interest
- [x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.
Code of Conduct
- [x] I confirm that I read and will adhere to the JOSS code of conduct.
General checks
- [x] Repository: Is the source code for this software available at the repository url?
- [x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
- [x] Contribution and authorship: Has the submitting author (@jessicatytam) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
- [x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines?
Functionality
- [x] Installation: Does installation proceed as outlined in the documentation?
- [ ] Functionality: Have the functional claims of the software been confirmed?
- [x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)
Documentation
- [x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
- [x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
- [x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
- [x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
- [x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
- [x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support
Software paper
- [x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
- [x] A statement of need: Does the paper have a section titled 'Statement of Need' that clearly states what problems the software is designed to solve and who the target audience is?
- [x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
- [x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
- [x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?
Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @kamapu, @tom-gu, @DrMattG it looks like you're currently assigned to review this paper :tada:.
:warning: JOSS reduced service mode :warning:
Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.
:star: Important :star:
If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository, which means that, with GitHub's default behaviour, you will receive notifications (emails) for all reviews 😿
To fix this do the following two things:
- Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

- You may also like to change your default settings for watching repositories in your GitHub profile here: https://github.com/settings/notifications

For a list of things I can do to help you, just type:
@whedon commands
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
@whedon generate pdf
Wordcount for paper.md is 1478
Software report (experimental):
github.com/AlDanial/cloc v 1.88 T=0.05 s (357.9 files/s, 141975.3 lines/s)
| Language | files | blank | comment | code |
|----------|------:|------:|--------:|-----:|
| Rmd      |     2 |   271 |     520 | 2140 |
| R        |     9 |   175 |    1233 | 1355 |
| TeX      |     3 |    31 |       0 |  522 |
| Markdown |     2 |   100 |       0 |  312 |
| YAML     |     1 |    15 |       1 |   68 |
| **SUM**  |    17 |   592 |    1754 | 4397 |
Statistical information for the repository 'bd838b7115269e827ea9c7cb' was gathered on 2021/09/29.
The following historical commit information, by author, was found:

| Author | Commits | Insertions | Deletions | % of changes |
|--------|--------:|-----------:|----------:|-------------:|
| jessicatytam | 2 | 19 | 19 | 100.00 |

Below are the number of rows from each author that have survived and are still intact in the current revision:

| Author | Rows | Stability | Age | % in comments |
|--------|-----:|----------:|----:|--------------:|
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.1073/pnas.0507655102 is OK
- 10.1111/tbed.12221 is OK
- 10.1111/mam.12066 is OK
- 10.1007/s11192-015-1798-9 is OK
- 10.1007/s11192-007-1859-9 is OK
- 10.1371/journal.pone.0131004 is OK
- 10.1111/mam.12038 is OK
- 10.1038/s41598-017-09084-6 is OK
- 10.1139/facets-2016-0011 is OK
- 10.1007/s11192-006-0147-4 is OK
- 10.1017/S1367943004001799 is OK
- 10.1371/journal.pone.0189577 is OK
- 10.1007/s11192-006-0146-5 is OK
- 10.1016/S0169-5347(01)02381-3 is OK
- 10.1038/s41467-018-07916-1 is OK
- 10.1111/acv.12586 is OK
- 10.1038/s41477-021-00912-2 is OK
- 10.1016/j.gecco.2018.e00423 is OK
- 10.1007/s11160-019-09556-0 is OK
- 10.1111/j.1523-1739.2006.00616.x is OK
- 10.1111/j.1523-1739.2010.01453.x is OK
- 10.1108/07378830610715473 is OK
- 10.1093/eurheartj/ehx446 is OK
- 10.1108/14684521211209581 is OK
MISSING DOIs
- None
INVALID DOIs
- None
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
:wave::wave::wave::wave: @jessicatytam @kamapu @tom-gu @DrMattG, this is the review thread for the paper. All of our communications will happen here from now on.
All reviewers have checklists at the top of this thread with the JOSS requirements. As you go over the submission, please check any items that you feel have been satisfied. There are also links to the JOSS reviewer guidelines.
The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention openjournals/joss-reviews#3776 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.
We aim for reviews to be completed within about 2-4 weeks. Please let me know if any of you require some more time. We can also use Whedon (our bot) to set automatic reminders if you know you'll be away for a known period of time.
Please feel free to ping me (@Bisaloo) if you have any questions/concerns.
:wave: @tom-gu, please update us on how your review is going (this is an automated reminder).
:wave: @DrMattG, please update us on how your review is going (this is an automated reminder).
:wave: @kamapu, please update us on how your review is going (this is an automated reminder).
Thanks for your comments, @DrMattG!
:wave: @kamapu, @tom-gu, could you please give us an update on your progress for the review of this package & paper? Please let me know if you encounter any issues.
👋 @Bisaloo, I'll submit my review by tomorrow. Sorry for the delay, it's due to the occurrence of too many unexpected events :sweat_smile:
Shame on me! Though it is late, I'll provide my comments. Unfortunately, I don't currently have access to a copy of the manuscript.
@whedon generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
@kamapu, you should be able to access the manuscript via the links in the previous comment :slightly_smiling_face:
Thanks! The robot is closing my issues in the specieshindex repo. Either I'm doing something wrong or it is because I'm too late.
You need to manually open issues in https://github.com/jessicatytam/specieshindex - do not use the GitHub "convert to issue" feature, as it opens issues in the wrong repository.
@jessicatytam @wcornwell @Bisaloo Here is my full review, so sorry for the delay!
Package overview:
The specieshindex package shows great potential to promote scientific merit. It is a delightfully user-friendly package with superb documentation. Nonetheless, I identified several areas that can be improved. I did not open specific GitHub issues since these points are more systemic and should be addressed as the package matures. Additionally, I have a few minor suggestions and some ideas for future development.
Package structure & development:
Testing & CI
I saw that you recently added several tests and CI (via GitHub Actions). It is great to know that you are adopting a more test-driven development (TDD) approach. Pivotal tests for this particular package are smoke tests that monitor the API connection and validate the output structure. As you know by now, constructing this for Scopus is easier than for WOS and BASE. However, please refer to this section to learn how to manage secrets (e.g., API keys) within the package.
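As an illustration, a smoke test could look roughly like this; the environment variable name, the `CountSpT()` arguments, and the expected return shape are assumptions on my side, not the package's actual setup:

```r
# A smoke test that checks the Scopus connection and the output structure,
# skipping gracefully when no API key is available (e.g. on CI without secrets).
library(testthat)

test_that("the Scopus API is reachable and CountSpT() returns a result", {
  skip_if(Sys.getenv("SCOPUS_API_KEY") == "", "No Scopus API key available")

  res <- CountSpT(db = "scopus", genus = "Bettongia", species = "penicillata")
  expect_s3_class(res, "data.frame")  # assumed output structure; adjust to the real return value
})
```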
Code duplication
The level of duplication (code & function documentation) is off the charts. It partially stems from a segregated function strategy and from not fully utilizing sub-functions (internal, non-exported functions). This is problematic from the POV of a package developer since even the smallest changes can become a real hassle. When the small things become exhausting, the big things become impossible! I’m advocating for the consolidation of similar functions by adding a parameter or two. This might slightly affect the user’s experience, but it will greatly serve package maintenance and improve its overall structure. If you decide to make the changes, please avoid putting all the functions in the same R file.
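To illustrate the kind of consolidation I have in mind (the function, helper names, and arguments here are purely illustrative, not existing code in the package):

```r
# One exported function replacing a family of near-duplicates; the database
# and the search field become arguments instead of separate functions.
count_records <- function(genus, species = NULL,
                          db = c("scopus", "wos", "base"),
                          field = c("title", "abstract", "keywords")) {
  db <- match.arg(db)
  field <- match.arg(field)
  query <- build_query(genus, species, field)  # hypothetical internal helper
  switch(db,
         scopus = count_scopus(query),         # hypothetical internal, non-exported helpers
         wos    = count_wos(query),
         base   = count_base(query))
}
```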
Code refactoring (mostly coding efficiency)
Going forward, I encourage you to review your coding efficiency. A few examples:
- Is it necessary to hit the API so many times? Can post-processing of one API call achieve the same results? What will happen when the current year is 2022 or 2023? Isn’t the raw data by itself valuable?
- When plotting all indices, why not use facet_wrap() or facet_grid() for multi-panel plots? (A minimal sketch follows this list.)
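A minimal sketch of the multi-panel idea; the `indices` data frame here is invented for illustration, and the real plotting functions would build it from the fetched records:

```r
# Plot several indices side by side with facet_wrap() instead of separate plots.
library(ggplot2)

indices <- data.frame(
  species = rep(c("Bettongia penicillata", "Phascolarctos cinereus"), each = 3),
  index   = rep(c("h-index", "m-index", "i10"), times = 2),
  value   = c(12, 0.8, 9, 30, 1.4, 25)
)

ggplot(indices, aes(x = species, y = value)) +
  geom_col() +
  facet_wrap(~ index, scales = "free_y")  # one panel per index
```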
Defensive programming
Communicating with our users via errors, warnings, and messages significantly improves their user experience and can help mitigate knowledge gaps. Currently, the package has no implementation of defensive programming. I encourage you to embrace it. Here you can find helpful information. Side note: implementing defensive programming is much harder when you have high levels of code duplication.
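For example (the function and argument names below are illustrative only, not the package's actual API):

```r
# Fail early with informative errors instead of letting the API call fail obscurely.
count_records <- function(genus, species = NULL, db = "scopus") {
  if (!is.character(genus) || length(genus) != 1 || !nzchar(genus)) {
    stop("`genus` must be a single, non-empty character string.", call. = FALSE)
  }
  if (!db %in% c("scopus", "wos", "base")) {
    stop("`db` must be one of 'scopus', 'wos' or 'base'.", call. = FALSE)
  }
  if (is.null(species)) {
    message("No `species` supplied; counting records for the whole genus.")
  }
  # ... build the query and hit the API
}
```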
Motivation for package maintenance
Package maintenance is crucial for establishing software sustainability. This aspect of package development is a demanding god. Adequately quantifying long-term maintenance is rather challenging, even more so for scientists. When trying to connect the dots, it seems that this package was mainly developed for the study of biases in mammalian research. If this is the case, I wonder if the Cornwell Lab has a long-term maintenance plan with sufficient resources to support it. Please forgive me if the query seems intrusive; I'm only trying to share insights from my own experience and pain. It might be beneficial to instigate a collaboration with an organization or an additional PI that is going to rely heavily on this package. Perhaps @mjwestgate will have specific leads.
Data science >> Software development
I would classify this package under the 'data scientists developing a package' scenario. It's clear that you completely understand your users, but you do not yet fully comprehend your development needs. To support your package development journey, I urge you to read the rOpenSci Packaging Guide and R Packages. In a way, now is the perfect time for that, as you can now properly digest these comprehensive resources.
goodpractice output
See: https://github.com/MangoTheCat/goodpractice
Please run goodpractice::gp() or see this RStudio Project for the goodpractice output.
Minor suggestions:
Installation instructions (README; paper; vignette)
Why do you need all of this: https://github.com/jessicatytam/specieshindex/blob/9698952bf81d8d549b7c5bef3a4a557f342bf13d/README.rmd#L22:L31? It's somewhat redundant; why not just:
install.packages("remotes")
remotes::install_github("jessicatytam/specieshindex")
paper.md file
Code is too long in lines 76, 109, 144-147, and 150. https://raw.githubusercontent.com/openjournals/joss-papers/joss.03776/joss.03776/10.21105.joss.03776.pdf
DESCRIPTION file
Package contributors might be missing (compared to paper authors).
git
I see that both Jessica and Will are committing directly to the master branch. After pkg release, it’s better to have a dev branch and to merge code via PR.
pkgdown
The overall level of package documentation is excellent. Creating a website via pkgdown will beautifully showcase it. Implementation is quite easy and totally worth it!
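For reference, a minimal sketch of the setup, using the standard usethis/pkgdown calls run from the package root:

```r
# Set up and build a pkgdown documentation site.
usethis::use_pkgdown()   # creates _pkgdown.yml and updates .Rbuildignore
pkgdown::build_site()    # renders the README, function reference, and vignettes into docs/
```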
Ideas for future functionality:
- Introducing a query ‘batch mode’ by passing a dataframe
- Utilizing `taxize` to transparently generate a list of synonyms (a minimal sketch follows this list)
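A minimal sketch of the `taxize` idea; the ITIS backend is just one option here, and the exact columns returned depend on the backend:

```r
# Retrieve candidate synonyms for a species with taxize (ITIS backend as an example).
library(taxize)

syns <- synonyms("Bettongia penicillata", db = "itis")
syns  # a named list with one data frame of synonym records per input taxon
```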
Final words
Sharing as many insights as I can is not just my duty but also a privilege, as it makes my own hardships along the way even more worthwhile. I hope you will find this review helpful and not too preachy 😃
From one Biodiversity Jedi to another, may the foRce be with you! Tomer
Review by Miguel
Sorry for the late response. It is my first review of a package, and the way to proceed has been challenging for me. Also, some "spontaneous" activities were keeping me busy, and I have only just managed to finish my review.
While this package seems promising for use also in the context of my own work, I see the necessity of a major revision of both the structure of the functions and the documentation.
I didn't open issues for this review. I would suggest that the authors, when responding, refer each change to a commit in the repository.
I sum up my suggestions here (detailed comments below):
- Implement a secure way to provide credentials in sessions.
- Re-structure the documentation for the `Count*()` and `Fetch*()` functions. In the best case, define just two functions, with all possible variants set through their arguments.
- Examples in the documentation and tutorials should follow the whole path from querying to assessment, discarding the use of pre-stored data sets.
- Get rid of the pre-stored data. The user has to be able, first of all, to repeat the query and get the results, or to apply the queries to their own questions. Pre-stored data can still be hidden and used for building the vignettes to avoid possible errors, but giving examples to the users on the basis of such pre-stored data is, in my opinion, rather confusing.
- Not mandatory, but a function querying the metadata of the retrieved publications (e.g. for export to BibTeX) would complete the whole story about `specieshindex`.
Comments and Suggestions
README
- [ ] I also support the idea of simplifying the code, for instance using the following (include the same in the vignette):
remotes::install_github("jessicatytam/specieshindex", build_vignettes = TRUE, dependencies = TRUE)
- [ ] In relation to the last point, I was wondering why the suggested installation from GitHub does not include `build_vignettes = TRUE`. If you fear that users may not have a LaTeX distribution, you can rather use HTML as the output format. Users may still need to install Rtools and pandoc, though.
Vignette
- [ ] Call the vignette with a topic name, not just `vignette`.
- [ ] Instructions for installing (same comment as for the README).
- [ ] In the Workflow chapter, a brief description of the databases is required, especially regarding BASE (I didn't know this one).
- [ ] In the Other Arguments and Syntax chapter, you should invert the order of the sub-chapters Additional Keywords and Count, Fetch, ...*, because you explain the functions and their differences in the second one, and this should come first to avoid confusion. For instance, I didn't notice that there is a block using `CountSpT()` and `CountSpTAK()`.
- [ ] With `FetchSpT(db = "scopus", genus = "Bettongia", species = "penicillata")` I get an error:
Error in get_results(query, start = init_start, count = count, verbose = verbose, :
Bad Request (HTTP 400).
- [ ] I'm wondering why the results of `FetchSp*()` or `CountSp*()` cannot be assigned to an R object. This is very restrictive if you would like to use these functions in statistical assessments or reports.
Manual
- [ ] My impression is that the manual can be arranged in a better way, and perhaps even the functions can be simplified. For instance, all `FetchSp*()` functions should share the same documentation page. The same applies to `CountSp*()`. In fact, I would be even more radical and just define two functions, `count()` and `fetch()` (names just as examples), where everything else (whether looking for species or genera, in title, in keywords, in abstract, in which database, etc.) can be set through ad-hoc function arguments.
- [ ] I failed to assign outputs of `Count*()` and `Fetch*()` to an object (e.g. by `obj <- Fetch*()`). I also note that none of the examples in the manual do assignments, and all functions that are supposed to work on outputs of these functions actually work on data already installed with the package. I know that such queries may fail when building the package, but you have the option of writing your examples within a `\dontrun{}` environment. I would prefer to provide examples to the users starting from the query.
- [ ] If all previous issues are addressed, the installed data will become superfluous.
- [ ] Is there an option to extract the metadata of the queried references, for instance to export it to a BibTeX file?
Manuscript
- [ ] The same comments as for the Vignette and Manual may also apply to the manuscript.
- [ ] I would prefer to have a concrete example of a question, applied from the query through to reporting the outcome, without using pre-stored data.
General comments
- [ ] I find the use of objects for the storage of credentials quite risky, because by sharing a project you may mistakenly share a script or an object revealing your passwords and IDs. An alternative can be to use masks with `tcltk` and store the entries in the temporary folder (see this post), or to use the `keyring` package (see here); a minimal `keyring` sketch follows this list.
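A minimal sketch with `keyring`; the service name is just an example:

```r
# Store the API key in the OS credential store instead of an R object or script.
library(keyring)

key_set("specieshindex_scopus")            # prompts for the key once and stores it securely
api_key <- key_get("specieshindex_scopus") # retrieve it at runtime without exposing it in code
```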
Checks
- [ ] Current checks retrieve warnings about the use of `:::` and `::`, which may be avoided. The second case, as far as I know, may bypass the declaration of dependencies, which is not desirable.
- [ ] Warnings regarding `no visible binding for global variable ...`. In this case I'll recommend using NULLing (see this example, and the sketch after this list).
- [ ] Check and solve all issues reported by `goodpractice::gp()`.
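A minimal sketch of the NULLing approach; the function and column names are only illustrative:

```r
# Silence the "no visible binding for global variable" NOTE for columns used
# in non-standard evaluation (e.g. ggplot2/dplyr) by declaring them first.
plot_citations <- function(data) {
  year <- citations <- NULL
  ggplot2::ggplot(data, ggplot2::aes(x = year, y = citations)) +
    ggplot2::geom_line()
}
```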
Thanks everyone for your helpful comments! I've since updated the documentation, reorganised the functions, and fixed some of the warnings that were showing up in checks. We're currently working on adding more tests for the API.
Regarding @kamapu's comments:
- I've replaced all the `Fetch*()` and `Count*()` functions with `Fetch()` and `Count()` to make things simpler
- the error `Bad Request (HTTP 400)` is probably a result of your institution not being a subscriber of Elsevier/Scopus (see https://github.com/jessicatytam/specieshindex/issues/46#issue-1029956107)
- when you say `obj <- Fetch()` and `obj <- Count()` are not working, do you mean they are returning some sort of error? They are both working for me; would you like to provide a reprex?
Great, @jessicatytam, please let us know when you have made progress on various points raised in the review. Feel free to open an issue for each item if helpful :relaxed:
@kamapu, I think you missed this question :wink::
> when you say obj <- Fetch() and obj <- Count() are not working, do you mean they are returning some sort of error? They are both working for me, do you want to provide a reprex?
@jessicatytam, do you have a timeline on when you think you could do the required changes? If you don't think this will be possible anytime soon, we can pause this submission or offer to resubmit later.
Hi @Bisaloo, apologies for the holdup! I've been working on multiple projects and getting organised as I'm starting my PhD. We hope to finish addressing the comments by the end of March. Here's a list of the tasks we will be working on:
- [x] Adding API tests
- [x] Reduce code duplication
- [x] Use `facet_wrap()` in plots
- [x] Add more error messages
- [x] Implement a secure way to provide credentials in sessions
Hi @jessicatytam :wave:, could you update us on the current status of this submission please? If you need some more time, it's okay but would you have a rough estimate of when we could expect your changes? If this is going to take a while, you could also withdraw this submission for the time being and resubmit later.
We are flexible. Please let me know what is the most reasonable and convenient option for you.
We have completed all of the tasks and addressed the comments, with the exception of adding tests, it's currently ~50% so we will probably need until the end of May if that's ok.
What would be the ideal % for test coverage?
> We have completed all of the tasks and addressed the comments, with the exception of adding tests, it's currently ~50% so we will probably need until the end of May if that's ok.
Sounds good :+1:
> What would be the ideal % for test coverage?
There is no hard and fast rule, especially for API packages that are somewhat more difficult to write tests for. Most best practice guides say something like "major functionality is covered by tests". In my experience, around 80-85% is usually a good number to ensure tests cover most sections of your code. Higher is also great but please let me know if you have issues reaching this number.
For what it's worth, a good, lower-effort way to improve your test suite is also to turn each bug (such as the ones discovered in this review) into a test, so you can make sure you don't reintroduce it in the future.
We've been adding lots of tests, and almost everything is now tested except for one big issue.
We're dealing with some problems testing the functions that query an API multiple times. Because of limitations in the Scopus API, we need more than 10 GET requests in a single function to retrieve all of the data while handling edge cases smoothly. We are currently using httptest::with_mock_api() to test the API functions, but with_mock_api() can only test one GET request per test. Testing everything in those functions is therefore difficult, given the combination of the constraints of the Scopus API and the mocking framework, so we're a bit stuck unless you can see a clever solution.
Here's the link to the codecov report: https://app.codecov.io/gh/jessicatytam/specieshindex/blob/master/R/specieshindex.R. Check out the function FetchT_scopus, which is being tested but where the test exits (successfully) after the first GET call. Any help much appreciated.
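For reference, this is roughly the shape of the mocked tests we're writing; the fixture layout and the exact `Count()` arguments are simplified for illustration:

```r
# A mocked test with httptest: requests are served from recorded fixture files
# instead of hitting the live Scopus API.
library(testthat)
library(httptest)

with_mock_api({
  test_that("Count() returns a one-row data frame from recorded responses", {
    res <- Count(db = "scopus", genus = "Bettongia", species = "penicillata")
    expect_s3_class(res, "data.frame")
    expect_equal(nrow(res), 1)
  })
})
```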
@jessicatytam, while investigating your problem, I had another look and want to mention right away that @DrMattG's point about code duplication is, in my opinion, not yet properly addressed.
Most functions still include a lot of duplicated code. I have submitted a pull request (https://github.com/jessicatytam/specieshindex/pull/60) where I show how you could reduce code duplication. Please have a look and try to apply this kind of fix throughout the code. Let me know if it's not clear.
On the topic of duplication, I notice that the documentation of the example datasets is duplicated as well. You documented them in each individual file (Koala.R, Platypus.R, etc.) and again at the end of the main specieshindex.R file. Could you have a look please?
I would like to emphasize that this insistence on avoiding code duplication is not just to annoy you. Code with high duplication is very hard or impossible to maintain in the long run. If you refactor the code to reduce duplication, future updates will require changes in a single place instead of throughout the codebase.
Thanks for the comments! I'll have another look at the updated code again and come up with a list of things to do and a timeline.
- [ ] unused arguments (https://github.com/jessicatytam/specieshindex/issues/61#issue-1253599082)
- [ ] storing results w/in a loop (https://github.com/jessicatytam/specieshindex/issues/62#issue-1253657243)
- [ ] reduce code duplication (https://github.com/jessicatytam/specieshindex/pull/60#issue-1252587022)
- [x] fix dataset documentation duplication
- [x] replace print() with message() (https://github.com/jessicatytam/specieshindex/issues/64#issue-1271823188)