                        npi: Access the U.S. National Provider Identifier Registry API
Submitting Author Name: Frank Farach
Submitting Author Github Handle: @frankfarach
Other Package Authors Github handles: (comma separated, delete if none) @parmsam
Repository: https://github.com/frankfarach/npi
Version submitted: 0.1.0
Submission type: Standard
Editor: @maelle
Reviewers: @Rekyt, @zabore
Due date for @Rekyt: 2022-04-04
Due date for @zabore: 2022-04-18
Archive: TBD
Version accepted: TBD
Language: en
- Paste the full DESCRIPTION file inside a code block below:
```
Package: npi
Title: Access the U.S. National Provider Identifier Registry API
Version: 0.1.0
Authors@R: c(
    person("Frank", "Farach", , "[email protected]", role = c("cre", "aut"),
           comment = c(ORCID = "0000-0002-2145-0145")),
    person("Sam", "Parmar", , "[email protected]", role = "ctb")
  )
Description: Access the United States National Provider Identifier
    Registry API (if available) and provide informative error messages
    when it's not.
License: MIT + file LICENSE
URL: https://github.com/frankfarach/npi,
    https://frankfarach.github.io/npi/,
    https://npiregistry.cms.hhs.gov/api/
BugReports: https://github.com/frankfarach/npi/issues
Depends: 
    R (>= 3.1)
Imports: 
    checkLuhn,
    dplyr,
    glue,
    httr,
    magrittr,
    purrr,
    rlang,
    stringr,
    tibble,
    tidyr,
    utils
Suggests: 
    checkmate,
    covr,
    httptest,
    knitr,
    mockery,
    rmarkdown,
    spelling,
    testthat (>= 2.1.0)
VignetteBuilder: 
    knitr
Encoding: UTF-8
Language: en-US
LazyData: true
RoxygenNote: 7.1.2
```
Scope
- Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
- [x] data retrieval
- [ ] data extraction
- [ ] data munging
- [ ] data deposition
- [ ] workflow automation
- [ ] version control
- [ ] citation management and bibliometrics
- [ ] scientific software wrappers
- [ ] field and lab reproducibility tools
- [ ] database software bindings
- [ ] geospatial data
- [ ] text analysis
 
- Explain how and why the package falls under these categories (briefly, 1-2 sentences): npi is an API wrapper for accessing and retrieving data from the public U.S. National Provider Identifier Registry.
- Who is the target audience and what are scientific applications of this package? Healthcare analysts, public health analysts, health economists, health policy analysts, medical informaticists, and journalists can use data obtained through this package to address a wide range of scientific questions in their respective domains. NPI numbers appear in other publicly available records, making them especially useful for linking datasets (e.g., healthcare quality, Medicare) in an otherwise fragmented healthcare data landscape.
- Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category? None that I could find on Metacran when searching for npi or NPPES.
- (If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research? Yes. Although the API, and thus the package, returns personally identifiable information (e.g., names of individual providers), the U.S. government already makes these records freely and publicly available online and provides data dissemination notices on the data elements to be disclosed from NPI records.
- If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
- Explain reasons for any pkgcheck items which your package is unable to pass. The only failing check is for continuous integration. However, the package repo runs CI through GitHub Actions, including the standard R-CMD-check, test coverage, pkgdown, and CITATION.cff workflows.
Technical checks
Confirm each of the following by checking the box.
- [x] I have read the guide for authors and rOpenSci packaging guide.
This package:
- [x] does not violate the Terms of Service of any service it interacts with.
- [x] has a CRAN and OSI accepted license.
- [X] contains a README with instructions for installing the development version.
- [x] includes documentation with examples for all functions, created with roxygen2.
- [x] contains a vignette with examples of its essential functions and uses.
- [x] has a test suite.
- [x] has continuous integration, including reporting of test coverage using services such as Travis CI, Coveralls and/or CodeCov.
Publication options
- [x] Do you intend for this package to go on CRAN?
- [ ] Do you intend for this package to go on Bioconductor?
- [ ] Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
MEE Options
- [ ] The package is novel and will be of interest to the broad readership of the journal.
- [ ] The manuscript describing the package is no longer than 3000 words.
- [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
- (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
- (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
- (Please do not submit your package separately to Methods in Ecology and Evolution)
Code of conduct
- [x] I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.
Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.
:rocket:
Editor check started
:wave:
Checks for npi (v0.1.0)
git hash: db497325
- :heavy_check_mark: Package name is available
- :heavy_check_mark: has a 'CITATION' file.
- :heavy_check_mark: has a 'codemeta.json' file.
- :heavy_check_mark: has a 'contributing' file.
- :heavy_check_mark: uses 'roxygen2'.
- :heavy_check_mark: 'DESCRIPTION' has a URL field.
- :heavy_check_mark: 'DESCRIPTION' has a BugReports field.
- :heavy_check_mark: Package has at least one HTML vignette
- :heavy_check_mark: All functions have examples.
- :heavy_check_mark: Package has continuous integration checks.
- :heavy_check_mark: Package coverage is 93.4%.
- :heavy_check_mark: R CMD check found no errors.
- :heavy_check_mark: R CMD check found no warnings.
Package License: MIT + file LICENSE
1. Statistical Properties
This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.
Details of statistical properties (click to open)
The package has:
- code in R (100% in 9 files)
- 1 author
- 1 vignette
- 1 internal data file
- 11 imported packages
- 8 exported functions (median 10 lines of code)
- 54 non-exported functions in R (median 8 lines of code)
Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages. The following terminology is used:
- loc = "Lines of Code"
- fn = "function"
- exp / not_exp = exported / not exported
The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.
| measure | value | percentile | noteworthy | 
|---|---|---|---|
| files_R | 9 | 55.2 | |
| files_vignettes | 1 | 68.4 | |
| files_tests | 6 | 84.4 | |
| loc_R | 398 | 40.6 | |
| loc_vignettes | 54 | 10.1 | |
| loc_tests | 258 | 60.8 | |
| num_vignettes | 1 | 64.8 | |
| data_size_total | 3102 | 64.9 | |
| data_size_median | 3102 | 71.5 | |
| n_fns_r | 62 | 63.3 | |
| n_fns_r_exported | 8 | 38.3 | |
| n_fns_r_not_exported | 54 | 69.6 | |
| n_fns_per_file_r | 6 | 71.6 | |
| num_params_per_fn | 2 | 11.9 | |
| loc_per_fn_r | 9 | 24.3 | |
| loc_per_fn_r_exp | 10 | 22.2 | |
| loc_per_fn_r_not_exp | 8 | 26.4 | |
| rel_whitespace_R | 28 | 55.8 | |
| rel_whitespace_vignettes | 48 | 18.6 | |
| rel_whitespace_tests | 31 | 67.1 | |
| doclines_per_fn_exp | 19 | 12.2 | |
| doclines_per_fn_not_exp | 0 | 0.0 | TRUE | 
| fn_call_network_size | 34 | 58.2 | |
1a. Network visualisation
Click to see the interactive network visualisation of calls between objects in package
2. goodpractice and other checks
Details of goodpractice and other checks (click to open)
3a. Continuous Integration Badges
GitHub Workflow Results
| name | conclusion | sha | date | 
|---|---|---|---|
| Commands | skipped | db4973 | 2022-03-04 | 
| pages build and deployment | success | 552287 | 2022-03-04 | 
| pkgdown | success | 5a1981 | 2022-03-04 | 
| R-CMD-check | success | 5a1981 | 2022-03-04 | 
| test-coverage | success | 5a1981 | 2022-03-04 | 
| Update CITATION.cff | success | 5a1981 | 2022-03-04 | 
3b. goodpractice results
R CMD check with rcmdcheck
rcmdcheck found no errors, warnings, or notes
Test coverage with covr
Package coverage: 93.45
Cyclocomplexity with cyclocomp
No functions have cyclocomplexity >= 15
Static code analyses with lintr
lintr found no issues with this package!
Package Versions
| package | version | 
|---|---|
| pkgstats | 0.0.3.88 | 
| pkgcheck | 0.0.2.239 | 
Editor-in-Chief Instructions:
This package is in top shape and may be passed on to a handling editor
Dear @frankfarach, Thank you for your submission. The package appears to be in-scope and in good shape, so I am looking for an editor to handle this submission. Thanks, Julia
Hi @jooolia, Thanks for the update. I look forward to working with the peer review team. Frank
@ropensci-review-bot assign @maelle as editor
Assigned! @maelle is now the editor
:wave: @frankfarach Thanks for your submission! I'll start looking for reviewers, I have just one question and one comment below.
Editor checks:
- [x] Documentation: The package has sufficient documentation available online (README, pkgdown docs) to allow for an assessment of functionality and scope without installing the package. In particular,
- [x] Is the case for the package well made?
- [x] Is the reference index page clear (grouped by topic if necessary)?
- [x] Are vignettes readable, sufficiently detailed and not just perfunctory?
 
- [x] Fit: The package meets criteria for fit and overlap.
- [x] Installation instructions: Are installation instructions clear enough for human users?
- [x] Tests: If the package has some interactivity / HTTP / plot production etc. are the tests using state-of-the-art tooling?
- [x] Contributing information: Is the documentation for contribution clear enough e.g. tokens for tests, playgrounds?
- [x] License: The package has a CRAN or OSI accepted license.
- [x] Project management: Are the issue and PR trackers in a good shape, e.g. are there outstanding bugs, is it clear when feature requests are meant to be tackled?
Editor comments
- There aren't many functions but you could still add grouping to the reference index so that the most important functions come first.
- I was curious to know why you use mockery::.mockPaths()?
@ropensci-review-bot seeking reviewers
Please add this badge to the README of your package repository:
[](https://github.com/ropensci/software-review/issues/505)
Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news
@ropensci-review-bot add @Rekyt to reviewers
@Rekyt added to the reviewers list. Review due date is 2022-04-04. Thanks @Rekyt for accepting to review! Please refer to our reviewer guide.
@Rekyt: If you haven't done so, please fill this form for us to update our reviewers records.
Hi @maelle, thanks for the comment and question, which I address below.
Editor comments
- There aren't many functions but you could still add grouping to the reference index so that the most important functions come first.
Thanks for the suggestion. I have created an issue for this.
- I was curious to know why you use mockery::.mockPaths()?
.mockPaths() is from the httptest package. It lets you set a directory other than the current working directory for recording API fixtures with functions like httptest::with_mock_api(). It looks like I might not need .mockPaths().
Your question made me realize that I am loading both httptest and mockery in tests/testthat/setup.R. I think I am only using functions from httptest in my test suite, so I may not need mockery at all. I've created another issue to check on this as part of my response to peer review.
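For context, here is a minimal sketch of how the httptest setup gets used (the fixture path and the test itself are illustrative, not verbatim from the package):

```r
# tests/testthat/setup.R (sketch)
library(httptest)
.mockPaths("fixtures")  # read/record API fixtures from "fixtures/" instead of the test directory

# In a test file, with_mock_api() replays the recorded response instead of hitting the live API
with_mock_api({
  test_that("npi_search() returns a tibble", {
    expect_s3_class(npi_search(city = "New York City"), "tbl_df")
  })
})
```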
:wave: @frankfarach! Thank you! I realized afterwards .mockPaths() was from httptest, as I was working on a package that uses httptest. :woman_facepalming: (in that package we use with_mock_dir())
@ropensci-review-bot add @zabore to reviewers
@zabore added to the reviewers list. Review due date is 2022-04-08. Thanks @zabore for accepting to review! Please refer to our reviewer guide.
@zabore: If you haven't done so, please fill this form for us to update our reviewers records.
@ropensci-review-bot set due date for @zabore to 2022-04-18
Review due date for @zabore is now 18-April-2022
Package Review
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
- Briefly describe any working relationship you have (had) with the package authors.
- [x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).
Documentation
The package includes all the following forms of documentation:
- [x] A statement of need: clearly stating problems the software is designed to solve and its target audience in README
- [x] Installation instructions: for the development version of package and any non-standard dependencies in README
- [x] Vignette(s): demonstrating major functionality that runs successfully locally
- [x] Function Documentation: for all exported functions
- [x] Examples: (that run successfully locally) for all exported functions
- [x] Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
Functionality
- [x] Installation: Installation succeeds as documented.
- [x] Functionality: Any functional claims of the software been confirmed.
- [x] Performance: Any performance claims of the software been confirmed.
- [x] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
- [x] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.
Estimated hours spent reviewing: 3
- [x] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.
Review Comments
Overall, the package is very well designed. It has simple, meaningful functions and hides quite a lot of machinery to make the results of the NPI API easily accessible and usable in R. I would like to congratulate the author on this work.
Regarding code efficiency, I didn't find any improvement that could easily be made; the code appears well optimized from both a readability and a performance standpoint.
Regarding the documentation, I think it's very well written, clear, and easy to understand from different entry points. The difficulties I had were related to the NPI API itself, which I am not familiar with.
I do, however, have a few comments on how to improve the package's coverage, and I found several edge cases that are not well handled and would be worth covering with additional tests.
Documentation
API URL
You could indicate the API URL in the DESCRIPTION file; this is needed for submission to CRAN (if you are considering it). It is also possible to specify which link is what in the same file, as is done in the taxize DESCRIPTION file: https://github.com/ropensci/taxize/blob/a4db9a790e48c93235cac6145826c097ec687408/DESCRIPTION#L10-L12
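For instance, a sketch of how the URL field could label each link (the parenthetical labels are only a suggestion):

```
URL: https://github.com/frankfarach/npi (devel),
    https://frankfarach.github.io/npi/ (website),
    https://npiregistry.cms.hhs.gov/api/ (API)
```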
npis description
The description of the npis dataset doesn't seem to fit the dataset, as it specifies "list of 0-n tibbles" for one column, and it is not clear to me how to read that. I do think it's nice to indicate the type of element, though; a way to make it more explicit would be to say "list-column of tibbles with X rows and Y columns".
extra vignette: use cases
I have no idea how this API could be useful in a real use case. But it may be nice to have an extra vignette showing how you can answer some questions through this API (how is it useful for research? for health administration?).
Function usage
Rate limitation
Is there any form of rate limiting? It seems likely given that we are accessing an API, but nothing is stated either on the API side or in the package (I've seen it mentioned in the "Getting started" vignette, but it could be more explicit). If there is, it would be a nice thing to mention so that users are aware of how long it may take to get their results back.
Can't use npi_summarize() with country code
I've found an issue when querying by country code, which returns a slightly differently formatted result and as such makes npi_summarize() unusable.
reprex:
library("npi")
ki = npi_search(country_code = "DE")
#> Requesting records 0-10...
npi_summarize(ki)
#> Error:
#> ! Tibble columns must have compatible sizes.
#> * Size 10: Existing data.
#> * Size 12: Column `primary_taxonomy`.
#> i Only values of size one are recycled.
#> Run `rlang::last_error()` to see where the error occurred.
Maybe npi_summarize() should take this edge case into account (even though I suppose there are not that many American health professionals outside the US).
Error when internet is off
Because this is a recurrent theme for API packages (and CRAN asks them to "fail gracefully" in these cases), I checked what happens when I turn off the internet, and I don't think the resulting error is very explicit for the user.
```r
> npi_search(city = "San Francisco")
Requesting records 0-10...
Error in curl::curl_fetch_memory(url, handle = handle) : 
  Could not resolve host: npiregistry.cms.hhs.gov
```
Maybe there could be a way to display a better error message when the internet is off or the website is unreachable? For example, checking up front whether there is an internet connection and erring explicitly if there isn't.
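A minimal sketch of such a check (the helper name and message are illustrative, not part of npi; curl is already available since httr depends on it):

```r
# Illustrative offline check, not npi's actual code
check_connection <- function() {
  if (!curl::has_internet()) {
    stop("No internet connection detected, so the NPPES API at ",
         "https://npiregistry.cms.hhs.gov/api/ cannot be reached.",
         call. = FALSE)
  }
  invisible(TRUE)
}
```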
Special characters in npi_search() arguments
Character arguments of npi_search() allow some special characters, as specified by the documentation. But when searching with other special characters, the query is still submitted.
reprex:
library("npi")
npi_search(first_name = "KOŒ*")
#> Requesting records 0-10...
#> Error in `npi_handle_response()`:
#> ! 
#> Field: first_name
#> Field contains special character(s) or wrong number of characters
Created on 2022-04-04 by the reprex package (v2.0.1)
The error handling works well, but it falls back on the API in this case; I think these issues could be caught earlier, when processing the arguments, to save time and avoid additional queries.
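A minimal sketch of such a client-side check (the allowed character set below is my assumption based on the API error message, not the package's actual rule):

```r
# Illustrative pre-flight check for name arguments (assumed allowed characters)
check_name_arg <- function(x, arg_name) {
  if (grepl("[^A-Za-z '*-]", x)) {
    stop("`", arg_name, "` contains unsupported special characters: ", x,
         call. = FALSE)
  }
  invisible(x)
}

check_name_arg("KO\u0152*", "first_name")  # errors before any API request is made
```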
Wildcards special cases
As above, some queries with wildcards used improperly (i.e., not trailing) could be caught earlier. I got strange results with some of them.
reprex:
library("npi")
# Trying to mess up with wildcards
ko = npi_search(last_name = "M*ll*")
#> Requesting records 0-10...
ko$basic  # The answer has nothing to do with the pattern
#> [[1]]
#> # A tibble: 1 x 11
#>   first_name last_name middle_name credential sole_proprietor gender
#>   <chr>      <chr>     <chr>       <chr>      <chr>           <chr> 
#> 1 JENNY      ENSTROM   E           PA         NO              F     
#> # ... with 5 more variables: enumeration_date <chr>, last_updated <chr>,
#> #   status <chr>, name <chr>, certification_date <chr>
# Wildcards in the middle do not work
ko = npi_search(last_name = "M*ll")
#> Requesting records 0-10...
ko$basic  # And I get the same answer instead of an error?!
#> [[1]]
#> # A tibble: 1 x 11
#>   first_name last_name middle_name credential sole_proprietor gender
#>   <chr>      <chr>     <chr>       <chr>      <chr>           <chr> 
#> 1 JENNY      ENSTROM   E           PA         NO              F     
#> # ... with 5 more variables: enumeration_date <chr>, last_updated <chr>,
#> #   status <chr>, name <chr>, certification_date <chr>
Created on 2022-04-04 by the reprex package (v2.0.1)
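A minimal sketch of validating wildcard placement before querying (the "trailing only, after at least two characters" rule is my reading of the NPPES API documentation, so treat it as an assumption):

```r
# Illustrative: a wildcard, if present, must be trailing and preceded by
# at least two non-wildcard characters (assumed NPPES rule)
is_valid_wildcard_query <- function(x) {
  !grepl("\\*", x) || grepl("^[^*]{2,}\\*$", x)
}

is_valid_wildcard_query("Smith")  # TRUE: no wildcard
is_valid_wildcard_query("Sm*")    # TRUE: trailing wildcard
is_valid_wildcard_query("M*ll*")  # FALSE: wildcard in the middle
```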
User agent
It's great that the package provides a user agent. Providing one is considered best practice when accessing an API, since it identifies who is making the requests.
The README mentions the user agent (https://github.com/frankfarach/npi/blob/d4e98a52ccc0a7f71328582333e5b4191f4796b9/README.Rmd#L114-L120), which seems like a great idea. However, it doesn't consider that the user may not be familiar with the concept of a user agent, and the displayed example doesn't show in what way changing the user agent can be helpful. Maybe you could give a more concrete example. I also wonder whether it is a good idea to let the user remove the reference to the package entirely.
To improve the user agent, which currently only points to the URL of the repo, it would be good to indicate the version of npi in use, as is done in taxize. If users need to be able to modify the user agent, then providing a fixed part containing the package version plus a customizable second part would be even more complete.
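A minimal sketch of such a versioned user agent (the helper name is illustrative; npi's actual implementation may differ):

```r
# Illustrative: fixed package identification plus an optional user-supplied part
npi_user_agent <- function(extra = NULL) {
  ua <- paste0("npi/", utils::packageVersion("npi"),
               " (https://github.com/frankfarach/npi)")
  if (!is.null(extra)) ua <- paste(ua, extra)
  httr::user_agent(ua)
}
```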
Code
checkmate
checkmate is in Suggests but is used across several functions, and there is no check for whether the package is installed. Maybe it would be a good idea to move it to Imports?
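If it stays in Suggests, a sketch of the usual guard (a standard R idiom, not npi's current code):

```r
# Illustrative guard for a Suggests-only dependency
if (!requireNamespace("checkmate", quietly = TRUE)) {
  stop("Package 'checkmate' is required for this function; ",
       "install it with install.packages(\"checkmate\").",
       call. = FALSE)
}
```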
tidyr warning
On several occasions we see this message:
```
Warning message:
The `.sep` argument of `unnest()` is deprecated as of tidyr 1.0.0.
Use `names_sep = '_'` instead.
```
even though there is a function, tidyr_new_interface(), that checks for the installed tidyr version.
On my computer I got this warning when using npi_flatten(), even though I had tidyr 1.2.0 installed.
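For reference, a small runnable sketch of the non-deprecated interface (the example data is made up):

```r
library(tidyr)

df <- tibble::tibble(
  id = 1:2,
  taxonomies = list(
    tibble::tibble(code = "207Q00000X", primary = TRUE),
    tibble::tibble(code = "363L00000X", primary = FALSE)
  )
)

# names_sep replaces the deprecated .sep argument (tidyr >= 1.0.0)
unnest(df, cols = taxonomies, names_sep = "_")
#> columns come back prefixed as taxonomies_code, taxonomies_primary
```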
Tests
Functions delay_by() and remove_null() are defined but never used (and never tested). Was that planned for another update? It artificially reduces the coverage.
No tests cover empty results. Certain parameter combinations return empty results, and it would be easy to cover L177 of npi_search.R (https://app.codecov.io/gh/frankfarach/npi/blob/master/R/npi_search.R) with, for example, npi_search(last_name = "Grenie").
npi_flatten() is not covered at all; it could be covered by developing tests using the npis dataset.
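A minimal test sketch using the bundled npis dataset (the expectations assume npi_flatten() returns a single flat tibble; adjust as needed):

```r
# Illustrative test of npi_flatten() against the sample data shipped with the package
test_that("npi_flatten() returns a single flat tibble", {
  flat <- npi_flatten(npis)
  expect_s3_class(flat, "tbl_df")
  expect_false(any(vapply(flat, is.list, logical(1))))  # no remaining list-columns
})
```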
Hi @Rekyt,
Thank you for your review of the npi package. I really appreciate your kind words about the package and -- even more than that -- your specific recommendations for improving the user experience and the quality of the code.
I look forward to responding to your review in detail.
Hi @maelle, @Rekyt, and @zabore,
As you will see, I have been creating a new issue in the package repo for each substantive item in your reviews. Each issue links back to the originating comment(s) in this review thread and all (and only) such issues are collected in an rOpenSci Review milestone. I plan to link to these issues and any pull requests I make to the package repository in my forthcoming responses.
Although I am doing this primarily to stay organized and preserve the intellectual lineage of future changes to the package, I thought I should explain the appearance of a bunch of issue references in this thread. I also hope it's helpful to you -- but if not, let me know and I'll see if I can remove the backlinks, or at least not link to this thread when @zabore's review arrives.
Here is my review!
- Briefly describe any working relationship you have (had) with the package authors.
- [x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).
Documentation
The package includes all the following forms of documentation:
- [x] A statement of need: clearly stating problems the software is designed to solve and its target audience in README
- [x] Installation instructions: for the development version of package and any non-standard dependencies in README
- [x] Vignette(s): demonstrating major functionality that runs successfully locally
- [x] Function Documentation: for all exported functions
- [x] Examples: (that run successfully locally) for all exported functions
- [x] Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
Functionality
- [x] Installation: Installation succeeds as documented.
- [x] Functionality: Any functional claims of the software been confirmed.
- [x] Performance: Any performance claims of the software been confirmed.
- [x] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
- [x] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.
Estimated hours spent reviewing: 4
- [x] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.
Review Comments
Overall I found this package to be well designed and well documented. Nice work! I understand what it does, but I was left wondering why. It could be nice to include a vignette demonstrating how you envision this being used. How are people getting this information now, and how will this streamline things for them? Below are some more specific comments, but overall I found no major issues.
README
- Overall the README is thorough and well organized and easy to read. Great work!
- Under "Search the registry" section the README mentions 4 regular vector columns that are returned by npi_search()but when I execute the code I getenumeration_typein place of the mentionedprovider_type
- When I ran npi_summarize(nyc)I got the following warning message (Note I have tidyr 1.2.0 installed on my machine):Warning message: The `.sep` argument of `unnest()` is deprecated as of tidyr 1.0.0. Use `names_sep = '_'` instead.
- The example in the README is not reproducible, meaning I got different results when I ran the code than what was shown in the README output. Also, the ouput included in the vignette differ from those in the README. I assume this is due to a change in the NPI itself between when the README was created and when I ran the code? If possible, reproducible example in the README is ideal, though I understand that may be a challenge with a changing API.
vignette
- In the "Search registry" section I would prefer to see the other search arguments listed by the argument name rather than a description (i.e. to see first_namerather than "provider's first name") as this
- More details could be provided about what npi_summarize()does. Instead of saying it provides a more "human-readable" overview you could instead just state explicitly that it extract the first and last name, primary taxonomy, and primary address and phone number for each provider.
- The README contains slightly more detailed information about what npi_flatten()does than the vignette. For users familiar withpurrr::flatten()the action may be somewhat self explanatory. But for other users the term "flatten" likely has no meaning, so a more functional description could be useful.
Documentation/Examples
- Great use of the npi_ prefix to start all exported functions; it makes them easy to find and use through autocomplete.
- It's not clear to me how or when npi_is_valid() would be used. You could provide some motivation or an actual use case in the README and/or vignette.
- In the documentation for the npis dataset you mention in several places "[list of 0-n tibbles]", but there is no definition of n. I assume this indicates the number of tibbles in the list can vary, but I don't think it's necessary to include this; "list of tibbles" could be sufficient.
- Is there documentation anywhere for what information is contained in the columns of the tibble returned by npi_search()? I couldn't find it.
- The help page (or vignette?) for npi_search() could provide more details on how matching is done by default. For example, I get different results if I search for city = "new york" versus city = "new york city". I know you have not programmed the name matching yourself, but I think this is important information for the user to have, so at least pointing to a link where it can be found could be useful.
- It could be useful to add a message from npi_search() that explicitly states something along the lines of "Returning 10 of xxx records for the specified search parameters", so the user is aware they are getting only a subset of the records that match their search parameters (see the sketch after this list).
- Why is 1200 the maximum for the limit argument in npi_search()?
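A minimal sketch of such a message (the helper and the counts are purely illustrative):

```r
# Illustrative: report how many of the matching records were returned
report_records <- function(n_returned, n_total) {
  message(sprintf("Returning %d of %d records for the specified search parameters.",
                  n_returned, n_total))
}

report_records(10, 473)
#> Returning 10 of 473 records for the specified search parameters.
```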
Tests
- Looks thorough to me.
Functionality/performance
- No performance issues or areas for improvement that I can see
- Your error messages are quite good, providing useful information about where errors in supplied arguments occurred and what is allowed. This is really user friendly!
- It seems like when invalid arguments are passed to npi_search() it still tries to query the API first and then returns an error. Internal checks could be done in this function to avoid the query altogether in these cases and more quickly return a useful error message.
- I see that the taxonomies field in the result of npi_search() has a "primary" column, which I assume is used by npi_summarize() to identify the primary taxonomy. However, how does npi_summarize() determine the "primary" address and phone number when there is more than one listed, since there is no corresponding "primary" column in addresses? Is it always just the first record from taxonomies and addresses?
- Does npi_flatten() always return a sensible result? It seems like it merges together a lot of different pieces of information from different list-columns, but does it always make sense to cross-tabulate every single one? I don't have an idea of where it might go wrong, but I always wonder when merging tibbles with different numbers of rows.
Thanks a ton @Rekyt & @zabore for your thorough reviews! @zabore, would you mind adding the time spent reviewing to your comment after "Estimated hours spent reviewing: "? Thank you!
@frankfarach I think the backlinks & milestone are great. We will still expect your response to be more or less self-explanatory so not just a list of links (I'm not assuming you got this wrong, just stating this just in case!).
@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/505#issuecomment-1087579526 time 3
Logged review for Rekyt (hours: 3)
@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/505#issuecomment-1090531882 time NA
Logged review for zabore (hours: NA)
@maelle I added the time, sorry about the initial oversight!