software-review npi: Access the U.S. National Provider Identifier Registry API

trafficstars

Reviewers: Submitting Author Name: Frank Farach Submitting Author Github Handle: @frankfarach Other Package Authors Github handles: (comma separated, delete if none) @parmsam Repository: https://github.com/frankfarach/npi Version submitted: 0.1.0 Submission type: Standard Editor: @maelle Reviewers: @Rekyt, @zabore

Due date for @Rekyt: 2022-04-04

Due date for @zabore: 2022-04-18

Archive: TBD Version accepted: TBD Language: en

Paste the full DESCRIPTION file inside a code block below:

Package: npi
Title: Access the U.S. National Provider Identifier Registry API
Version: 0.1.0
Authors@R: c(
    person("Frank", "Farach", , "[email protected]", role = c("cre", "aut"),
           comment = c(ORCID = "0000-0002-2145-0145")),
    person("Sam", "Parmar", , "[email protected]", role = "ctb")
  )
Description: Access the United States National Provider Identifier
    Registry API (if available) and provide informative error messages
    when it's not.
License: MIT + file LICENSE
URL: https://github.com/frankfarach/npi,
    https://frankfarach.github.io/npi/,
    https://npiregistry.cms.hhs.gov/api/
BugReports: https://github.com/frankfarach/npi/issues
Depends: 
    R (>= 3.1)
Imports: 
    checkLuhn,
    dplyr,
    glue,
    httr,
    magrittr,
    purrr,
    rlang,
    stringr,
    tibble,
    tidyr,
    utils
Suggests: 
    checkmate,
    covr,
    httptest,
    knitr,
    mockery,
    rmarkdown,
    spelling,
    testthat (>= 2.1.0)
VignetteBuilder: 
    knitr
Encoding: UTF-8
Language: en-US
LazyData: true
RoxygenNote: 7.1.2

Scope

Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
- [x] data retrieval
- [ ] data extraction
- [ ] data munging
- [ ] data deposition
- [ ] workflow automation
- [ ] version control
- [ ] citation management and bibliometrics
- [ ] scientific software wrappers
- [ ] field and lab reproducibility tools
- [ ] database software bindings
- [ ] geospatial data
- [ ] text analysis
Explain how and why the package falls under these categories (briefly, 1-2 sentences): npi is an API wrapper for accessing and retrieving data from the public U.S. National Provider Identifier Registry.
Who is the target audience and what are scientific applications of this package? Healthcare analysts, public health analysts, health economists, health policy analysts, medical informaticists, and journalists can use data obtained through this package to address a wide range of scientific questions in their respective domains. NPI numbers appear in other publicly available records, making them especially useful for linking datasets (e.g., healthcare quality, Medicare, etc.) in an otherwise fragmented healthcare data landscape.
Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category? None that I could find on Metacran when searching for npi or NPPES.
(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research? Yes. Although the API, and thus the package, returns personally identifiable information (e.g., names of individual providers), the U.S. government already makes these records freely and publicly available online, and provides data dissemination notices on the data elements to be disclosed from NPI records.
If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
Explain reasons for any pkgcheck items which your package is unable to pass. The only failing check is for continuous integration. However, the package repo performs CI integration through Github Actions, including standard R-CMD-check, test coverage, pkgdown, and CITATION.cff.

Technical checks

Confirm each of the following by checking the box.

[x] I have read the guide for authors and rOpenSci packaging guide.

This package:

[x] does not violate the Terms of Service of any service it interacts with.
[x] has a CRAN and OSI accepted license.
[X] contains a README with instructions for installing the development version.
[x] includes documentation with examples for all functions, created with roxygen2.
[x] contains a vignette with examples of its essential functions and uses.
[x] has a test suite.
[x] has continuous integration, including reporting of test coverage using services such as Travis CI, Coveralls and/or CodeCov.

Publication options

[x] Do you intend for this package to go on CRAN?
[ ] Do you intend for this package to go on Bioconductor?
[ ] Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:

MEE Options

[ ] The package is novel and will be of interest to the broad readership of the journal.
[ ] The manuscript describing the package is no longer than 3000 words.
[ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
(Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
(Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
(Please do not submit your package separately to Methods in Ecology and Evolution)

Code of conduct

[x] I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

Mar 04 '22 08:03 frankfarach

Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.

Mar 04 '22 08:03 ropensci-review-bot

:rocket:

Editor check started

:wave:

Mar 04 '22 08:03 ropensci-review-bot

Checks for npi (v0.1.0)

git hash: db497325

:heavy_check_mark: Package name is available
:heavy_check_mark: has a 'CITATION' file.
:heavy_check_mark: has a 'codemeta.json' file.
:heavy_check_mark: has a 'contributing' file.
:heavy_check_mark: uses 'roxygen2'.
:heavy_check_mark: 'DESCRIPTION' has a URL field.
:heavy_check_mark: 'DESCRIPTION' has a BugReports field.
:heavy_check_mark: Package has at least one HTML vignette
:heavy_check_mark: All functions have examples.
:heavy_check_mark: Package has continuous integration checks.
:heavy_check_mark: Package coverage is 93.4%.
:heavy_check_mark: R CMD check found no errors.
:heavy_check_mark: R CMD check found no warnings.

Package License: MIT + file LICENSE

1. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

code in R (100% in 9 files) and
1 authors
1 vignette
1 internal data file
11 imported packages
8 exported functions (median 10 lines of code)
54 non-exported functions in R (median 8 lines of code)

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages The following terminology is used:

loc = "Lines of Code"
fn = "function"
exp/not_exp = exported / not exported

The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

measure	value	percentile	noteworthy
files_R	9	55.2
files_vignettes	1	68.4
files_tests	6	84.4
loc_R	398	40.6
loc_vignettes	54	10.1
loc_tests	258	60.8
num_vignettes	1	64.8
data_size_total	3102	64.9
data_size_median	3102	71.5
n_fns_r	62	63.3
n_fns_r_exported	8	38.3
n_fns_r_not_exported	54	69.6
n_fns_per_file_r	6	71.6
num_params_per_fn	2	11.9
loc_per_fn_r	9	24.3
loc_per_fn_r_exp	10	22.2
loc_per_fn_r_not_exp	8	26.4
rel_whitespace_R	28	55.8
rel_whitespace_vignettes	48	18.6
rel_whitespace_tests	31	67.1
doclines_per_fn_exp	19	12.2
doclines_per_fn_not_exp	0	0.0	TRUE
fn_call_network_size	34	58.2

1a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package

2. `goodpractice` and other checks

Details of goodpractice and other checks (click to open)

3a. Continuous Integration Badges

GitHub Workflow Results

name	conclusion	sha	date
Commands	skipped	db4973	2022-03-04
pages build and deployment	success	552287	2022-03-04
pkgdown	success	5a1981	2022-03-04
R-CMD-check	success	5a1981	2022-03-04
test-coverage	success	5a1981	2022-03-04
Update CITATION.cff	success	5a1981	2022-03-04

3b. `goodpractice` results

`R CMD check` with rcmdcheck

rcmdcheck found no errors, warnings, or notes

Test coverage with covr

Package coverage: 93.45

Cyclocomplexity with cyclocomp

No functions have cyclocomplexity >= 15

Static code analyses with lintr

lintr found no issues with this package!

Package Versions

package	version
pkgstats	0.0.3.88
pkgcheck	0.0.2.239

Editor-in-Chief Instructions:

This package is in top shape and may be passed on to a handling editor

Mar 04 '22 08:03 ropensci-review-bot

Dear @frankfarach, Thank you for your submission. The package appears to be in-scope and in good shape, so I am looking for an editor to handle this submission. Thanks, Julia

Mar 11 '22 08:03 jooolia

Hi @jooolia, Thanks for the update. I look forward to working with the peer review team. Frank

Mar 12 '22 04:03 frankfarach

@ropensci-review-bot assign @maelle as editor

Mar 13 '22 20:03 jooolia

Assigned! @maelle is now the editor

Mar 13 '22 20:03 ropensci-review-bot

:wave: @frankfarach Thanks for your submission! I'll start looking for reviewers, I have just one question and one comment below.

Editor checks:

[x] Documentation: The package has sufficient documentation available online (README, pkgdown docs) to allow for an assessment of functionality and scope without installing the package. In particular,
- [x] Is the case for the package well made?
- [x] Is the reference index page clear (grouped by topic if necessary)?
- [x] Are vignettes readable, sufficiently detailed and not just perfunctory?
[x] Fit: The package meets criteria for fit and overlap.
[x] Installation instructions: Are installation instructions clear enough for human users?
[x] Tests: If the package has some interactivity / HTTP / plot production etc. are the tests using state-of-the-art tooling?
[x] Contributing information: Is the documentation for contribution clear enough e.g. tokens for tests, playgrounds?
[x License: The package has a CRAN or OSI accepted license.
[x] Project management: Are the issue and PR trackers in a good shape, e.g. are there outstanding bugs, is it clear when feature requests are meant to be tackled?

Editor comments

There aren't many functions but you could still add grouping to the reference index so that the most important functions come first.
I was curious to know why you use mockery::.mockPaths()?

Mar 14 '22 07:03 maelle

@ropensci-review-bot seeking reviewers

Mar 14 '22 07:03 maelle

Please add this badge to the README of your package repository:

[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/505_status.svg)](https://github.com/ropensci/software-review/issues/505)

Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news

Mar 14 '22 07:03 ropensci-review-bot

@ropensci-review-bot add @Rekyt to reviewers

Mar 14 '22 07:03 maelle

@Rekyt added to the reviewers list. Review due date is 2022-04-04. Thanks @Rekyt for accepting to review! Please refer to our reviewer guide.

Mar 14 '22 07:03 ropensci-review-bot

@Rekyt: If you haven't done so, please fill this form for us to update our reviewers records.

Mar 14 '22 07:03 ropensci-review-bot

Hi @maelle, thanks for the comment and question, which I address below.

Editor comments

There aren't many functions but you could still add grouping to the reference index so that the most important functions come first.

Thanks for the suggestion. I have created an issue for this.

I was curious to know why you use mockery::.mockPaths()?

.mockPaths() is from the httptest package. It lets you set a directory other than the current working directory for recording API fixtures with functions like httptest::with_mock_api(). It looks like I might not need .mockPaths().

Your question made me realize that I am loading both httptest and mockery in tests/testthat/setup.R. I think I am only using functions from httptests in my test suite, so I may not need mockery at all. I've created another issue to check on this as part of my response to peer review.

Mar 14 '22 15:03 frankfarach

:wave: @frankfarach! Thank you! I realized afterwards .mockPaths() was from httptest, as I was working on a package that uses httptest. :woman_facepalming: (in that package we use with_mock_dir())

Mar 15 '22 06:03 maelle

@ropensci-review-bot add @zabore to reviewers

Mar 18 '22 14:03 maelle

@zabore added to the reviewers list. Review due date is 2022-04-08. Thanks @zabore for accepting to review! Please refer to our reviewer guide.

Mar 18 '22 14:03 ropensci-review-bot

@zabore: If you haven't done so, please fill this form for us to update our reviewers records.

Mar 18 '22 14:03 ropensci-review-bot

@ropensci-review-bot set due date for @zabore to 2022-04-18

Mar 18 '22 14:03 maelle

Review due date for @zabore is now 18-April-2022

Mar 18 '22 14:03 ropensci-review-bot

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Briefly describe any working relationship you have (had) with the package authors.
[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

[x] A statement of need: clearly stating problems the software is designed to solve and its target audience in README
[x] Installation instructions: for the development version of package and any non-standard dependencies in README
[x] Vignette(s): demonstrating major functionality that runs successfully locally
[x] Function Documentation: for all exported functions
[x] Examples: (that run successfully locally) for all exported functions
[x] Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

[x] Installation: Installation succeeds as documented.
[x] Functionality: Any functional claims of the software been confirmed.
[x] Performance: Any performance claims of the software been confirmed.
[x] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
[x] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: 3

[x] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

Overall the package is very well designed. It has simple and meaningful functions and hides a quite important machinery to make the results of the NPI API easily accessible and exploitable in R. I would like to congratulate the author for this work.

Regarding code efficiency I didn't found any improvement that could be easily made as I have the impression the code is pretty optimized both from a reading and performance standpoint.

Regarding the documentation, I think it's very well written and clear. It's easy to understand it reading from different entry points. The difficulties I had understanding were rather related to the NPI API that I am not familiar with.

I have however a few comments on how to improve the coverage of the package and found several edge cases not well covered by the package, that could be worth to include as additional tests.

Documentation

API URL

You could indicate API URL in DESCRIPTION file, this is needed for submission to CRAN (if you consider it). It is also possible to specify which link is what in the same file. Like it is done in taxize DESCRIPTION file https://github.com/ropensci/taxize/blob/a4db9a790e48c93235cac6145826c097ec687408/DESCRIPTION#L10-L12

npis description

The description of npis dataset doesn't seem to fit the dataset as it specifies "list of 0-n tibbles" for one column. It is not clear to me how to read it. Though I think it's nice to indicate the type of element. Maybe a way to make it more explicit would be to indicate "list-column of tibbles with X rows and Y columns".

extra vignette: use cases

I have no idea how this API could be useful in a real use case. But it may be nice to have an extra vignette showing how you can answer some questions through this API (how is it useful for research? for health administration?).

Function usage

Rate limitation

Is there a form of rate limitation? It seems reasonable given that we access an API. Nothing is shown from the API standpoint nor the package? (I've seen it mentioned in the "Getting started" vignette but it could be more explicit). If it is the case, that could be a nice thing to mention to the user so that they are aware of the time it would take to get their results back.

Can't use npi_summarize() with country code

I've found an issue when querying by country code which returns slightly differently formatted query and as such renders npi_summarize() unusable.

reprex:

library("npi")
ki = npi_search(country_code = "DE")
#> Requesting records 0-10...
npi_summarize(ki)
#> Error:
#> ! Tibble columns must have compatible sizes.
#> * Size 10: Existing data.
#> * Size 12: Column `primary_taxonomy`.
#> i Only values of size one are recycled.
#> Run `rlang::last_error()` to see where the error occurred.

Maybe npi_summarize() should take into account this edge case (even if I suppose there are not that many American health professionals outside the US).

Error when internet is off

Because this is a recurrent theme for API packages (and CRAN asks them to "fail gracefully" in these cases), I've checked what happens when I turn of the internet and I don't think it is that explicit for the user.

> npi_search(city = "San Francisco")
Requesting records 0-10...
Error in curl::curl_fetch_memory(url, handle = handle) : 
  Could not resolve host: npiregistry.cms.hhs.gov

Maybe there could be a way to display a better error message when internet is off or the website unreachable? Like check first hand if internet is off and errors explicitly if it is the case.

Special characters in npi_search() arguments

Character arguments of npi_search() allow for some special character as specified by the documentation. But when searching with other special characters, the query is still submitted.

reprex:

library("npi")
npi_search(first_name = "KOŒ*")
#> Requesting records 0-10...
#> Error in `npi_handle_response()`:
#> ! 
#> Field: first_name
#> Field contains special character(s) or wrong number of characters

^{Created on 2022-04-04 by the reprex package (v2.0.1)}

It works well but it fallback on the API when this is the case, I think it could be possible to catch these issues early when processing these arguments to save time and additional queries.

Wildcards special cases

Same as above, some special queries with wildcards used improperly (not trailing) could be caught earlier. I had strange results with some.

reprex:

library("npi")

# Trying to mess up with wildcards
ko = npi_search(last_name = "M*ll*")
#> Requesting records 0-10...
ko$basic  # The answer has nothing to do with the pattern
#> [[1]]
#> # A tibble: 1 x 11
#>   first_name last_name middle_name credential sole_proprietor gender
#>   <chr>      <chr>     <chr>       <chr>      <chr>           <chr> 
#> 1 JENNY      ENSTROM   E           PA         NO              F     
#> # ... with 5 more variables: enumeration_date <chr>, last_updated <chr>,
#> #   status <chr>, name <chr>, certification_date <chr>

# Wildcards in the middle do not work
ko = npi_search(last_name = "M*ll")
#> Requesting records 0-10...
ko$basic  # And I get the same answer instead of an error?!
#> [[1]]
#> # A tibble: 1 x 11
#>   first_name last_name middle_name credential sole_proprietor gender
#>   <chr>      <chr>     <chr>       <chr>      <chr>           <chr> 
#> 1 JENNY      ENSTROM   E           PA         NO              F     
#> # ... with 5 more variables: enumeration_date <chr>, last_updated <chr>,
#> #   status <chr>, name <chr>, certification_date <chr>

^{Created on 2022-04-04 by the reprex package (v2.0.1)}

User agent

It's great that the package provides a user-agent. It has been proven to be the best way possible to access an API to provide a proof of who is accessing the API.

In the README there is a mention of the user agent (https://github.com/frankfarach/npi/blob/d4e98a52ccc0a7f71328582333e5b4191f4796b9/README.Rmd#L114-L120). Which seems a great idea. However, this does not consider that the user may not be familiar with the concept of user-agent. Also, the displayed example doesn't show in which way it can be helpful to change its user-agent. Maybe you could go to a more concrete example. I also wonder if it is a good idea to let the user entirely remove the reference to the package.

To improve the user-agent, which only points to the URL of the repo, it would be good to indicate the version npi in use like it is done in taxize. If it is needed for the user to modify its user agent, then maybe providing a fixed part of the UA with the version of the package and a customizable second part would be even more complete.

Code

checkmate

checkmate is in Suggests but is used across several functions and no tests are done if package isn't installed. Maybe a good idea to make an Imports?

tidyr warning

At several occasion we face this message:

Warning message:
The `.sep` argument of `unnest()` is deprecated as of tidyr 1.0.0.
Use `names_sep = '_'` instead.

while there is a function to check for installed tidyr version tidyr_new_interface(). On my computer I had the error when using npi_flatten() even though I had tidyr v.1.2.0 installed

Tests

Functions delay_by() and remove_null() are defined but never used (and never tested). Was that planned for another update? Because it artificially reduces the coverage.

No tests have been done with empty results? Certain parameter combinations give back empty results and it could be easy to cover L177 of npi_search.R https://app.codecov.io/gh/frankfarach/npi/blob/master/R/npi_search.R (for ex. npi_search(last_name = "Grenie");))

npi_flatten() is not covered at all and could be covered by developing tests using the npis dataset.

Apr 04 '22 13:04 Rekyt

Hi @Rekyt,

Thank you for your review of the npi package. I really appreciate your kind words about the package and -- even more than that -- your specific recommendations for improving the user experience and the quality of the code.

I look forward to responding to your review in detail.

Apr 04 '22 15:04 frankfarach

Hi @maelle, @Rekyt, and @zabore,

As you will see, I have been creating a new issue in the package repo for each substantive item in your reviews. Each issue links back to the originating comment(s) in this review thread and all (and only) such issues are collected in an rOpenSci Review milestone. I plan to link to these issues and any pull requests I make to the package repository in my forthcoming responses.

Although I am doing this primarily to stay organized and preserve the intellectual lineage of future changes to the package, I thought I should explain the appearance of a bunch of issue references in this thread. I also hope it's helpful to you -- but if not, let me know and I'll see if I can remove the backlinks, or at least not link to this thread when @zabore's review arrives.

Apr 04 '22 16:04 frankfarach

Here is my review!

Briefly describe any working relationship you have (had) with the package authors.
[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

[x] A statement of need: clearly stating problems the software is designed to solve and its target audience in README
[x] Installation instructions: for the development version of package and any non-standard dependencies in README
[x] Vignette(s): demonstrating major functionality that runs successfully locally
[x] Function Documentation: for all exported functions
[x] Examples: (that run successfully locally) for all exported functions
[x] Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

[x] Installation: Installation succeeds as documented.
[x] Functionality: Any functional claims of the software been confirmed.
[x] Performance: Any performance claims of the software been confirmed.
[x] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
[x] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: 4

[x] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

Overall I found this package to be well designed and well documented. Nice work! I understand what it does, but I was left wondering why. It could be nice to include a vignette demonstrating how you envision this being used. How are people getting this information now, and how will this streamline things for them? Below are some more specific comments, but overall I found no major issues.

README

Overall the README is thorough and well organized and easy to read. Great work!
Under "Search the registry" section the README mentions 4 regular vector columns that are returned by npi_search() but when I execute the code I get enumeration_type in place of the mentioned provider_type

When I ran npi_summarize(nyc) I got the following warning message (Note I have tidyr 1.2.0 installed on my machine):

Warning message:
The `.sep` argument of `unnest()` is deprecated as of tidyr 1.0.0.
Use `names_sep = '_'` instead.

The example in the README is not reproducible, meaning I got different results when I ran the code than what was shown in the README output. Also, the ouput included in the vignette differ from those in the README. I assume this is due to a change in the NPI itself between when the README was created and when I ran the code? If possible, reproducible example in the README is ideal, though I understand that may be a challenge with a changing API.

vignette

In the "Search registry" section I would prefer to see the other search arguments listed by the argument name rather than a description (i.e. to see first_name rather than "provider's first name") as this
More details could be provided about what npi_summarize() does. Instead of saying it provides a more "human-readable" overview you could instead just state explicitly that it extract the first and last name, primary taxonomy, and primary address and phone number for each provider.
The README contains slightly more detailed information about what npi_flatten() does than the vignette. For users familiar with purrr::flatten() the action may be somewhat self explanatory. But for other users the term "flatten" likely has no meaning, so a more functional description could be useful.

Documentation/Examples

Great use of npi_ to start all exported functions, makes them easy to use and get through autofill.
It's not clear to me how or when npi_is_valid() would be used. Could provide some motivation or an actual use case in the README and/or vignette.
In the documentation for the npis dataset you mention in several places "[list of 0-n tibbles]" but there is no definition of n. I assume this indicates the number of tibbles in the list can vary, but I don't think it's necessary to include this. "List of tibbles" could be sufficient.
Is there documentation anywhere for what information is contained in the columns of the tibble returned by npi_search()? I couldn't find it.
The help page (or vignette?) for npi_search() could provide more details for how matching is done by default. For example, I get different results if I search for city = "new york" versus city = "new york city". I know you have not programmed the name matching yourself, but I think this is important information for the user to have so at least pointing to a link where this information can be found could be useful.
It could be useful to add a message from npi_search() that explicitly states something along the lines of "Returning 10 of xxx records for the specified search parameters" so that the user is aware they are getting only a subset of the records that match their search parameters.
Why is 1200 the maximum for the limit argument in npi_search()?

Tests

Look thorough to me.

Functionality/performance

No performance issues or areas for improvement that I can see
Your error messages are quite good, providing useful information about where errors in supplied arguments occurred and what is allowed. This is really user friendly!
It seems like when invalid arguments are passed to npi_search() it still tries to query the API first and then returns an error. Internal checks could be done in this function to avoid the query altogether in these cases and to more quickly return a useful error message.
I see that the taxonomies field in the result of npi_search() has a "primary" column, which I assume is used by npi_summarize() to identify the primary taxonomy. However, how does npi_summarize() determine the "primary" address and phone number when there are more than one listed, since there is no corresponding "primary" column in addresses? Is it always just the first record from taxonomies and address?
Does npi_flatten() always return a sensible result? It seems like it merges together a lot of different pieces of information from different list columns but does it always make sense to cross-tabulate every single one? I don't have an idea of where it might go wrong, but always wonder when merging tibbles with different numbers of rows.

Apr 06 '22 17:04 zabore

Thanks a ton @Rekyt & @zabore for your thorough reviews! @zabore, would you mind adding the time spent reviewing to your comment after "Estimated hours spent reviewing: "? Thank you!

@frankfarach I think the backlinks & milestone are great. We will still expect your response to be more or less self-explanatory so not just a list of links (I'm not assuming you got this wrong, just stating this just in case!).

Apr 07 '22 06:04 maelle

@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/505#issuecomment-1087579526 time 3

Apr 07 '22 06:04 maelle

Logged review for Rekyt (hours: 3)

Apr 07 '22 06:04 ropensci-review-bot

@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/505#issuecomment-1090531882= time NA

Apr 07 '22 12:04 maelle

Logged review for zabore (hours: NA)

Apr 07 '22 12:04 ropensci-review-bot

@maelle I added the time, sorry about the initial oversight!

Apr 07 '22 12:04 zabore

software-review software-review copied to clipboard

npi: Access the U.S. National Provider Identifier Registry API

Archive: TBD Version accepted: TBD Language: en

Scope

Technical checks

Publication options

Code of conduct

Checks for npi (v0.1.0)

1. Statistical Properties

1a. Network visualisation

2. goodpractice and other checks

3a. Continuous Integration Badges

3b. goodpractice results

R CMD check with rcmdcheck

Test coverage with covr

Cyclocomplexity with cyclocomp

Static code analyses with lintr

Editor-in-Chief Instructions:

Editor checks:

Editor comments

Editor comments

Package Review

Documentation

Functionality

Review Comments

Documentation

Function usage

User agent

Code

Tests

Documentation

Functionality

Review Comments

README

vignette

Documentation/Examples

Tests

Functionality/performance

software-review
software-review copied to clipboard

2. `goodpractice` and other checks

3b. `goodpractice` results

`R CMD check` with rcmdcheck