software-review
qualtdict: Generating Variable Dictionaries and Labelled Data Exports of Qualtrics Surveys
Submitting Author Name: Yuhao Lin
Submitting Author Github Handle: @lyh970817
Repository: https://github.com/lyh970817/qualtdict
Version submitted: 0.0.0.9000
Submission type: Standard
Editor: @maurolepore
Reviewers: TBD
Archive: TBD
Version accepted: TBD
Language: en
- Paste the full DESCRIPTION file inside a code block below:
Package: qualtdict
Title: Generating Variable Dictionaries and Labelled Data Exports of Qualtrics
Surveys
Version: 0.0.0.9000
Authors@R:
person("Yuhao", "Lin", , "[email protected]", role = c("aut", "cre"),
comment = c(ORCID = "0000-0001-6357-5731"))
Description: Provides functions that generate variable dictionaries from
'Qualtrics' <https://www.qualtrics.com/about/> surveys and labelled
survey data based on the dictionary. This package is built upon the R
package 'qualtRics' <https://github.com/ropensci/qualtRics/> which
provides access to 'Qualtrics' survey data and metadata via the 'Qualtrics' API
<https://api.qualtrics.com/>.
License: MIT + file LICENSE
URL: https://github.com/lyh970817/qualtdict
BugReports: https://github.com/lyh970817/qualtdict/issues
Imports:
crul,
dplyr,
glue,
haven,
magrittr,
openNLP,
purrr,
qualtRics,
rlang,
sjlabelled,
slowraker,
SnowballC,
stringi,
stringr,
tibble,
tidyr,
xml2
Suggests:
covr,
knitr,
rmarkdown,
testthat (>= 3.0.0),
vcr (>= 0.6.0)
VignetteBuilder:
knitr
Config/testthat/edition: 3
Config/testthat/start-first: dict_generate, dict_validate, get_survey_data
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.3
Scope
- Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
- [ ] data retrieval
- [ ] data extraction
- [x] data munging
- [ ] data deposition
- [ ] data validation and testing
- [ ] workflow automation
- [ ] version control
- [ ] citation management and bibliometrics
- [ ] scientific software wrappers
- [ ] field and lab reproducibility tools
- [ ] database software bindings
- [ ] geospatial data
- [ ] text analysis
- Explain how and why the package falls under these categories (briefly, 1-2 sentences):
Qualtrics is an online survey and data collection software platform. While the qualtRics R package implements data retrieval from the Qualtrics platform, this package 'qualtdict' further processes its output to generate variable dictionaries and labelled data designed to be used for data analyses directly.
- Who is the target audience and what are scientific applications of this package?
The target audience is those who use the Qualtrics survey platform to collect data. This package generates variable dictionaries and labelled data designed to be used for data analyses directly.
- Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?
No, but the similar qualtRics R package retrieves a broader range of data from Qualtrics than this package utilises. The output formats from qualtRics are much less user-friendly. For example, it retrieves survey metadata in a nested-list, JSON-like format, while this package rearranges essential parts of this metadata (retrieved using qualtRics) into a publishable variable dictionary in a table format that can be visually inspected in, for example, Excel.
- (If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?
Yes.
- If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
- Explain reasons for any pkgcheck items which your package is unable to pass.
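The contrast drawn in the Scope answers, between nested-list metadata and a tabular variable dictionary, can be illustrated with a minimal base R sketch. The `metadata` object and the `question_to_rows()` helper below are hypothetical and greatly simplified; they are not the structure returned by qualtRics nor the approach qualtdict takes internally.

```r
# Hypothetical, simplified survey metadata; real 'Qualtrics' metadata is
# richer and structured differently.
metadata <- list(
  QID1 = list(
    text = "How satisfied are you?",
    choices = list(`1` = "Not at all", `2` = "Somewhat", `3` = "Very")
  ),
  QID2 = list(
    text = "Would you recommend us?",
    choices = list(`1` = "Yes", `2` = "No")
  )
)

# Turn one question into one row per answer level.
question_to_rows <- function(qid, question) {
  data.frame(
    qid = qid,
    question = question$text,
    value = as.integer(names(question$choices)),
    label = unlist(question$choices, use.names = FALSE),
    stringsAsFactors = FALSE
  )
}

# Bind the per-question tables into a single flat dictionary.
dict <- do.call(
  rbind,
  Map(question_to_rows, names(metadata), metadata)
)
rownames(dict) <- NULL
print(dict)
```

A flat table like `dict` can be written to a spreadsheet and inspected visually, which is the kind of output the answers above describe.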
Technical checks
Confirm each of the following by checking the box.
- [x] I have read the rOpenSci packaging guide.
- [x] I have read the author guide and I expect to maintain this package for at least 2 years or to find a replacement.
This package:
- [x] does not violate the Terms of Service of any service it interacts with.
- [x] has a CRAN and OSI accepted license.
- [x] contains a README with instructions for installing the development version.
- [x] includes documentation with examples for all functions, created with roxygen2.
- [x] contains a vignette with examples of its essential functions and uses.
- [x] has a test suite.
- [x] has continuous integration, including reporting of test coverage.
Publication options
- [x] Do you intend for this package to go on CRAN?
- [ ] Do you intend for this package to go on Bioconductor?
- [ ] Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
MEE Options
- [ ] The package is novel and will be of interest to the broad readership of the journal.
- [ ] The manuscript describing the package is no longer than 3000 words.
- [ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
- (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
- (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
- (Please do not submit your package separately to Methods in Ecology and Evolution)
Code of conduct
- [x] I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.
Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.
:rocket:
Editor check started
:wave:
Checks for qualtdict (v0.0.0.9000)
git hash: d31c0887
- :heavy_check_mark: Package name is available
- :heavy_check_mark: has a 'codemeta.json' file.
- :heavy_check_mark: has a 'contributing' file.
- :heavy_check_mark: uses 'roxygen2'.
- :heavy_check_mark: 'DESCRIPTION' has a URL field.
- :heavy_check_mark: 'DESCRIPTION' has a BugReports field.
- :heavy_check_mark: Package has at least one HTML vignette
- :heavy_check_mark: All functions have examples.
- :heavy_check_mark: Package has continuous integration checks.
- :heavy_check_mark: Package coverage is 86%.
- :heavy_check_mark: R CMD check found no errors.
- :heavy_check_mark: R CMD check found no warnings.
Package License: MIT + file LICENSE
1. Package Dependencies
Details of Package Dependency Usage (click to open)
The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.
| type | package | ncalls |
|---|---|---|
| internal | base | 179 |
| internal | qualtdict | 118 |
| internal | utils | 5 |
| internal | stats | 1 |
| imports | magrittr | 70 |
| imports | rlang | 8 |
| imports | glue | 7 |
| imports | qualtRics | 3 |
| imports | tibble | 3 |
| imports | openNLP | 2 |
| imports | sjlabelled | 2 |
| imports | xml2 | 2 |
| imports | stringi | 1 |
| imports | tidyr | 1 |
| imports | crul | NA |
| imports | dplyr | NA |
| imports | haven | NA |
| imports | purrr | NA |
| imports | slowraker | NA |
| imports | SnowballC | NA |
| imports | stringr | NA |
| suggests | covr | NA |
| suggests | knitr | NA |
| suggests | rmarkdown | NA |
| suggests | testthat | NA |
| suggests | vcr | NA |
| linking_to | NA | NA |
Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.
base
list (66), length (9), names (7), c (6), unique (6), unlist (6), args (4), ifelse (4), is.null (4), max (4), min (4), paste0 (4), all (3), is.na (3), rownames (3), as.matrix (2), colnames (2), factor (2), for (2), grep (2), is.character (2), levels (2), seq_along (2), split (2), structure (2), table (2), vapply (2), which (2), any (1), as.logical (1), character (1), class (1), data.frame (1), do.call (1), if (1), is.function (1), is.logical (1), labels (1), lapply (1), mode (1), numeric (1), q (1), readRDS (1), return (1), sum (1), suppressWarnings (1), tempdir (1), vector (1)
qualtdict
item_or_level_qid (10), rep_level_qid (10), suf_level_qid (9), null_na (7), not_applicable_qid (6), questiontext_qid (6), suf_item_rep_level_qid (6), suf_item_suf_level_qid (6), collapse (5), file_upload_qid (5), rep_level (3), retry (3), calc_keyword_scores (2), check_item (2), check_json (2), check_names (2), easyname_gen (2), label_to_sfx (2), paste_narm (2), qid_recode (2), recode_json (2), rep_item (2), sbs_qid (2), suf_level_suf_item_qid (2), suf_text_qid (2), timing_qid (2), add_text (1), add_text_mc (1), checkarg_isfunction (1), checkarg_isname (1), checkarg_isqualtdict (1), convert_html (1), dict_generate (1), dict_validate (1), get_survey_data (1), is_onetoone (1), order_name (1), suf_nmlabel_qid (1), text (1), which_not_onetoone (1)
magrittr
%>% (70)
rlang
abort (7), hash (1)
glue
glue (7)
utils
txtProgressBar (4), getFromNamespace (1)
qualtRics
fetch_description (1), fetch_survey (1), metadata (1)
tibble
tibble (2), enframe (1)
openNLP
Maxent_POS_Tag_Annotator (1), Maxent_Word_Token_Annotator (1)
sjlabelled
set_label (1), set_labels (1)
xml2
read_html (1), xml_text (1)
stats
setNames (1)
stringi
stri_count_words (1)
tidyr
unite (1)
NOTE: Some imported packages appear to have no associated function calls; please ensure with author that these 'Imports' are listed appropriately.
2. Statistical Properties
This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.
Details of statistical properties (click to open)
The package has:
- code in R (100% in 10 files) and
- 1 authors
- 1 vignette
- no internal data file
- 17 imported packages
- 3 exported functions (median 25 lines of code)
- 110 non-exported functions in R (median 10 lines of code)
Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages The following terminology is used:
- loc = "Lines of Code"
- fn = "function"
- exp/not_exp = exported / not exported
All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the checks_to_markdown() function
The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.
| measure | value | percentile | noteworthy |
|---|---|---|---|
| files_R | 10 | 59.0 | |
| files_vignettes | 1 | 68.4 | |
| files_tests | 7 | 86.4 | |
| loc_R | 1152 | 71.7 | |
| loc_vignettes | 118 | 30.8 | |
| loc_tests | 1014 | 87.2 | |
| num_vignettes | 1 | 64.8 | |
| n_fns_r | 113 | 79.3 | |
| n_fns_r_exported | 3 | 12.9 | |
| n_fns_r_not_exported | 110 | 85.5 | |
| n_fns_per_file_r | 6 | 75.4 | |
| num_params_per_fn | 5 | 69.6 | |
| loc_per_fn_r | 11 | 32.3 | |
| loc_per_fn_r_exp | 25 | 55.9 | |
| loc_per_fn_r_not_exp | 10 | 31.3 | |
| rel_whitespace_R | 17 | 70.0 | |
| rel_whitespace_vignettes | 25 | 21.4 | |
| rel_whitespace_tests | 1 | 14.7 | |
| doclines_per_fn_exp | 43 | 54.1 | |
| doclines_per_fn_not_exp | 0 | 0.0 | TRUE |
| fn_call_network_size | 57 | 69.0 | |
2a. Network visualisation
Click to see the interactive network visualisation of calls between objects in package
3. goodpractice and other checks
Details of goodpractice checks (click to open)
3a. Continuous Integration Badges
GitHub Workflow Results
| id | name | conclusion | sha | run_number | date |
|---|---|---|---|---|---|
| 4076045888 | R-CMD-check | success | d31c08 | 11 | 2023-02-02 |
| 4076045893 | test-coverage | success | d31c08 | 11 | 2023-02-02 |
3b. goodpractice results
R CMD check with rcmdcheck
R CMD check generated the following check_fail:
- no_import_package_as_a_whole
Test coverage with covr
Package coverage: 85.98
Cyclocomplexity with cyclocomp
No functions have cyclocomplexity >= 15
Static code analyses with lintr
lintr found the following 1 potential issues:
| message | number of times |
|---|---|
| Avoid library() and require() calls in packages | 1 |
Package Versions
| package | version |
|---|---|
| pkgstats | 0.1.3 |
| pkgcheck | 0.1.1.11 |
Editor-in-Chief Instructions:
This package is in top shape and may be passed on to a handling editor
Dear @lyh970817, FYI I'm still searching for a handling editor. It shouldn't take much longer. Thanks for your patience.
Thank you so much!
@ropensci-review-bot assign @maurolepore as editor
Assigned! @maurolepore is now the editor
Dear @lyh970817 I'm delighted to announce that I'll be the handling editor of this submission.
Semantic tags for my comments
To help you track my comments I tagged them with "ml" and numbered sequentially: ml01, ml02, and so on. Comments following bullets are for you to consider -- you may or may not respond to them. Comments following check-boxes are requests for some action -- please respond.
Reviewers
- [x] ml01. Can you please suggest three reviewers? Following our guidelines I'll use one at most, but I would like your view of the types of expertise needed to review qualtdict.
Checks
Here I list a few things that caught my attention. They are not blockers but the sooner we address them the better.
Package Dependencies
- ml02. Some imported packages appear to have no associated function calls; please ensure with author that these 'Imports' are listed appropriately.
goodpractice and other checks
- ml03. R CMD check generated the following check_fail: no_import_package_as_a_whole
- ml04. Avoid library() and require() calls in packages: 1 time
Thank you so much for taking time to review this. These are my responses.
ml01. Unfortunately I'm not sure I could name any specific people. Expertise-wise, I thought someone with a psychology or social science background might be helpful, as qualtdict is centred around creating a variable dictionary that gives analysts an intuitive overview of survey data. The usefulness of such a dictionary is probably best judged by someone who analyses such data on a daily basis (in contrast to a data engineer who implements APIs for such data).
ml02. R CMD Check seems to fail without importing some of the packages that I don't actually use. For instance, without importing haven:
Error in `set_labels_helper(x = .dat, labels = labels, force.labels = force.labels,
force.values = force.values, drop.na = drop.na, var.name = NULL)`: Package 'haven' required for this function. Please install it.
ml03. I use dplyr, purrr and stringr extensively so I import them as a whole. Should I still import functions from them (which will be many) individually?
ml04. I think it comes from this line in the tests:
library(vcr) # *Required* as vcr is set up on loading
which is mandatory for vcr to work.
- ml02. Following your example with the haven package I saw you need to import haven::read_xpt because the sjlabelled package needs it. That surprises me. Usually each package must import any external function it needs, and not ask users to do it. Do you know why that's the case? Also I see haven is listed in .pre-commit.config.yaml, which I'm not familiar with. So likely there is a good explanation and I just happen to never have encountered a case like this. It would be good to articulate an explanation because reviewers might be surprised too.
- ml03. Yeah, AFAIK best practice is to either namespace each function each time you call it or import each function individually. For example, each time use something like dplyr::filter(), or import it once with usethis::use_import_from("dplyr", "filter") and then use it each time just like filter().
- ml04. I see. Thanks!
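The two options mentioned in ml03 can be sketched as follows. To keep the example runnable without extra packages, it uses stats::setNames() as a stand-in for dplyr::filter(); the function names `name_values()` and `name_values_bare()` are hypothetical.

```r
# Option 1: qualify every call with its namespace. Nothing needs to be
# attached with library(), and the origin of each function is explicit.
name_values <- function(values, labels) {
  stats::setNames(values, labels)
}

# Option 2: import the function once via a roxygen2 tag, then call it bare.
# In a package, roxygen2 turns the tag into `importFrom(stats, setNames)`
# in the NAMESPACE file.
#' @importFrom stats setNames
name_values_bare <- function(values, labels) {
  setNames(values, labels)
}

print(name_values(1:2, c("a", "b")))
```

Either way the dependency is declared per function rather than per package, which is what the no_import_package_as_a_whole check asks for.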
- [ ] ml05. When tests run I see a lot of printed output. Please suppress it so that reviewers can see a succinct test report. If the output is not generated from an R condition (e.g. messages, warnings, or errors) it may be hard to suppress. See capture.output(). You may need to implement a way to capture the output and maybe implement a quietly argument you can set to TRUE during tests.
- [ ] ml06. The test results I see show many warnings. Please address them if you don't expect them or suppress them if you do expect them. If you expect them it's best to make them go away so that you don't develop the habit of ignoring them and risk missing an important one that you don't expect.
[ FAIL 0 | WARN 591 | SKIP 0 | PASS 4 ]
- [ ] ml07. Can you please make your project an RStudio project? Most R developers/contributors work in RStudio. Without an .Rproj file launching the project is hard, and I would like reviewers to enter your package as smoothly as possible. You may use usethis::use_rstudio(). And later it may help to lower the entry-barrier for contributors.
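As a rough illustration of ml05 and ml06, the base R sketch below shows a quietly argument, capture.output() for functions that offer no such switch, and selective muffling of one expected warning. The helpers `process_items()` and `noisy()` are hypothetical and are not part of qualtdict.

```r
# Hypothetical helper that reports progress while it works.
process_items <- function(items, quietly = FALSE) {
  for (item in items) {
    if (!quietly) {
      cat("Processing", item, "\n")
    }
  }
  invisible(length(items))
}

# Option 1: silence the output at the source during tests.
n <- process_items(c("a", "b"), quietly = TRUE)

# Option 2: capture printed output when the function offers no switch.
printed <- utils::capture.output(n2 <- process_items(c("a", "b")))

# Muffle only the warning that is expected, so that any other warning
# still surfaces in the test report.
noisy <- function() {
  warning("known deprecation")
  "result"
}
value <- withCallingHandlers(
  noisy(),
  warning = function(w) {
    if (grepl("known deprecation", conditionMessage(w))) {
      invokeRestart("muffleWarning")
    }
  }
)
```

Matching on the warning message keeps the suppression narrow; a blanket suppressWarnings() would also hide warnings that are not expected.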
ml02. I believe this is because in sjlabelled, haven is a package in the Suggests field. The function it calls from haven is not actually haven::read_xpt but I needed to import an arbitrary function from haven for the set_labels function to see and load it.
Please see the DESCRIPTION file for sjlabelled: https://github.com/strengejacke/sjlabelled/blob/master/DESCRIPTION.
Package: sjlabelled
Type: Package
Encoding: UTF-8
Title: Labelled Data Utility Functions
Version: 1.2.0.3
Authors@R: c(
person("Daniel", "Lüdecke", role = c("aut", "cre"), email = "[email protected]", comment = c(ORCID = "0000-0002-8895-3206")),
person("David", "Ranzolin", role = "ctb", email = "[email protected]"),
person("Jonathan", "De Troye", role = "ctb", email = "[email protected]")
)
Maintainer: Daniel Lüdecke <[email protected]>
Description: Collection of functions dealing with labelled data, like reading and
writing data between R and other statistical software packages like 'SPSS',
'SAS' or 'Stata', and working with labelled data. This includes easy ways
to get, set or change value and variable label attributes, to convert
labelled vectors into factors or numeric (and vice versa), or to deal with
multiple declared missing values.
License: GPL-3
Depends:
R (>= 3.4)
Imports:
insight,
datawizard,
stats,
tools,
utils
Suggests:
dplyr,
haven (>= 1.1.2),
magrittr,
sjmisc,
sjPlot,
knitr,
rlang,
rmarkdown,
snakecase,
testthat
URL: https://strengejacke.github.io/sjlabelled/
BugReports: https://github.com/strengejacke/sjlabelled/issues
RoxygenNote: 7.2.1
VignetteBuilder: knitr
And the specific lines where haven is loaded: https://github.com/strengejacke/sjlabelled/blob/548fa397bd013ec7e44b225dd971d19628fdc866/R/set_labels.R#L317.
What would be the best way to deal with this?
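For context, the error message quoted earlier is typical of a runtime check for an optional (Suggests) dependency. The sketch below shows that common pattern in base R; `require_optional()` is a hypothetical helper and is not copied from sjlabelled.

```r
# Sketch of a runtime check for an optional (Suggests) dependency.
require_optional <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(
      sprintf("Package '%s' required for this function. Please install it.", pkg),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# 'stats' ships with R, so this check passes.
require_optional("stats")

# A package that is not installed triggers the error instead.
result <- tryCatch(
  require_optional("aPackageThatIsNotInstalled"),
  error = function(e) conditionMessage(e)
)
print(result)
```

Under this pattern the check passes whenever the optional package is installed, so a downstream package that relies on the guarded function needs that package to be available wherever it is checked and run.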
ml05-7. I was able to capture the outputs when drafting the package so I should be able to do that in the tests. The warnings are not intended and are due to package versions. I will resolve these and create an RStudio project and then update this comment. Thank you so much!
ml02. Thanks for explaining. The best solution will likely vary for each of the "unused" packages.
In the case of haven, the file you showed me has a single call of the type haven::<some function> so it might be worth looking at the source code of that function to see if you can re-implement it and remove the dependency on haven.
https://github.com/strengejacke/sjlabelled/blob/548fa397bd013ec7e44b225dd971d19628fdc866/R/set_labels.R#L325
More generally, I think a great explanation of the trade-offs in dependencies is that of Jim Hester in his talk "It depends": https://www.youtube.com/watch?v=mum13N7CGUI . So as long as you understand those trade-offs you would be able to make an informed decision for each "unused" package and justify your decision if the reviewers ask.
Dear @lyh970817,
Just checking. Would you be available to address the comments ml05-ml07? We can also put this submission on hold if you need more time. Let me know.
Yes, sorry - would just need a couple more days to address these. Thanks.
@ropensci-review-bot put on hold
Submission on hold!
@maurolepore: Please review the holding status
@lyh970817, how would you like to proceed?
- Resume the submission.
- Continue on hold.
- Withdraw the submission.
The holding status will be revisited every 3 months, and after one year the issue will be closed. -- https://devdevguide.netlify.app/softwarereview_policies.html#policiesreviewprocess
Dear @lyh970817
I hope all is well. I totally understand priorities change. At this moment I believe this policy applies:
If the author hasn’t requested a holding label, but is simply not responding, we should close the issue within one month after the last contact intent. This intent will include a comment tagging the author, but also an email using the email address listed in the DESCRIPTION of the package which is one of the rare cases where the editor will try to contact the author by email. -- https://devdevguide.netlify.app/softwarereview_policies
FYI my next step is to confirm with the chief editor and if they agree I'll close the issue and let you know by email.
Dear @lyh970817 I confirmed with the chief editor and shared my next steps with the entire editorial board. I'll go ahead and close this issue and let you know by email.
Once again, I understand priorities change. Thanks a lot for contributing to rOpenSci. We look forward to more contributions whenever it's a good time.