software-review qualtdict: Generating Variable Dictionaries and Labelled Data Exports of Qualtrics Surveys

Submitting Author Name: Yuhao Lin
Submitting Author Github Handle: @lyh970817
Repository: https://github.com/lyh970817/qualtdict
Version submitted: 0.0.0.9000
Submission type: Standard
Editor: @maurolepore
Reviewers: TBD

Archive: TBD
Version accepted: TBD
Language: en

Paste the full DESCRIPTION file inside a code block below:

Package: qualtdict
Title: Generating Variable Dictionaries and Labelled Data Exports of Qualtrics
    Surveys
Version: 0.0.0.9000
Authors@R:
    person("Yuhao", "Lin", , "[email protected]", role = c("aut", "cre"),
           comment = c(ORCID = "0000-0001-6357-5731"))
Description: Provides functions that generate variable dictionaries from
    'Qualtrics' <https://www.qualtrics.com/about/> surveys and labelled
    survey data based on the dictionary. This package is built upon the R
    package 'qualtRics' <https://github.com/ropensci/qualtRics/> which
    provides access to 'Qualtrics' survey data and metadata via the 'Qualtrics' API
    <https://api.qualtrics.com/>.
License: MIT + file LICENSE
URL: https://github.com/lyh970817/qualtdict
BugReports: https://github.com/lyh970817/qualtdict/issues
Imports:
    crul,
    dplyr,
    glue,
    haven,
    magrittr,
    openNLP,
    purrr,
    qualtRics,
    rlang,
    sjlabelled,
    slowraker,
    SnowballC,
    stringi,
    stringr,
    tibble,
    tidyr,
    xml2
Suggests:
    covr,
    knitr,
    rmarkdown,
    testthat (>= 3.0.0),
    vcr (>= 0.6.0)
VignetteBuilder: 
    knitr
Config/testthat/edition: 3
Config/testthat/start-first: dict_generate, dict_validate, get_survey_data
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.3

Scope

Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
- [ ] data retrieval
- [ ] data extraction
- [x] data munging
- [ ] data deposition
- [ ] data validation and testing
- [ ] workflow automation
- [ ] version control
- [ ] citation management and bibliometrics
- [ ] scientific software wrappers
- [ ] field and lab reproducibility tools
- [ ] database software bindings
- [ ] geospatial data
- [ ] text analysis
Explain how and why the package falls under these categories (briefly, 1-2 sentences):

Qualtrics is an online survey and data collection software platform. While the qualtRics R package implements data retrieval from the Qualtrics platform, this package 'qualtdict' further processes its output to generate variable dictionaries and labelled data designed to be used for data analyses directly.

Who is the target audience and what are scientific applications of this package?

The target audience is those who use the Qualtrics survey platform to collect data. This package generates variable dictionaries and labelled data designed to be used for data analyses directly.

Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?

No, but the similar qualtRics R package retrieves a broader range of data from Qualtrics than this package uses. The output formats of qualtRics are much less user-friendly. For example, it retrieves survey metadata in a nested-list, JSON-like format, whereas this package rearranges the essential parts of that metadata (retrieved using qualtRics) into a publishable variable dictionary in table format that can be inspected visually in, for example, Excel.

(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?

Yes.

If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
Explain reasons for any pkgcheck items which your package is unable to pass.

Technical checks

Confirm each of the following by checking the box.

[x] I have read the rOpenSci packaging guide.
[x] I have read the author guide and I expect to maintain this package for at least 2 years or to find a replacement.

This package:

[x] does not violate the Terms of Service of any service it interacts with.
[x] has a CRAN and OSI accepted license.
[x] contains a README with instructions for installing the development version.
[x] includes documentation with examples for all functions, created with roxygen2.
[x] contains a vignette with examples of its essential functions and uses.
[x] has a test suite.
[x] has continuous integration, including reporting of test coverage.

Publication options

[x] Do you intend for this package to go on CRAN?
[ ] Do you intend for this package to go on Bioconductor?
[ ] Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:

MEE Options

[ ] The package is novel and will be of interest to the broad readership of the journal.
[ ] The manuscript describing the package is no longer than 3000 words.
[ ] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
(Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
(Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
(Please do not submit your package separately to Methods in Ecology and Evolution)

Code of conduct

[x] I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

Feb 02 '23 15:02 lyh970817

Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.

Feb 02 '23 15:02 ropensci-review-bot

:rocket:

Editor check started

:wave:

Feb 02 '23 15:02 ropensci-review-bot

Checks for qualtdict (v0.0.0.9000)

git hash: d31c0887

:heavy_check_mark: Package name is available
:heavy_check_mark: has a 'codemeta.json' file.
:heavy_check_mark: has a 'contributing' file.
:heavy_check_mark: uses 'roxygen2'.
:heavy_check_mark: 'DESCRIPTION' has a URL field.
:heavy_check_mark: 'DESCRIPTION' has a BugReports field.
:heavy_check_mark: Package has at least one HTML vignette
:heavy_check_mark: All functions have examples.
:heavy_check_mark: Package has continuous integration checks.
:heavy_check_mark: Package coverage is 86%.
:heavy_check_mark: R CMD check found no errors.
:heavy_check_mark: R CMD check found no warnings.

Package License: MIT + file LICENSE

1. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

type	package	ncalls
internal	base	179
internal	qualtdict	118
internal	utils	5
internal	stats	1
imports	magrittr	70
imports	rlang	8
imports	glue	7
imports	qualtRics	3
imports	tibble	3
imports	openNLP	2
imports	sjlabelled	2
imports	xml2	2
imports	stringi	1
imports	tidyr	1
imports	crul	NA
imports	dplyr	NA
imports	haven	NA
imports	purrr	NA
imports	slowraker	NA
imports	SnowballC	NA
imports	stringr	NA
suggests	covr	NA
suggests	knitr	NA
suggests	rmarkdown	NA
suggests	testthat	NA
suggests	vcr	NA
linking_to	NA	NA

Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.

base

list (66), length (9), names (7), c (6), unique (6), unlist (6), args (4), ifelse (4), is.null (4), max (4), min (4), paste0 (4), all (3), is.na (3), rownames (3), as.matrix (2), colnames (2), factor (2), for (2), grep (2), is.character (2), levels (2), seq_along (2), split (2), structure (2), table (2), vapply (2), which (2), any (1), as.logical (1), character (1), class (1), data.frame (1), do.call (1), if (1), is.function (1), is.logical (1), labels (1), lapply (1), mode (1), numeric (1), q (1), readRDS (1), return (1), sum (1), suppressWarnings (1), tempdir (1), vector (1)

qualtdict

item_or_level_qid (10), rep_level_qid (10), suf_level_qid (9), null_na (7), not_applicable_qid (6), questiontext_qid (6), suf_item_rep_level_qid (6), suf_item_suf_level_qid (6), collapse (5), file_upload_qid (5), rep_level (3), retry (3), calc_keyword_scores (2), check_item (2), check_json (2), check_names (2), easyname_gen (2), label_to_sfx (2), paste_narm (2), qid_recode (2), recode_json (2), rep_item (2), sbs_qid (2), suf_level_suf_item_qid (2), suf_text_qid (2), timing_qid (2), add_text (1), add_text_mc (1), checkarg_isfunction (1), checkarg_isname (1), checkarg_isqualtdict (1), convert_html (1), dict_generate (1), dict_validate (1), get_survey_data (1), is_onetoone (1), order_name (1), suf_nmlabel_qid (1), text (1), which_not_onetoone (1)

magrittr

%>% (70)

rlang

abort (7), hash (1)

glue

glue (7)

utils

txtProgressBar (4), getFromNamespace (1)

qualtRics

fetch_description (1), fetch_survey (1), metadata (1)

tibble

tibble (2), enframe (1)

openNLP

Maxent_POS_Tag_Annotator (1), Maxent_Word_Token_Annotator (1)

sjlabelled

set_label (1), set_labels (1)

xml2

read_html (1), xml_text (1)

stats

setNames (1)

stringi

stri_count_words (1)

tidyr

unite (1)

NOTE: Some imported packages appear to have no associated function calls; please ensure with author that these 'Imports' are listed appropriately.

2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

code in R (100% in 10 files) and
1 authors
1 vignette
no internal data file
17 imported packages
3 exported functions (median 25 lines of code)
110 non-exported functions in R (median 10 lines of code)

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages. The following terminology is used:

loc = "Lines of Code"
fn = "function"
exp/not_exp = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the checks_to_markdown() function

The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

measure	value	percentile	noteworthy
files_R	10	59.0
files_vignettes	1	68.4
files_tests	7	86.4
loc_R	1152	71.7
loc_vignettes	118	30.8
loc_tests	1014	87.2
num_vignettes	1	64.8
n_fns_r	113	79.3
n_fns_r_exported	3	12.9
n_fns_r_not_exported	110	85.5
n_fns_per_file_r	6	75.4
num_params_per_fn	5	69.6
loc_per_fn_r	11	32.3
loc_per_fn_r_exp	25	55.9
loc_per_fn_r_not_exp	10	31.3
rel_whitespace_R	17	70.0
rel_whitespace_vignettes	25	21.4
rel_whitespace_tests	1	14.7
doclines_per_fn_exp	43	54.1
doclines_per_fn_not_exp	0	0.0	TRUE
fn_call_network_size	57	69.0

2a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package

3. `goodpractice` and other checks

Details of goodpractice checks (click to open)

3a. Continuous Integration Badges

GitHub Workflow Results

id	name	conclusion	sha	run_number	date
4076045888	R-CMD-check	success	d31c08	11	2023-02-02
4076045893	test-coverage	success	d31c08	11	2023-02-02

3b. `goodpractice` results

`R CMD check` with rcmdcheck

R CMD check generated the following check_fail:

no_import_package_as_a_whole

Test coverage with covr

Package coverage: 85.98

Cyclocomplexity with cyclocomp

No functions have cyclocomplexity >= 15

Static code analyses with lintr

lintr found the following 1 potential issues:

message	number of times
Avoid library() and require() calls in packages	1

Package Versions

package	version
pkgstats	0.1.3
pkgcheck	0.1.1.11

Editor-in-Chief Instructions:

This package is in top shape and may be passed on to a handling editor

Feb 02 '23 15:02 ropensci-review-bot

Dear @lyh970817, FYI I'm still searching for a handling editor. It shouldn't take much longer. Thanks for your patience.

Feb 07 '23 19:02 maurolepore

> Dear @lyh970817, FYI I'm still searching for a handling editor. It shouldn't take much longer. Thanks for your patience.

Thank you so much!

Feb 09 '23 07:02 lyh970817

@ropensci-review-bot assign @maurolepore as editor

Feb 11 '23 10:02 maurolepore

Assigned! @maurolepore is now the editor

Feb 11 '23 11:02 ropensci-review-bot

Dear @lyh970817 I'm delighted to announce that I'll be the handling editor of this submission.

Semantic tags for my comments

To help you track my comments I tagged them with "ml" and numbered sequentially: ml01, ml02, and so on. Comments following bullets are for you to consider -- you may or may not respond to them. Comments following check-boxes are requests for some action -- please respond.

Reviewers

[x] ml01. Can you please suggest three reviewers? Following our guidelines I'll use one at most, but I would like your view of the types of expertise needed to review qualtdict.

Checks

Here I list a few things that caught my attention. They are not blockers but the sooner we address them the better.

Package Dependencies

ml02. Some imported packages appear to have no associated function calls; please ensure with author that these 'Imports' are listed appropriately.

goodpractice and other checks

ml03. R CMD check generated the following check_fail: no_import_package_as_a_whole
ml04. Avoid library() and require() calls in packages: 1 time

Feb 11 '23 11:02 maurolepore

Thank you so much for taking time to review this. These are my responses.

ml01. Unfortunately I'm not sure I could name anyone specific. Expertise-wise, I think someone with a psychology/social science background would be helpful, as qualtdict is centred around creating a variable dictionary that gives analysts an intuitive overview of survey data. The usefulness of such a dictionary is probably best judged by someone who analyses such data on a daily basis (in contrast to a data engineer who implements APIs for such data).

ml02. R CMD Check seems to fail without importing some of the packages that I don't actually use. For instance, without importing haven:

Error in `set_labels_helper(x = .dat, labels = labels, force.labels = force.labels, 
    force.values = force.values, drop.na = drop.na, var.name = NULL)`: Package 'haven' required for this function. Please install it.

ml03. I use dplyr, purrr and stringr extensively so I import them as a whole. Should I still import functions from them (which will be many) individually?

ml04. I think it comes from this line in the tests:

library(vcr) # *Required* as vcr is set up on loading

which is mandatory for vcr to work.

Feb 11 '23 13:02 lyh970817

ml02. Following your example with the haven package, I saw you need to import haven::read_xpt because the sjlabelled package needs it. That surprises me. Usually each package must import any external function it needs, and not ask users to do it. Do you know why that's the case? Also I see haven is listed in .pre-commit.config.yaml -- which I'm not familiar with. So likely there is a good explanation and I just happen to have never encountered a case like this. It would be good to articulate an explanation because reviewers might be surprised too.
ml03. Yeah, AFAIK best practice is to either namespace each function each time you call it or import each function individually. For example, each time use something like dplyr::filter() or import it once with usethis::use_import_from("dplyr", "filter") then use it each time just like filter().
ml04. I see. Thanks!
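
To make the two options in ml03 concrete, here is a minimal sketch. It uses stats::setNames() only so that it runs without extra packages, and `label_values()` is a hypothetical helper, not a qualtdict function. The same pattern would apply to dplyr, purrr and stringr calls.

```r
# Option 1: qualify every call with its namespace.
# Nothing needs to be imported in NAMESPACE for this to work.
label_values <- function(values, labels) {
  stats::setNames(values, labels)
}

# Option 2: import the function once, for example with a roxygen2 tag such as
#   #' @importFrom stats setNames
# and then call it unqualified inside the package:
#   label_values <- function(values, labels) setNames(values, labels)

labelled <- label_values(1:3, c("low", "mid", "high"))
```

Option 1 keeps the origin of each function visible at the call site; option 2 keeps the code shorter at the cost of a longer NAMESPACE.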

[ ] ml05. When tests run I see a lot of printed output. Please suppress it so that reviewers can see a succinct test report. If the output is not generated from an R condition (e.g. messages, warnings, or errors) it may be hard to suppress. See capture.output() -- you may need to implement a way to capture the output and maybe implement a quietly argument you can set to TRUE during tests.
[ ] ml06. The test results I see show many warnings. Please address them if you don't expect them or suppress them if you do expect them. If you expect them it's best to make them go away so that you don't develop the habit of ignoring them and risk missing an important one that you don't expect.

[ FAIL 0 | WARN 591 | SKIP 0 | PASS 4 ]

[ ] ml07. Can you please make your project an RStudio project? Most R developers/contributors work in RStudio. Without an .Rproj file launching the project is hard, and I would like reviewers to enter your package as smoothly as possible. You may use usethis::use_rstudio(). And later it may help to lower the entry-barrier for contributors.
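
A minimal base-R sketch of what ml05 and ml06 ask for. The function `noisy_step()` and its `quiet` argument are hypothetical stand-ins for whatever prints during the qualtdict tests; they are assumptions, not part of the package.

```r
# Hypothetical function that prints progress with cat(). Output from cat() is
# not an R condition, so suppressMessages() would not silence it.
noisy_step <- function(x, quiet = FALSE) {
  if (!quiet) cat("Processing", length(x), "items\n")
  x * 2
}

# Option A: silence the output at the source with the `quiet` argument.
res_a <- noisy_step(1:3, quiet = TRUE)

# Option B: leave the function alone and capture what it prints in the test.
printed <- utils::capture.output(res_b <- noisy_step(1:3))

# Warnings that are expected can be silenced explicitly instead of being left
# in the test report.
res_c <- suppressWarnings(as.numeric(c("1", "two")))
```

In a testthat suite, expect_warning() is another option when a warning is part of the intended behaviour and should be asserted rather than hidden.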

Feb 12 '23 21:02 maurolepore

ml02. I believe this is because in sjlabelled, haven is a package in the Suggests field. The function it calls from haven is not actually haven::read_xpt, but I needed to import an arbitrary function from haven for the set_labels function to see and load it.

Please see the DESCRIPTION file for sjlabelled: https://github.com/strengejacke/sjlabelled/blob/master/DESCRIPTION.

Package: sjlabelled
Type: Package
Encoding: UTF-8
Title: Labelled Data Utility Functions
Version: 1.2.0.3
Authors@R: c(
    person("Daniel", "Lüdecke", role = c("aut", "cre"), email = "[email protected]", comment = c(ORCID = "0000-0002-8895-3206")),
    person("David", "Ranzolin", role = "ctb", email = "[email protected]"),
    person("Jonathan", "De Troye", role = "ctb", email = "[email protected]")
    )
Maintainer: Daniel Lüdecke <[email protected]>
Description: Collection of functions dealing with labelled data, like reading and 
    writing data between R and other statistical software packages like 'SPSS',
    'SAS' or 'Stata', and working with labelled data. This includes easy ways 
    to get, set or change value and variable label attributes, to convert 
    labelled vectors into factors or numeric (and vice versa), or to deal with 
    multiple declared missing values.
License: GPL-3
Depends:
    R (>= 3.4)
Imports:
    insight,
    datawizard,
    stats,
    tools,
    utils
Suggests:
    dplyr,
    haven (>= 1.1.2),
    magrittr,
    sjmisc,
    sjPlot,
    knitr,
    rlang,
    rmarkdown,
    snakecase,
    testthat
URL: https://strengejacke.github.io/sjlabelled/
BugReports: https://github.com/strengejacke/sjlabelled/issues
RoxygenNote: 7.2.1
VignetteBuilder: knitr

And the specific lines where haven is loaded: https://github.com/strengejacke/sjlabelled/blob/548fa397bd013ec7e44b225dd971d19628fdc866/R/set_labels.R#L317.
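
For readers unfamiliar with the pattern, the following is a minimal sketch of the usual guard around a suggested package. `require_suggested()` is a hypothetical helper whose message mirrors the error quoted earlier; the actual sjlabelled code at the linked line may differ.

```r
# Hypothetical helper: fail at call time if a suggested package is missing.
require_suggested <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("Package '", pkg, "' required for this function. Please install it.",
         call. = FALSE)
  }
  invisible(TRUE)
}

# 'stats' ships with R, so this check passes silently.
ok <- require_suggested("stats")

# A package that is not installed triggers the error only when the function
# is called, not when the calling package is installed.
msg <- tryCatch(
  require_suggested("aPackageThatIsNotInstalled"),
  error = function(e) conditionMessage(e)
)
```

Because the check runs only when the function is called, a package that relies on such a function needs the suggested package to be installed on the user's machine, which listing it in Imports guarantees.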

What would be the best way to deal with this?

ml05-7. I was able to capture the outputs when drafting the package so I should be able to do that in the tests. The warnings are not intended and are due to package versions. I will resolve these and create an RStudio project and then update this comment. Thank you so much!

Feb 13 '23 09:02 lyh970817

ml02. Thanks for explaining. The best solution will likely vary for each of the "unused" packages.

In the case of haven, the file you showed me has a single call of the type haven::<some function>, so it might be worth looking at the source code of that function to see whether you can re-implement it and remove the dependency on haven.

https://github.com/strengejacke/sjlabelled/blob/548fa397bd013ec7e44b225dd971d19628fdc866/R/set_labels.R#L325

More generally, I think a great explanation of the trade-offs in dependencies is that of Jim Hester in his talk "It depends": https://www.youtube.com/watch?v=mum13N7CGUI . So as long as you understand those trade-offs you would be able to make an informed decision for each "unused" package and justify your decision if the reviewers ask.

Feb 13 '23 11:02 maurolepore

Dear @lyh970817, Just checking. Would you be available to address the comments ml05-ml07? We can also put this submission on hold if you need more time. Let me know.

Apr 30 '23 17:04 maurolepore

> Dear @lyh970817,
>
> Just checking. Would you be available to address the comments ml05-ml07? We can also put this submission on hold if you need more time. Let me know.

Yes, sorry - would just need a couple more days to address these. Thanks.

May 06 '23 06:05 lyh970817

@ropensci-review-bot put on hold

Jan 07 '24 19:01 maurolepore

Submission on hold!

Jan 07 '24 19:01 ropensci-review-bot

@maurolepore: Please review the holding status

Apr 06 '24 19:04 ropensci-review-bot

@lyh970817, how would you like to proceed?

Resume the submission.
Continue on hold.
Withdraw the submission.

The holding status will be revisited every 3 months, and after one year the issue will be closed. -- https://devdevguide.netlify.app/softwarereview_policies.html#policiesreviewprocess

Apr 08 '24 22:04 maurolepore

Dear @lyh970817

I hope all is well. I totally understand priorities change. At this moment I believe this policy applies:

If the author hasn’t requested a holding label, but is simply not responding, we should close the issue within one month after the last contact intent. This intent will include a comment tagging the author, but also an email using the email address listed in the DESCRIPTION of the package which is one of the rare cases where the editor will try to contact the author by email. -- https://devdevguide.netlify.app/softwarereview_policies

FYI my next step is to confirm with the chief editor and if they agree I'll close the issue and let you know by email.

Jun 19 '24 14:06 maurolepore

Dear @lyh970817 I confirmed with the chief editor and shared my next steps with the entire editorial board. I'll go ahead and close this issue and let you know by email.

Once again, I understand priorities change. Thanks a lot for contributing to rOpenSci. We look forward to more contributions whenever it's a good time.

Jun 20 '24 15:06 maurolepore
