qualtRics
"Matrix Table - Likert - Allow one answer" Is missing ordinal information where available in like situations.
I have been making a spreadsheet based on an almost exhaustive comparison between the qualtRics data, column_map, item preview, and each data column's attributes, in order to write a function that automatically standardizes the format of each item across all potential item types (I can provide this to whoever is interested).
My issue is that, specifically in the case of "Matrix Table - Likert - Allow one answer", I can't find any order information in the available structures, whereas in similar items it is at least derivable.
Multiple-Choice Item
When mutually inclusive, the item subdivides into columns in item choice order AND provides a choiceId.
When mutually exclusive, the item is collapsed into one column, and the order is embedded as an attribute.
Matrix Item
Like multiple-choice, when mutually inclusive the item subdivides into columns in item choice order AND provides a choiceId.
Like multiple-choice, when mutually exclusive each statement row of the matrix is collapsed into one column. However, unlike multiple-choice, no order information attribute is embedded in the data.
My recommendation, if possible, would be to embed the order into every data column pertaining to each statement (row) in the matrix item. Otherwise, you have to interpret the order (if there is any) after the fact.
I know I might be missing something by overlooking some attribute, but I couldn't find anything. If I am, please point it out for me so I can add to my function's decision-making.
Thank You
I think you might be missing this, but this does get complicated, so I might not be understanding what your question is.
Take a look at this very important survey about sourdough bread, and the matrix table + Likert + allow-one-answer question:
Are you aware of the sourdough bread features listed below?
Now let's get that survey via the qualtRics package:
library(qualtRics)
library(tidyverse)
library(rlang)
#>
#> Attaching package: 'rlang'
#> The following objects are masked from 'package:purrr':
#>
#> %@%, as_function, flatten, flatten_chr, flatten_dbl, flatten_int,
#> flatten_lgl, flatten_raw, invoke, list_along, modify, prepend,
#> splice
sourdough <- fetch_survey("SV_5BJRo2RGHajIlOB", add_column_map = TRUE)
#> | | | 0% | |======================================================================| 100%
#>
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#> .default = col_character(),
#> StartDate = col_datetime(format = ""),
#> EndDate = col_datetime(format = ""),
#> Progress = col_double(),
#> `Duration (in seconds)` = col_double(),
#> Finished = col_logical(),
#> RecordedDate = col_datetime(format = ""),
#> RecipientLastName = col_logical(),
#> RecipientFirstName = col_logical(),
#> RecipientEmail = col_logical(),
#> ExternalReference = col_logical(),
#> LocationLatitude = col_double(),
#> LocationLongitude = col_double(),
#> Q1007 = col_double(),
#> Q1_DO_1 = col_double(),
#> Q1_DO_2 = col_double(),
#> Q1_DO_3 = col_double(),
#> Q1_DO_4 = col_double(),
#> Q1_DO_5 = col_double(),
#> SolutionRevision = col_double(),
#> FL_6_DO_FL_7 = col_double()
#> # ... with 4 more columns
#> )
#> ℹ Use `spec()` for the full column specifications.
The order that each survey respondent saw these answers in is in these "display order" columns:
sourdough %>%
select(starts_with("Q1_DO"))
#> # A tibble: 122 x 5
#> Q1_DO_1 Q1_DO_2 Q1_DO_3 Q1_DO_4 Q1_DO_5
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 4 5 3 2 1
#> 2 2 1 5 4 3
#> 3 3 2 5 1 4
#> 4 3 2 1 5 4
#> 5 4 1 5 2 3
#> 6 1 3 4 2 5
#> 7 1 2 4 3 5
#> 8 5 3 1 4 2
#> 9 4 1 3 2 5
#> 10 4 1 2 3 5
#> # … with 112 more rows
You can see what these correspond to in the column_map (stored as an attribute) in these rows:
sourdough %@% "column_map" %>%
filter(str_detect(qname, "^Q1_"))
#> # A tibble: 10 x 7
#> qname description main sub ImportId timeZone choiceId
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Q1_1 Are you aware of t… Are you awa… Distracti… QID1_1 <NA> <NA>
#> 2 Q1_2 Are you aware of t… Are you awa… Texture QID1_2 <NA> <NA>
#> 3 Q1_3 Are you aware of t… Are you awa… Flavor QID1_3 <NA> <NA>
#> 4 Q1_4 Are you aware of t… Are you awa… Color QID1_4 <NA> <NA>
#> 5 Q1_5 Are you aware of t… Are you awa… Nutrition… QID1_5 <NA> <NA>
#> 6 Q1_DO… Are you aware of t… Are you awa… Display O… QID1_DO <NA> 1
#> 7 Q1_DO… Are you aware of t… Are you awa… Display O… QID1_DO <NA> 2
#> 8 Q1_DO… Are you aware of t… Are you awa… Display O… QID1_DO <NA> 3
#> 9 Q1_DO… Are you aware of t… Are you awa… Display O… QID1_DO <NA> 4
#> 10 Q1_DO… Are you aware of t… Are you awa… Display O… QID1_DO <NA> 5
Created on 2021-07-06 by the reprex package (v2.0.0)
I believe all the info is there, and to clarify, we aren't doing much creating of these columns; instead we are formatting and making available in R what the API makes available.
Thank you for replying, and for your help. I will use your example since you like bread so much. ;)
Hypothetically, if NO ONE ever selected one of the options in the matrix, like "Not aware of it", how could you tell using JUST sourdough and sourdough's column_map that "Not aware of it" is even an option, much less the second option?
From what I can tell, you can't -- not without using fetch_description. This issue was detected and addressed for early examples of mutually exclusive multiple-choice questions by making the relevant data columns an ordinal vector with levels for all possible values.
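As a minimal sketch of what that earlier fix amounts to (the column and level names here are hypothetical stand-ins), the data column becomes an ordered factor whose levels cover the full option set, including options no respondent ever selected:
survey_data$Q2 <- factor(
  survey_data$Q2,
  levels = c("Aware of it", "Not aware of it"),  # complete option set, in survey order
  ordered = TRUE
)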
I have found that the following mutually exclusive item types don't have information about the complete set and order of selectable options:
- [Matrix Table, Likert, Allow one answer]
- [Matrix Table, Likert, Dropdown list] (Might be too much, but still absent)
- [Matrix Table, Likert, Drag and drop]
- [Side by Side]
- [Drill Down] (Might be too much, but still absent)
Ah OK, thanks for these details; I understand better what you are asking now. I just stepped through how we build the column mapping from the included metadata and that information is not available. It is not included with the survey results as far as I can tell. I'd recommend using fetch_description() for that richer information. (Or if someone is able to extract it from what is returned from this API endpoint, that would be most welcome!)
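For reference, a minimal sketch of that call, using the sourdough survey ID from above (the result is a nested list mirroring the survey definition, with one entry per question keyed by QID):
library(qualtRics)
desc <- fetch_description("SV_5BJRo2RGHajIlOB")
names(desc)            # top-level elements of the survey definition
names(desc$questions)  # question entries, keyed by QID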
Yeah, exactly. Right now I am working on isolating the exact bits necessary within fetch_description() so that they can be automatically embedded within a fetch_survey() wrapper function when relevant. It is obviously a hack.
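The shape of that wrapper is roughly this (a sketch only; apply_matrix_levels() is a hypothetical helper standing in for the extraction logic I'm still isolating):
fetch_survey_ordered <- function(survey_id, ...) {
  responses <- qualtRics::fetch_survey(survey_id, ...)
  desc <- qualtRics::fetch_description(survey_id)
  # hypothetical helper: mine desc$questions for the full level sets of
  # matrix-type questions and apply them to the matching response columns
  apply_matrix_levels(responses, desc)
}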
It is not included with the survey results as far as I can tell.
So typically the API returns data with ordinal fields when necessary, but isn't doing it for Matrix Tables? It would slow down fetch_survey() if it had to do a separate extra fetch_description() in order to add the missing information. If no one else is able to address this, perhaps I could help. It would take some time for me to get familiar with the package.
@ProfFancyPants perhaps we should talk. The past ~2 years I've been working on a project requiring me to label, match up, and combine 100+ similar-but-not-identical qualtrics surveys, and to do that I ended up building out a system that includes processing the sorts of things you're describing here. I'd love to compare notes with you.
You're absolutely correct that you have to mine the results of fetch_description() for this. On top of that, HOW you do that varies depending on the situation. In Qualtrics' internal representations of surveys there are notions of "questions", "choices", & "answers", which themselves are represented in odd ways.
(You may already know all this, but I'm just recording it here to further the discussion for the package).
So, using your matrix question example specifically: the "question" would be the whole thing, while "choices" would be each of "Nutritional value" through "Distraction". The "answers" would be "Aware" and "Not".
HOWEVER, it's entirely possible the choice IDs could be 1, 3, 5, 7, & 18, AND not displayed in that order: "Nutritional value" could be choice 7, "color" could be 18, "texture" could be 1, and so forth. The same can be true independently for the answers. Plus you can rename choice & answer IDs to other, non-numeric things.
Meanwhile, recoded values can again vary independently of everything else.
Also, this is different in different situations. In the case of a standard multiple choice question, each alternative response option is a "choice," and there is no notion of an "answer."
All of this info can be correctly matched to variables in a response download. But neither the column map nor the results of fetch_description() are sufficient on their own. You have to use them both together, and you have to apply different algorithms to each depending on what type of question each variable in your response download is linking back to.
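As a toy illustration of that per-type dispatch (the field names follow the fetch_description() output discussed later in this thread, and should be treated as assumptions rather than a spec):
extract_labels <- function(q) {
  # q is one element of desc$questions from fetch_description()
  switch(q$QuestionType,
    MC     = sapply(q$Choices, getElement, "Display"),  # response options
    Matrix = sapply(q$Answers, getElement, "Display"),  # scale labels
    NULL  # other question types need their own handling
  )
}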
@jmobrien
to label, match up, and combine 100+ similar-but-not-identical qualtrics surveys…
This is exactly what I have been doing. I have had some success, but I can see the many roadblocks in qualtrics that would make this feel like a Sisyphean task.
HOW you do that varies depending on the situation.
Exactly. So I have developed a few generalized R tools that recursively deconstruct extremely diverse nested list structures with various class objects embedded. Think reshape2::melt.list but on steroids. I also have developed in situ merge tools that maintain the order of these kinds of kitchen-sink nested list renderings. Using these two tools, I was able to get a much better sense of the similarities and differences regarding how each Qualtrics question type and its settings change the fetch_description() metadata. As you have no doubt noticed, there are inconsistencies. You don't always get what you expect, and there is one question type that actually breaks all standards and is reordered (throwing impossible "perfect in situ merge" warnings from my tool).
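To give a sense of the approach, here is a bare-bones illustration (not my actual tool) of the recursive deconstruction idea:
flatten_nested <- function(x, path = character()) {
  # depth-first walk: one output row per leaf, recording the path to it
  if (!is.list(x) || length(x) == 0) {
    return(data.frame(
      path  = paste(path, collapse = "/"),
      value = paste(format(x), collapse = "; "),
      stringsAsFactors = FALSE
    ))
  }
  nms <- names(x)
  if (is.null(nms)) nms <- as.character(seq_along(x))
  do.call(rbind, Map(function(el, nm) flatten_nested(el, c(path, nm)), x, nms))
}
# e.g. flatten_nested(qualtRics::fetch_description(survey_id)) yields a long
# path/value table that makes differences between question types easy to diff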
(You may already know all this, but I'm just recording it here to further the discussion for the package).
Oh yes, exactly what you are describing is very clear to me now after running my prototype survey through the wringer.
alternative response option is a "choice," and there is no notion of an "answer."
So I was able to find some success here (at least with my target correction), and it is obvious why this is such an issue: the specific nesting is inconsistent. But I found it to be consistently inconsistent in a way that I was able to utilize -- at least for this current version of the API dump.
All of this info can be correctly matched to variables in a response download. But neither the column map nor the results of fetch_description() are sufficient on their own.
Yeah, that is how I correct my issues.
Now, I haven't had an opportunity to investigate the raw API data, so I haven't seen exactly how/when/where you get the bits and pieces of the necessary information for the final product. I have noticed a number of bugs related to whether one uses the default delimiter (" - ") in their questions, and how it will cut off columns in the column map. However, overall I think the qualtRics package is a success, and you guys did a great job with the situation at hand.
What I did was data-detective cat herding. I don't think my processes could be implemented directly due to inefficiencies (although they are still quite fast), but I do believe I might be able to bring some intel to the situation.
We should probably talk, yeah. Email's in my bio.
Thanks @ProfFancyPants. Your approach sounds meaningfully similar to my own--tons of reshaping to generate more easily reference-able descriptions that can pull out what I need with the appropriate situational rubrics. "Cat herding" is a good way to describe my experience, too, especially as what I did largely came before Qualtrics had published the endpoint return schemas for their API docs, so I was having to work forensically. So, yes, let's speak. I'll reach out soon.
As far as the raw API content, you're not missing anything. fetch_description() (which was my work--again, one of the things I wrote for all this) does next to no processing, mostly turning the nested JSON list into an R list using jsonlite. I did reshape a few small things for user convenience, but they were already isolated objects in the JSON.
@juliasilge the discussion is drifting away from qualtRics, so we'll move it out of here. But I brought it up in part since I (and @ProfFancyPants too it seems, though I won't speak for him) now find myself with quite a lot of additional functionality coded up that I'm not sure what to do with. If you're interested, I'd be glad to discuss whether/how any of it works within your goals for the qualtRics package.
My feeling here is that if there are generalizable functions that can be written to post-process data for specific question types coming out of the Qualtrics API in a consistent way (given, say, the metadata and description we already have), that would be a good fit. If we are talking about functions that depend a lot on your own specific data (and you two folks happen to have similar data), then maybe it would not be as good a fit for this package.
In my case, I think a lot of what I needed to do has general-purpose utility. For example:
- matching specific items in the column map to [sub-parts of] question-level records provided by fetch_description(),
- building more user(me)-accessible dataframes of response options so I could see response information like response text, response ordering, recodings, etc. (This was used to catch/fix things like bad recodings & inconsistencies across surveys, as well as give me a source of metadata to attach to actual responses),
- unpacking the recursively nested structure of the Flow list element into a dataframe so you could make sense of the survey flow, get at survey logic and/or random displays, etc.,
- doing the same unpacking with the Blocks element (this was far simpler than Flow), and
- doing some combining across all the above resources to have something you can query for any given response variable.
Those are the core pieces, at least. IMO, I didn't feel like I could get much utility out of fetch_description() until I had that hammered out, either. I haven't yet been able to follow up w/ @ProfFancyPants, but from our convo above I got the sense he felt similarly and so might have built similar tools. And generally, that necessity is what gives me the sense these could be something good for the broader qualtRics userbase.
Two caveats on that, though:
- Almost all of this work was done before Qualtrics published the return schema for the fetch_description() endpoint. Thus, I had to work things out rather than build from a known structure.
- Having no schema also means functionality IS limited to our specific data, inasmuch as solutions were only built for question types we ourselves used. We did use many of the most common types like multiple choice, matrix, dropdown, text entry, etc., but not everything (loop & merge comes to mind as a big one). Also we probably didn't use all configurations of the types we did employ.
So, as with our recent changes to the URL creation scheme, I do wonder about the cost/benefit of building out & cleaning up what I have vs. starting some big things over with the actual return schema in hand. But, schema or no schema, just the big-picture work of figuring out how the super-complex list from fetch_description() could become something more user-accessible was a pretty massive challenge--one that apparently at least two people so far had to spend a meaningful chunk of their lives on. So I do think there's something major here that would be great to pass to qualtRics users, even if I'm not clear what exactly that is.
Sorry, I'm jumping into this late, but I had a similar question. I'm pulling in data from a matrix table using dat <- fetch_survey(<SURVEY_ID>), and every matrix table item reads in as a character instead of an ordered factor, which it should be (and which I see on a simple multiple-choice question).
Is there a way to read in matrix table questions with the appropriate factor levels? Or is there a suggestion for how to programmatically update this metadata after it is downloaded? Basically, anything that will keep me from having to do dat$Q1_1 <- factor(dat$Q1_1, c("Manually", "Writing", "Down", "Factor", "Levels")).
If not, I'd be glad to write something—it seems like everything comes down from fetch_description(); it's just not applying it to matrix table variables. Although, using sapply(desc$questions$QID1$Choices, getElement, "Display"), where desc is returned by fetch_description(<SURVEY_ID>), it appears as though those aren't the same as the values in the column itself—it doesn't clean the HTML...
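(For what it's worth, a crude tag-stripper gets the Display values close; this regex is a quick hack, not a real HTML parser:)
strip_html <- function(x) trimws(gsub("<[^>]+>", "", x))
strip_html(sapply(desc$questions$QID1$Choices, getElement, "Display"))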
In the short term, is there a way to pull down matrix table questions as factors instead of character vectors? Right now, the most painless solution I can think of is a manual download to an SPSS file and read-in using haven.
Thanks
@markhwhiteii What you need is indeed buried in fetch_description(). Some of us have made solutions for ourselves for getting out what we need from there, but no one's had the time & focus yet to incorporate things, or even decide on a design approach.
A big reason is the lack of consistent general solutions--for instance, since you're using a matrix table, the response labels will be under "Answers" rather than under "Choices". If you're working on something for yourself, I'd mine there.
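Using your own snippet as the starting point, the difference is just which element you mine (assuming the Answers entries carry the same Display field that the Choices entries do):
sapply(desc$questions$QID1$Choices, getElement, "Display")  # statement rows
sapply(desc$questions$QID1$Answers, getElement, "Display")  # response-scale labels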
(Or, if you want an existing solution and don't mind it being a bit rough, feel free to reach out to me via external channels.)
A solution here would represent significant time savings, so I think it's worth pursuing. I can think of two solutions, though each of them has its own downsides.
Solution 1: Grapple with the Metadata
As has been pointed out earlier in this thread, the way that Qualtrics structures metadata about Likert questions is confusing and annoying.
"MC" question metadata top-level
questionType
questionText
questionLabel
validation
questionName # Column name *
choices # Factor levels
* unfortunately, Qualtrics allows for duplicate question names, so this is more correctly "default column name, barring duplicate question names"
"Matrix", "Likert", "SingleAnswer" metadata top-level
questionType
questionText
questionLabel
validation
questionName # Column name stem
subQuestions # Scale items
choices # Factor levels for each of the subquestions
A sketchy fix
The most tenuous piece of this solution is finding the columns associated with the single-answer matrix table question. Frustratingly, the column names are not represented in the metadata as far as I can tell, so I'm using matches() to apply the levels to any column whose name contains "{question_name}_". Because the specific column names aren't available, testing for their presence in the data as a safeguard against incorrect application of the levels, as qualtRics currently does for "MC" questions, isn't possible.
library(purrr)  # map(), map_chr() used below
library(dplyr)  # mutate(), filter(), across(), matches() used below

survey_responses <- qualtRics::fetch_survey(survey_id, force_request = TRUE)
survey_md <- qualtRics::metadata(survey_id)
md <- tibble::enframe(qualtRics::metadata(survey_id, get = "questions")[[1]])
md_parsed <- md |>
dplyr::mutate(
qtype = map(value, "questionType"),
qname = map_chr(value, "questionName"),
qtype_type = map_chr(qtype, "type"),
qtype_selector = map_chr(qtype, "selector"),
# Pull out the subSelector feature for Matrix questions
qtype_subselector = map_chr(
qtype,
function(x) {
if(x[["type"]] == "Matrix") {
x[["subSelector"]]
} else {
NA_character_
}
}
),
type_supp = qtype_type %in% c("MC"),
selector_supp = qtype_selector %in% c("SAVR"),
supported = type_supp & selector_supp
)
md_single_answer_likerts <- md_parsed |>
dplyr::filter(
qtype_type %in% c("Matrix"),
qtype_selector %in% c("Likert"),
qtype_subselector %in% c("SingleAnswer")
) |>
dplyr::mutate(
colname_pattern = glue::glue("{qname}_"),
levels = value |>
map(
function(x) {
x$choices |> map_chr("choiceText")
}
)
)
for(i in seq_len(nrow(md_single_answer_likerts))) {
survey_responses <- survey_responses |>
mutate(
across(
c(matches(md_single_answer_likerts[["colname_pattern"]][i]), -matches("DO")),
function(x) {
readr::parse_factor(
x,
levels = md_single_answer_likerts[["levels"]][[i]],
ordered = TRUE
)
}
)
)
}
Solution 2: A Safe haven?
To the best of my knowledge, the API still offers response exports in SPSS' .sav format. That format embeds metadata for ordinal columns that plays nicely with sjlabelled. In my experience, it adds some time to the export on the back end, as I think prepping the data for export is more burdensome. Otherwise, I've found it to be a fairly clean solution.
The fetch_survey() function could accept a format parameter that modifies the POST query and the import behaviour. It feels a bit uncomfy because the different treatment of factors between the default csv and sav formats wouldn't be intuitive to the user.
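As a sketch of what the import side could look like (the file name is hypothetical; haven::as_factor() can build the factor from the full labelled level set):
library(haven)
responses <- read_sav("survey_export.sav")
responses$Q1_1 <- as_factor(responses$Q1_1, levels = "labels", ordered = TRUE)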
Frustratingly, the column names are not represented in the metadata as far as I can tell
My engagement with this is a bit out of date, but I do think it's possible to match on the internal qualtrics ID (sometimes QID, sometimes ImportId, depending on location), which is unique by question. I believe that's stored in the column maps by default. That was my starting place for matching w/r/t other properties.
Question about the alternative data format idea--while something like SPSS format would include those, I'm suspicious it might just label the actual responses rather than set up labels for all responses. I'm not sure, but have you checked on that?
If just the responses, the issue would be that, say, if you had a 1-5 labelled scale where nobody picked 1, you wouldn't be able to get one response option out of the response data. That would limit its usefulness for related documentation tasks, e.g. codebooking.
You'd have to fill in the gaps with the survey description metadata, which puts you back where you started.
Question about the alternative data format idea--while something like SPSS format would include those, I'm suspicious it might just label the actual responses rather than set up labels for all responses. I'm not sure, but have you checked on that?
The spss format preserves unused levels.
Interested in the column map idea though. I'll check if there's something in there that can make the subQuestion to colname link a bit more robust.
Qualtrics' puzzling decision-making shows up here again.
Consider a hypothetical example based on the sourdough survey:
- Where the Matrix question's name is featureAwareness
- And the first item's "Question Export Tag" (visible under the recode values menu) is nutritionalValue
In this example, the column containing the response would be named nutritionalValue, while the qname field associated with it, returned by qualtRics::column_map(), would be featureAwareness_nutritionalValue. I believe Qualtrics changed the default column name generation behaviour of its export roughly 2 years ago, but did not propagate that change to the column map generation.
I think this is because the column maps don't come from qualtrics proper--they are built from parsing and restructuring JSON metadata that comes with the response data download (in a manual download, you can see it as the first 2 rows beneath the header row).
The default behavior is to stick main name + choice name together. I think that can match what you get in a lot of default cases from Qualtrics. But I think what you actually get for a question name depends on question-level options--which, again, you won't have access to w/o the metadata.
That said, I'm the one who built the current column mapping functionality. And I built it fairly quickly around my own use cases, which didn't often use choice-level names, meaning I may have left room for better approaches. A link to the relevant code is below, if you want to examine/tinker with/redo it:
https://github.com/ropensci/qualtRics/blob/e350dd3e4d358e4b9b3f248ab35f55705dd31ec7/R/read_survey.R#L143-L223
Good to know about SPSS.
There are also JSON and XML options, though I haven't examined them closely enough to know their costs/benefits vs. CSVs and their metadata rows.
One last thing--does anyone know how to scrape & reshape/flatten the (deeply nested) HTML that outlines the response schema from getSurvey? If we had that in a more structured form, I think ideas about more useful and user-friendly survey metadata would be easier to weigh effectively.
A quick clarification: qualtRics::column_map() makes a direct request to the metadata API. I don't think the name discrepancies are coming from read_survey.R.
I did take a look at the JSON in that row of a test export, and found an internal Qualtrics tag (e.g. "importId: QID5_3") that could be useful, but would demand some additional parsing to make it so.
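The parsing I have in mind is roughly this (the file name is hypothetical; in a manual CSV export, the third row holds a JSON object like {"ImportId":"QID5_3"} for each column):
library(jsonlite)
raw <- read.csv("manual_export.csv", header = TRUE, stringsAsFactors = FALSE)
meta_row <- raw[2, ]  # row 1 = question text, row 2 = the JSON metadata
import_ids <- vapply(meta_row, function(x) fromJSON(x)$ImportId, character(1))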
Right--that function queries the older V2 endpoint, which has a number of differences that will make it not match up with current response downloads. IMO that function should probably be deprecated, or at least have some warnings added.