[Bug]: Range, Median, IQR, and boxplot are not displayed for ordinal variables
JASP Version
0.95.2
Commit ID
No response
JASP Module
Descriptives
What analysis are you seeing the problem on?
Descriptives table and Boxplot
What OS are you seeing the problem on?
Linux, Flatpak
Bug Description
Problem
- Only valid, missing, mode, min, max, and quartiles are shown for ordinal variables
- Boxplots are also not returned
- Skewness and kurtosis are not returned, which might be correct because many formulas rely on computing the mean, which is formally not appropriate for ordinal variables. Still, I reckon there are suitable formulas for ordinal variables that do not rely on the mean.
Why this is a Problem
- I believe this behavior was introduced on purpose in pull request 363.
- However, I do not think it is desirable, because:
  - It runs contrary to the common textbook assertion that range, median, and IQR are appropriate descriptives for ordinal variables, and that boxplots are suitable for representing them graphically.
  - It is logically inconsistent with the fact that JASP still shows the quartiles. The median is Q2, so JASP is already computing the median but not labeling it as such, which will confuse novice users. Moreover, JASP computes Q1 (25th percentile), Q3 (75th percentile), Min, and Max, so it has everything needed to calculate the IQR (i.e., Q3 - Q1) and the Range (i.e., Max - Min). Likewise, the boxplot represents Q1, Q2, and Q3, so if JASP computes them numerically, it should not object to representing them graphically.
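To make the consistency argument concrete, here is a minimal R sketch. It is not JASP code: the `ratings` vector is made-up data, the ordinal levels are assumed to be coded as numbers, and `quantile()` is used with R's default method, which may differ from whatever JASP uses internally. It only shows that the median, IQR, and range follow directly from the quartiles, minimum, and maximum that are already displayed.

```r
# Hypothetical ordinal ratings coded 1..5 (made-up data, not from this report).
ratings <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)

# The five numbers the descriptives table already shows for ordinal variables.
q <- unname(quantile(ratings, probs = c(0.25, 0.50, 0.75)))
lowest  <- min(ratings)
highest <- max(ratings)

# The three statistics that are not shown are simple functions of those numbers.
med         <- q[2]             # the median is Q2
iqr_value   <- q[3] - q[1]      # IQR = Q3 - Q1
range_value <- highest - lowest # Range = Max - Min

# Cross-check against base R's own functions.
stopifnot(
  isTRUE(all.equal(med, median(ratings))),
  isTRUE(all.equal(iqr_value, IQR(ratings))),
  isTRUE(all.equal(range_value, diff(range(ratings))))
)
```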
Proposed Solution
I took a quick look at the code, and I think the solution should be relatively simple.
Whenever there is a test in the control flow regarding whether the variable is scale or ordinal, range, median, and IQR should be moved from the if (columnType == "scale") condition to the if (columnType == "scale" || columnType == "ordinal") condition.
I reckon something similar can be done regarding the boxplots.
```r
if (columnType == "scale") {
  resultsCol[["Median"]] <- .descriptivesDescriptivesTable_subFunction_OptionChecker(options$median, na.omitted, median)
  resultsCol[["Mean"]] <- .descriptivesDescriptivesTable_subFunction_OptionChecker(options$mean, na.omitted, mean)
  resultsCol[["Std. Error of Mean"]] <- .descriptivesDescriptivesTable_subFunction_OptionChecker(options$seMean, na.omitted, function(param) { sd(param)/sqrt(length(param))} )
  resultsCol[["Std. Deviation"]] <- .descriptivesDescriptivesTable_subFunction_OptionChecker(options$sd, na.omitted, sd)
  resultsCol[["Coefficient of Variation"]]<- .descriptivesDescriptivesTable_subFunction_OptionChecker(options$coefficientOfVariation, na.omitted, function(param) { sd(param) / mean(param)})
  resultsCol[["MAD"]] <- .descriptivesDescriptivesTable_subFunction_OptionChecker(options$mad, na.omitted, function(param) { mad(param, constant = 1) } )
  resultsCol[["MAD Robust"]] <- .descriptivesDescriptivesTable_subFunction_OptionChecker(options$madRobust, na.omitted, mad)
  resultsCol[["IQR"]] <- .descriptivesDescriptivesTable_subFunction_OptionChecker(options$iqr, na.omitted, .descriptivesIqr)
  resultsCol[["Variance"]] <- .descriptivesDescriptivesTable_subFunction_OptionChecker(options$variance, na.omitted, var)
  resultsCol[["Kurtosis"]] <- .descriptivesDescriptivesTable_subFunction_OptionChecker(options$kurtosis, na.omitted, .descriptivesKurtosis)
  resultsCol[["Std. Error of Kurtosis"]] <- .descriptivesDescriptivesTable_subFunction_OptionChecker(options$kurtosis, na.omitted, .descriptivesSEK)
  resultsCol[["Skewness"]] <- .descriptivesDescriptivesTable_subFunction_OptionChecker(options$skewness, na.omitted, .descriptivesSkewness)
  resultsCol[["Std. Error of Skewness"]] <- .descriptivesDescriptivesTable_subFunction_OptionChecker(options$skewness, na.omitted, .descriptivesSES)
  resultsCol[["Shapiro-Wilk"]] <- .descriptivesDescriptivesTable_subFunction_OptionChecker(options$shapiroWilkTest, na.omitted, function(param) { res <- try(shapiro.test(param)$statistic); if(isTryError(res)) NaN else res })
  resultsCol[["P-value of Shapiro-Wilk"]] <- .descriptivesDescriptivesTable_subFunction_OptionChecker(options$shapiroWilkTest, na.omitted, function(param) { res <- try(shapiro.test(param)$p.value); if(isTryError(res)) NaN else res })
  resultsCol[["Sum"]] <- .descriptivesDescriptivesTable_subFunction_OptionChecker(options$sum, na.omitted, sum)
}
if (columnType == "scale" || columnType == "ordinal") {
  resultsCol[["Range"]] <- .descriptivesDescriptivesTable_subFunction_OptionChecker(options$range, na.omitted, function(param) { range(param)[2] - range(param)[1]})
  resultsCol[["Minimum"]] <- .descriptivesDescriptivesTable_subFunction_OptionChecker(options$minimum, na.omitted, min)
  resultsCol[["Maximum"]] <- .descriptivesDescriptivesTable_subFunction_OptionChecker(options$maximum, na.omitted, max)
}
```
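For illustration, the proposed split could look roughly like the standalone sketch below. This is not the JASP source: `describeColumn` is a hypothetical name, base R's `IQR()` stands in for `.descriptivesIqr`, the option checks are omitted, and the values are assumed to be numeric codes (base R's `median()` refuses factors). The point is only the branch layout, with mean-based statistics kept under the scale-only condition and order-based statistics moved under the shared condition.

```r
# Hypothetical stand-in for the branch structure discussed in this report;
# describeColumn is an invented name and this is NOT the JASP implementation.
describeColumn <- function(values, columnType) {
  x <- values[!is.na(values)]  # plays the role of na.omitted
  out <- list()

  # Mean-based statistics stay restricted to scale variables.
  if (columnType == "scale") {
    out[["Mean"]]           <- mean(x)
    out[["Std. Deviation"]] <- sd(x)
  }

  # Order-based statistics are offered for both scale and ordinal variables.
  if (columnType == "scale" || columnType == "ordinal") {
    out[["Median"]]  <- median(x)
    out[["IQR"]]     <- IQR(x)
    out[["Range"]]   <- max(x) - min(x)
    out[["Minimum"]] <- min(x)
    out[["Maximum"]] <- max(x)
  }

  out
}

# Usage: an ordinal column gets the order-based statistics but no mean or SD.
ordinalResult <- describeColumn(c(1, 2, 2, 3, NA, 4, 5), "ordinal")
names(ordinalResult)
```

Under this layout a nominal column would still get neither group, which matches the request in this report.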
P.S
Reconsider Adopting Semantic Versioning
As I commented in #2897, JASP is an amazing piece of software, serving as the backbone of statistics courses in multiple institutions (mine included).
Keeping the software in a zero-based versioning scheme suggests a lack of maturity, which is not an accurate picture of JASP.
More importantly, it makes it harder for you to signal to users when there are breaking changes.
For instance, though I appreciate and agree with your decision to stop showing means and SDs for ordinal variables, I can see it breaking some workflows with little notice for users (despite some references to it in the changelog).
It also means that courses that rely on JASP (such as two that I contribute to) have a hard time figuring out whether a new version of JASP is likely to affect the tests featured in their curricula. My team always takes a look at the changelog and the lists of known issues, but it is still hard to predict situations like the one reported above. For instance, we inferred that means and SDs would not be displayed for ordinal variables, which is in line with our curricula, but we didn't anticipate that medians and IQRs would be affected.
Thank You
I'm not a native English speaker and I tried to keep the bug report concise and in line with the template. My apologies if I didn't sound thankful enough for all the amazing work you do with JASP and for distributing it under a FOSS License.
Expected Behaviour
- Range, Median, and IQR should be displayed in the descriptives table for ordinal variables
- Boxplots should be returned for ordinal variables
Steps to Reproduce
- ask for the Range, Mdn and IQR to be shown in the descriptives table
- ask for a boxplot for the same variable
- ask for the quartiles, min, and max: these are shown, while Range, Mdn, and IQR are not
Log (if any)
More Debug Information
- tested on Windows and Linux (flatpak)
Final Checklist
- [x] I have included a screenshot showcasing the issue, if possible.
- [x] I have included a JASP file (zipped) or data file that causes the crash/bug, if applicable.
- [x] I have accurately described the bug, and steps to reproduce it.
Hi @Joao-O-Santos ,
Thank you for your detailed report - we are currently discussing the behavior that was recently implemented, which aimed to have less confusing output by not doing anything the user hasn't asked for. The current behavior is such that numeric results are only computed for scale variables, so if you want these statistics, you can convert your variables to "scale" (either in the data view, or by left-clicking the icons in the descriptives assigned variables box). However, we think the behavior could be improved and are working on a more intuitive solution.
Cheers, Johnny
Hi, @JohnnyDoorn !
Thank you for taking the time to read the issue and for your prompt reply.
> we think the behavior could be improved and are working on a more intuitive solution.
Thank you for being understanding and for the work you and the team do to make JASP awesome.
> you can convert your variables to "scale"
We were aware of the workaround and had already implemented it. I think for expert users it's not a big deal as long as the behavior is documented. Still, I'm afraid it can be confusing for novices, especially students who learn that median, range, and IQR are appropriate for ordinal variables and then cannot compute those descriptives unless they convert the variable to scale.
> The current behavior is such that numeric results are only computed for scale variables
I see your point, and I appreciate that you're already working on improving this feature. Just note that you're already computing the median (the "50% percentile") and the other quartiles (25% and 75%), as well as the minimum and maximum. Providing the median on its own line in the table, the IQR (Q3 - Q1, i.e., 75% - 25%), or the range (Max - Min) does not really require you to report more "numeric results". I'm not asking for the mean, standard deviation, etc. to be computed for ordinal variables, nor for the median and IQR to be reported for nominal variables, as computing and reporting those would not be "by the book".