sleuth
[Suggestion] Comparisons should be based on the input order of the conditions, not alphabetically
By default, sleuth orders the conditions alphabetically, which can cause confusion when the user inspects the beta values in the output.
For example, in an experimental design with one control (condition = 'control') and two treatments (condition = 'A' and condition = 'Z'), a transcript whose abundance changes in the same direction in both treatments will have beta values with opposite signs.
As this may easily lead to misinterpretation of the results, shouldn't sleuth order the factors by order of input by default?
This problem was previously discussed here: https://support.bioconductor.org/p/85657/
I think the comparisons are made not alphabetically but in the order of the factor levels used to build the design matrix. The level order can be controlled with relevel.
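The two views can be reconciled with a small base R sketch: when a character column is turned into a factor, the levels are sorted, and the first level becomes the reference in the design matrix. The toy `condition` vector below is made up to mirror the control/A/Z example above, and it uses base R only, not sleuth itself.

```r
# Toy design: one control and two treatments (hypothetical labels).
condition <- c("control", "control", "A", "A", "Z", "Z")

# Default factor(): levels are sorted, so "A" ends up as the reference level.
f_default <- factor(condition)
levels(f_default)
colnames(model.matrix(~ f_default))

# relevel() moves the chosen level to the front, making it the reference.
f_releveled <- relevel(f_default, ref = "control")
levels(f_releveled)
colnames(model.matrix(~ f_releveled))
```

The exact sort order of mixed-case labels depends on the locale, but "A" sorts first either way, so the control is not the baseline unless it is releveled.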
I can confirm that the alphabetical order determines the sign of the beta values. These are the results of the same test, which I ran twice, changing only the name of the condition so that the alphabetical order changes:
target_id pval qval b
0610005C13Rik 0.01275389 0.2220374 0.21015090
0610006L08Rik NA NA NA
0610009B22Rik 0.89458192 0.9959706 0.01634498
0610009E02Rik 0.16955278 0.7857634 -1.37401449
0610009L18Rik NA NA NA
0610009O20Rik 0.43332932 0.9493924 0.07570488
target_id pval qval b
0610005C13Rik 0.01275389 0.2220374 -0.21015090
0610006L08Rik NA NA NA
0610009B22Rik 0.89458192 0.9959706 -0.01634498
0610009E02Rik 0.16955278 0.7857634 1.37401449
0610009L18Rik NA NA NA
0610009O20Rik 0.43332932 0.9493924 -0.07570488
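The pattern in the two tables (identical p-values, b with flipped sign) is what a two-level comparison gives when only the reference level changes. A minimal sketch with made-up numbers, using plain `lm()` as a stand-in for sleuth's model fit (an assumption for illustration, not sleuth's actual code path):

```r
# Toy data (made up for illustration): four WT and four KO samples.
expr <- c(5.0, 5.2, 4.9, 5.1, 6.0, 6.3, 5.9, 6.2)
condition <- factor(rep(c("WT", "KO"), each = 4))  # sorted levels: "KO", "WT"

# Reference is "KO" (first level), so the coefficient is WT minus KO.
fit_ko_ref <- lm(expr ~ condition)
b_ko_ref <- coef(fit_ko_ref)[["conditionWT"]]

# Same data with "WT" as the reference: the coefficient is KO minus WT.
condition_wt_ref <- relevel(condition, ref = "WT")
fit_wt_ref <- lm(expr ~ condition_wt_ref)
b_wt_ref <- coef(fit_wt_ref)[["condition_wt_refKO"]]

# Same magnitude, opposite sign.
all.equal(b_ko_ref, -b_wt_ref)
```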
I agree with this. Often an experiment will be "knockout" vs. "wild-type". In the sample-to-covariates table this is conveniently written as "KO" and "WT". Unfortunately, that makes "KO" the reference level unless the user intervenes with relevel. It would be easier to use the order of input in the 's2c' table, taking the first level that appears as the reference.
We can consider how to implement this within sleuth. In the meantime, the solution for all of you (@zefrieira, @tiagobrc, @apcamargo, and @mchimenti) is to do what was suggested above: convert the relevant covariate column to a factor and use relevel to set the control label.
Example below, with conditions "KO" and "WT", and "WT" is the desired control condition:
example_s2c <- data.frame(sample = paste0("sample_", 1:8), condition = c(rep("WT", 4), rep("KO", 4)))
example_s2c
# sample condition
# 1 sample_1 WT
# 2 sample_2 WT
# 3 sample_3 WT
# 4 sample_4 WT
# 5 sample_5 KO
# 6 sample_6 KO
# 7 sample_7 KO
# 8 sample_8 KO
example_s2c$condition <- as.factor(example_s2c$condition)
example_s2c$condition <- relevel(example_s2c$condition, "WT")
## Do the rest of the sleuth pipeline
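For anyone who wants the input-order behavior requested in this thread, it can be emulated in base R by passing the levels explicitly. This is only a sketch of a user-side workaround, not something sleuth does by default; it rebuilds the same `example_s2c` table so that it runs on its own.

```r
# Same toy table as above; keep condition as character first.
example_s2c <- data.frame(
  sample = paste0("sample_", 1:8),
  condition = c(rep("WT", 4), rep("KO", 4)),
  stringsAsFactors = FALSE
)

# unique() returns values in order of first appearance, so the first
# condition listed in the table becomes the reference level.
example_s2c$condition <- factor(
  example_s2c$condition,
  levels = unique(example_s2c$condition)
)
levels(example_s2c$condition)
# [1] "WT" "KO"
```

With more than two conditions this fixes the whole level order at once, whereas relevel only moves a single level to the front.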