sleuth [Suggestion] Comparisons should be based on the input order of the conditions, not alphabetically

By default, sleuth orders the conditions alphabetically, which can cause confusion when the user inspects the beta values in the output.

For example, in an experimental design with one control (condition = 'control') and two treatments (condition = 'A' and condition = 'Z'), a transcript whose abundance changes in the same direction in both treatments will have beta values with opposite signs.

Since this can easily lead to misinterpretation of the results, shouldn't sleuth order the factor levels by their input order by default?

This problem was previously discussed here: https://support.bioconductor.org/p/85657/

Jun 23 '18 15:06 zefrieira

I think the comparisons are made not alphabetically but in the order of the factor levels in the design matrix. The level order can be controlled with relevel.
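A minimal base-R sketch of this point, assuming sleuth builds its design matrix with R's default treatment contrasts (as `model.matrix` does): the first factor level is the reference, and `relevel` changes which level that is. The toy `condition` vector is hypothetical.

```r
# Hypothetical toy covariate; "KO" sorts before "WT"
condition <- factor(c("WT", "WT", "KO", "KO"))
levels(condition)                     # "KO" "WT": KO is the reference level
colnames(model.matrix(~ condition))   # "(Intercept)" "conditionWT"

# relevel() moves "WT" to the first position, making it the reference level
condition <- relevel(condition, ref = "WT")
levels(condition)                     # "WT" "KO"
colnames(model.matrix(~ condition))   # "(Intercept)" "conditionKO"
```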

Jun 24 '18 21:06 tiagobrc

I can confirm that the alphabetical order determines the sign of the beta values. Here are the results of the same test, which I performed twice, changing only the name of the condition to alter its alphabetical position:

```
    target_id       pval      qval           b
0610005C13Rik 0.01275389 0.2220374  0.21015090
0610006L08Rik         NA        NA          NA
0610009B22Rik 0.89458192 0.9959706  0.01634498
0610009E02Rik 0.16955278 0.7857634 -1.37401449
0610009L18Rik         NA        NA          NA
0610009O20Rik 0.43332932 0.9493924  0.07570488
```

```
    target_id       pval      qval           b
0610005C13Rik 0.01275389 0.2220374 -0.21015090
0610006L08Rik         NA        NA          NA
0610009B22Rik 0.89458192 0.9959706 -0.01634498
0610009E02Rik 0.16955278 0.7857634  1.37401449
0610009L18Rik         NA        NA          NA
0610009O20Rik 0.43332932 0.9493924 -0.07570488
```
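The pattern above (identical pval and qval, negated b) is what treatment coding predicts when the reference level is swapped in a two-level comparison. A minimal sketch with base R's `lm` on made-up values (not sleuth itself and not the data above), assuming sleuth's `b` is the estimated coefficient for the non-reference level:

```r
# Made-up response values for two groups (hypothetical, for illustration only)
y <- c(1.0, 1.2, 0.9, 2.1, 2.3, 1.9)
group <- factor(c("ctrl", "ctrl", "ctrl", "trt", "trt", "trt"))

# Default levels are alphabetical ("ctrl", "trt"): the coefficient estimates trt - ctrl
fit1 <- lm(y ~ group)
# With "trt" as the reference, the coefficient estimates ctrl - trt
fit2 <- lm(y ~ relevel(group, ref = "trt"))

b1 <- unname(coef(fit1)[2])
b2 <- unname(coef(fit2)[2])
p1 <- summary(fit1)$coefficients[2, "Pr(>|t|)"]
p2 <- summary(fit2)$coefficients[2, "Pr(>|t|)"]
c(b1, b2)  # same magnitude, opposite sign
c(p1, p2)  # identical p-values
```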

Jun 26 '18 17:06 apcamargo

I agree with this. Experiments are often "knockout" vs. "wild-type", which in the sample-to-covariates table are conveniently labeled "KO" and "WT". Unfortunately, because "KO" sorts first alphabetically, it becomes the reference level unless the user intervenes with relevel. It would be easier to use the input order of the s2c table, taking the first level that appears as the reference.
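The proposed behavior can be approximated on the user side by passing `levels = unique(...)` to `factor()`, which keeps levels in the order they first appear. This is a sketch under that assumption, not an option sleuth is known to provide; `s2c` here is a hypothetical sample-to-covariates table.

```r
# Hypothetical sample-to-covariates table where "WT" is listed first
s2c <- data.frame(sample = paste0("sample_", 1:4),
                  condition = c("WT", "WT", "KO", "KO"),
                  stringsAsFactors = FALSE)

# Default factor(): levels sorted alphabetically, so "KO" would be the reference
levels(factor(s2c$condition))  # "KO" "WT"

# Order levels by first appearance so the first listed condition is the reference
s2c$condition <- factor(s2c$condition, levels = unique(s2c$condition))
levels(s2c$condition)          # "WT" "KO"
```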

Jul 16 '18 19:07 mchimenti

We can consider how to implement this within sleuth. In the meantime, the solution for all of you (@zefrieira, @tiagobrc, @apcamargo, and @mchimenti) is what was suggested above: convert the relevant covariate column to a factor and use relevel to make the control label the reference level.

Example below, with conditions "KO" and "WT", where "WT" is the desired control condition:

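```r
# Toy sample-to-covariates table: four WT samples followed by four KO samples
example_s2c <- data.frame(sample = paste0("sample_", 1:8),
                          condition = c(rep("WT", 4), rep("KO", 4)))
example_s2c
#     sample condition
# 1 sample_1        WT
# 2 sample_2        WT
# 3 sample_3        WT
# 4 sample_4        WT
# 5 sample_5        KO
# 6 sample_6        KO
# 7 sample_7        KO
# 8 sample_8        KO

# Convert to a factor; levels default to sorted order ("KO", "WT")
example_s2c$condition <- as.factor(example_s2c$condition)
# Make "WT" the reference (first) level so betas are relative to WT
example_s2c$condition <- relevel(example_s2c$condition, ref = "WT")
## Do the rest of the sleuth pipeline
```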

Jul 18 '18 15:07 warrenmcg
