lemur
lemur copied to clipboard
>2 conditions
very nice work and well-functioning code! can i test >2 conditions (untreated, treatment A, treatment B) jointly or shall i test pairs of conditions? thanks!
Hey,
I would recommend fitting a single model (e.g., design = ~ condition), and then in the contrast, you will need to compare pairs of conditions (e.g., test_de(..., contrast = cond(condition = "treatmentA") - cond(condition = "untreated"))). Or what kind of test did you have in mind to do the test jointly?
Best, Constantin
thank you so much! i wasn't thinking about any test specifically. my question mainly arose because after fitting and performing DE on pairs of conditions the way you said, i get stuck at finding de neighborhoods:
neighborhoods <- find_de_neighborhoods(fit, independent_matrix = counts(sce), group_by = vars(sample), include_complement = FALSE, verbose = FALSE)
my "sample" has 3 possible values (untreated, treatment A, treatment B). i get this error:
Error in handle_design_parameter(design, data, col_data, reference_level): The model_matrix has more columns (3) than the there are samples in the data matrix (3 columns). Too few replicates / too many coefficients to fit model. The head of the design matrix: Intercept sampleA sampleB control 1 0 0 A 1 1 0 B 1 0 1 Traceback:
- find_de_neighborhoods(fit, independent_matrix = counts(sce), . group_by = vars(sample), include_complement = FALSE, verbose = FALSE)
- neighborhood_count_test(de_regions, counts = independent_matrix, . group_by = group_by, contrast = { . { . contrast . } . }, design = design, col_data = fit$colData, method = count_test_method, . verbose = verbose)
- glmGamPoi::glm_gp(region_psce, design = design, use_assay = "masked_counts", . verbose = verbose, offset = log(assay(region_psce, "masked_size_factors") + . 1e-10), size_factors = FALSE, overdispersion = TRUE)
- handle_design_parameter(design, data, col_data, reference_level)
- stop("The model_matrix has more columns (", ncol(model_matrix), . ") than the there are samples in the data matrix (", n_samples, . " columns).\n", "Too few replicates / too many coefficients to fit model.\n", . "The head of the design matrix: \n", format_matrix(head(model_matrix, . n = 3)))
thanks again!
my question mainly arose because after fitting and performing DE on pairs of conditions the way you said, i get stuck at finding de neighborhoods:
Ah, I see. Do you have replicates for the condition? The find find_de_neighborhoods requires multiple independent samples for each condition so that it can estimate the variability.
Hi,
many thanks for developing this very interesting tool!
I would kindly like to ask you for help! I am trying to analyse scRNA seq data from CD8 T cells sampled from three different conditions at two timepoints (early and late) with two independent samples per timepoint and condition. The timepoints are sampling expansion and contraction phase of CD8 T cells after infection and will thereby yield a gradient of various CD8 T cell differentiation states. I would like to compare gene expression differences for differential state analysis between the experimental conditions at either the early or late timepoint but also compare the experimental conditions across the two timepoints to identify condition and timepoint specific DEGs. Are two independent samples per timepoint and condition enough to run lemur? If so, could you provide some guidance in setting up contrast and group_by to run test_de and find_de_neighborhoods, respectively? Many thanks for any help! :)
Are two independent samples per timepoint and condition enough to run lemur?
The answer of course depends on the magnitude of the expression changes. But if you have enough time points two replicates per timepoint, I would guess, should be sufficient to find some interesting genes.
If so, could you provide some guidance in setting up contrast and group_by to run test_de and find_de_neighborhoods, respectively? Many thanks for any help! :)
# Fit interaction model
fit <- lemur(sce, design = ~ condition * timepoint, n_embedding = 15)
fit <- align_harmony(fit)
# Test effect of time in ctrl condition
fit <- test_de(fit, contrast = cond(condition = "ctrl", timepoint = 0) - cond(condition = "ctrl", timepoint = 7), new_assay_name = "DE_time_in_ctrl")
nei_1 <- find_de_neighborhoods(fit, group_by = vars(sample_id, condition, timepoint), de_mat = SummarizedExperiment::assays(fit)[["DE_time_in_ctrl"]])
# Test effect of time in treated condition
fit <- test_de(fit, contrast = cond(condition = "treated", timepoint = 0) - cond(condition = "treated", timepoint = 7), new_assay_name = "DE_time_in_treated")
nei_2 <- find_de_neighborhoods(fit, group_by = vars(sample_id, condition, timepoint), de_mat = SummarizedExperiment::assays(fit)[["DE_time_in_treated"]])
# Test effect of treatment at timepoint = 0
fit <- test_de(fit, contrast = cond(condition = "ctrl", timepoint = 0) - cond(condition = "treated", timepoint = 0), new_assay_name = "DE_ctrl_vs_treated_at_t0")
nei_3 <- find_de_neighborhoods(fit, group_by = vars(sample_id, condition, timepoint), de_mat = SummarizedExperiment::assays(fit)[["DE_ctrl_vs_treated_at_t0"]])
To learn more about the details of how design matrices work, see this article by Law et al..
Many thanks for your fast and helpful response!