monocle3 all FAIL when running fit

hi all

I ran monocle3 passing a Seurat object following the instructions here (https://satijalab.org/signac/articles/monocle.html).

then I wanted to run the DEG analysis but the returned data.frame looks as follows:

> gene_fits
# A tibble: 17,541 x 10
   vst.mean vst.variance vst.variance.expect… vst.variance.standard… vst.variable num_cells_express… gene_id     model     model_summary status
      <dbl>        <dbl>                <dbl>                  <dbl> <lgl>                     <int> <chr>       <named l> <named list>  <chr> 
 1 0.586        0.704                0.890                     0.791 FALSE                      1929 MRPL15      <speedgl… <smmry.sp>    FAIL  
 2 0.321        0.349                0.423                     0.824 FALSE                      1230 LYPLA1      <speedgl… <smmry.sp>    FAIL  
 3 0.000186     0.000186             0.000186                  1.00  FALSE                         1 GM37988     <speedgl… <smmry.sp>    FAIL  
 4 0.581        0.741                0.880                     0.842 FALSE                      1848 TCEA1       <speedgl… <smmry.sp>    FAIL  
 5 0.461        0.581                0.658                     0.882 FALSE                      1569 ATP6V1H     <speedgl… <smmry.sp>    FAIL  
 6 0.320        0.408                0.422                     0.968 FALSE                      1080 RB1CC1      <speedgl… <smmry.sp>    FAIL  
 7 0.0359       0.0387               0.0452                    0.856 FALSE                       142 4732440D04… <speedgl… <smmry.sp>    FAIL  
 8 0            0                    0                         0     FALSE                         0 ALKAL1      <speedgl… <smmry.sp>    FAIL  
 9 0.0102       0.0109               0.0130                    0.838 FALSE                        44 ST18        <speedgl… <smmry.sp>    FAIL  
10 0.274        0.363                0.352                     1.03  FALSE                       926 PCMTD1      <speedgl… <smmry.sp>    FAIL

where all status are FAIL

the code I used is the following:

sc_time <- subset(sc_recluster, cells = which([email protected]$seurat_clusters %in% c(0,1,2,4,6)))
DefaultAssay(sc_time) <- "RNA"
sc_time <- as.cell_data_set(sc_time)

sc_time <- preprocess_cds(sc_time)
plot_pc_variance_explained(sc_time)
sc_time <- reduce_dimension(sc_time,reduction_method = "UMAP")
sc_time <- cluster_cells(sc_time)
sc_time <- learn_graph(sc_time)
sc_time <- order_cells(sc_time)
# sc_time@principal_graph_aux@listData$UMAP$pseudotime
gene_fits <- fit_models(sc_time, model_formula_str = "~ pseudotime", cores=8)

also, I don't seem to have any infinite or NA values

> which(is.infinite(as.numeric(sc_time@principal_graph_aux@listData$UMAP$pseudotime)))
integer(0)
> which(is.na(as.numeric(sc_time@principal_graph_aux@listData$UMAP$pseudotime)))
integer(0)

when I look at the p.value for all models:

pvs <- lapply(gene_fits$model, function(xx){
  pv <- summary(xx)$coefficient[["Pr(>|t|)"]][[2]]
  return(pv)
})
table(unlist(pvs))
0.999999999825233 
            17541

looks like they are all ~1

if I look at the model for the first gene:

> gene_fits$model$MRPL15
Generalized Linear Model of class 'speedglm':

Call:  speedglm::speedglm(formula = model_formula, family = stats::quasipoisson(),      model = FALSE, y = FALSE, acc = 0.001) 

Coefficients:
(Intercept)   pseudotime  
 -2.730e+01    1.962e-12

and its summary:

> summary(gene_fits$model$MRPL15)
Generalized Linear Model of class 'speedglm':

Call:  speedglm::speedglm(formula = model_formula, family = stats::quasipoisson(),      model = FALSE, y = FALSE, acc = 0.001) 

Coefficients:
 ------------------------------------------------------------------ 
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -2.730e+01   0.039007  -699.9    0e+00 ***
pseudotime   1.962e-12   0.008958     0.0    1e+00    

------------------------------------------------------------------- 
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

---
null df: 4446; null deviance: 0;
residuals df: 4445; residuals deviance: 0;
# obs.: 4447; # non-zero weighted obs.: 4447;
AIC: NA; log Likelihood: NA;
RSS: 0; dispersion: 3.776833e-12; iterations: 25;
rank: 2; max tolerance: 2.12e-07; convergence: FALSE.

it did run although it was not significant.

I tried to run it separately as:

> glm_time <- speedglm::speedglm(to_plot$MRPL15 ~ to_plot$Pseudotime, 
+                 family = stats::quasipoisson(), 
+                 model = FALSE, 
+                 y = FALSE, 
+                 acc = 0.001)
> summary(glm_time)
Generalized Linear Model of class 'speedglm':

Call:  speedglm::speedglm(formula = to_plot$MRPL15 ~ to_plot$Pseudotime,      family = stats::quasipoisson(), model = FALSE, y = FALSE,      acc = 0.001) 

Coefficients:
 ------------------------------------------------------------------ 
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)        -0.91603   0.010297  -88.96  0.0e+00 ***
to_plot$Pseudotime -0.04808   0.002431  -19.78  1.7e-83 ***

------------------------------------------------------------------- 
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

---
null df: 4446; null deviance: 131.98;
residuals df: 4445; residuals deviance: 122.33;
# obs.: 4447; # non-zero weighted obs.: 4447;
AIC: NA; log Likelihood: NA;
RSS: 110.3; dispersion: 0.02482102; iterations: 4;
rank: 2; max tolerance: 1.81e-11; convergence: TRUE.
> glm_time
Generalized Linear Model of class 'speedglm':

Call:  speedglm::speedglm(formula = to_plot$MRPL15 ~ to_plot$Pseudotime,      family = stats::quasipoisson(), model = FALSE, y = FALSE,      acc = 0.001) 

Coefficients:
       (Intercept)  to_plot$Pseudotime  
          -0.91603            -0.04808

here it is marked as significant...not sure why thou.

any suggestion would be great, thank you

Apr 20 '21 13:04 SBata

Hey I just ran into a very similar problem. I used the as.cell_data_set function from the SeuratWrappers package to convert my Seurat object to a cell data set for pseudotime analysis. When I ran monocle3::fit_models(my_cds, model_formula_str = "~pseudotime"), all the p-values were either 0 or 1, and all the q-values were 1.

I had a hunch that as.cell_data_set maybe doesn't put the gene expression matrix in the slot where fit_models was looking for expression data. I don't know what the problem is exactly, but I found a workaround that seems to work. So instead of as.cell_data_set, I converted the Seruat object with the following code snippet:


# normalized expression matrix from Seurat object
expr_matrix  <- Seurat::GetAssayData(my_seurat, assay='RNA', slot='data')

# create cell dataset, don't worry about the warning
my_cds <- monocle3::new_cell_data_set(
  expr_matrix,
  [email protected]
)

# process dataset 
my_cds <- cluster_cells(cds=my_cds, reduction_method='UMAP')
my_cds <- learn_graph(my_cds, use_partition=TRUE)

# use the helper function in the monocle3 documentation
# https://cole-trapnell-lab.github.io/monocle3/docs/trajectories/
principal_node <- get_earliest_principal_node(my_cds,partition = 1)

# order cells in pseudotime 
my_cds <- order_cells(my_cds,root_pr_nodes = principal_node)

# finally run fit_models 
gene_fits <- monocle3::fit_models(my_cds, model_formula_str = "~pseudotime")
fit_coefs <- coefficient_table(gene_fits)
fit_coefs$term
fit_coefs %>% filter(term == "pseudotime" & q_value <= 0.05) %>%
      select(gene_id, term, q_value, estimate)

I hope that this workaround helps you or somebody else! I still don't really know what is wrong with using as.cell_data_set.

Jun 03 '21 00:06 smorabit

hi all

I ran monocle3 passing a Seurat object following the instructions here (https://satijalab.org/signac/articles/monocle.html).

then I wanted to run the DEG analysis but the returned data.frame looks as follows:

> gene_fits
# A tibble: 17,541 x 10
   vst.mean vst.variance vst.variance.expect… vst.variance.standard… vst.variable num_cells_express… gene_id     model     model_summary status
      <dbl>        <dbl>                <dbl>                  <dbl> <lgl>                     <int> <chr>       <named l> <named list>  <chr> 
 1 0.586        0.704                0.890                     0.791 FALSE                      1929 MRPL15      <speedgl… <smmry.sp>    FAIL  
 2 0.321        0.349                0.423                     0.824 FALSE                      1230 LYPLA1      <speedgl… <smmry.sp>    FAIL  
 3 0.000186     0.000186             0.000186                  1.00  FALSE                         1 GM37988     <speedgl… <smmry.sp>    FAIL  
 4 0.581        0.741                0.880                     0.842 FALSE                      1848 TCEA1       <speedgl… <smmry.sp>    FAIL  
 5 0.461        0.581                0.658                     0.882 FALSE                      1569 ATP6V1H     <speedgl… <smmry.sp>    FAIL  
 6 0.320        0.408                0.422                     0.968 FALSE                      1080 RB1CC1      <speedgl… <smmry.sp>    FAIL  
 7 0.0359       0.0387               0.0452                    0.856 FALSE                       142 4732440D04… <speedgl… <smmry.sp>    FAIL  
 8 0            0                    0                         0     FALSE                         0 ALKAL1      <speedgl… <smmry.sp>    FAIL  
 9 0.0102       0.0109               0.0130                    0.838 FALSE                        44 ST18        <speedgl… <smmry.sp>    FAIL  
10 0.274        0.363                0.352                     1.03  FALSE                       926 PCMTD1      <speedgl… <smmry.sp>    FAIL

where all status are FAIL

the code I used is the following:

sc_time <- subset(sc_recluster, cells = which([email protected]$seurat_clusters %in% c(0,1,2,4,6)))
DefaultAssay(sc_time) <- "RNA"
sc_time <- as.cell_data_set(sc_time)

sc_time <- preprocess_cds(sc_time)
plot_pc_variance_explained(sc_time)
sc_time <- reduce_dimension(sc_time,reduction_method = "UMAP")
sc_time <- cluster_cells(sc_time)
sc_time <- learn_graph(sc_time)
sc_time <- order_cells(sc_time)
# sc_time@principal_graph_aux@listData$UMAP$pseudotime
gene_fits <- fit_models(sc_time, model_formula_str = "~ pseudotime", cores=8)

also, I don't seem to have any infinite or NA values

> which(is.infinite(as.numeric(sc_time@principal_graph_aux@listData$UMAP$pseudotime)))
integer(0)
> which(is.na(as.numeric(sc_time@principal_graph_aux@listData$UMAP$pseudotime)))
integer(0)

when I look at the p.value for all models:

pvs <- lapply(gene_fits$model, function(xx){
  pv <- summary(xx)$coefficient[["Pr(>|t|)"]][[2]]
  return(pv)
})
table(unlist(pvs))
0.999999999825233 
            17541

looks like they are all ~1

if I look at the model for the first gene:

> gene_fits$model$MRPL15
Generalized Linear Model of class 'speedglm':

Call:  speedglm::speedglm(formula = model_formula, family = stats::quasipoisson(),      model = FALSE, y = FALSE, acc = 0.001) 

Coefficients:
(Intercept)   pseudotime  
 -2.730e+01    1.962e-12

and its summary:

> summary(gene_fits$model$MRPL15)
Generalized Linear Model of class 'speedglm':

Call:  speedglm::speedglm(formula = model_formula, family = stats::quasipoisson(),      model = FALSE, y = FALSE, acc = 0.001) 

Coefficients:
 ------------------------------------------------------------------ 
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -2.730e+01   0.039007  -699.9    0e+00 ***
pseudotime   1.962e-12   0.008958     0.0    1e+00    

------------------------------------------------------------------- 
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

---
null df: 4446; null deviance: 0;
residuals df: 4445; residuals deviance: 0;
# obs.: 4447; # non-zero weighted obs.: 4447;
AIC: NA; log Likelihood: NA;
RSS: 0; dispersion: 3.776833e-12; iterations: 25;
rank: 2; max tolerance: 2.12e-07; convergence: FALSE.

it did run although it was not significant.

I tried to run it separately as:

> glm_time <- speedglm::speedglm(to_plot$MRPL15 ~ to_plot$Pseudotime, 
+                 family = stats::quasipoisson(), 
+                 model = FALSE, 
+                 y = FALSE, 
+                 acc = 0.001)
> summary(glm_time)
Generalized Linear Model of class 'speedglm':

Call:  speedglm::speedglm(formula = to_plot$MRPL15 ~ to_plot$Pseudotime,      family = stats::quasipoisson(), model = FALSE, y = FALSE,      acc = 0.001) 

Coefficients:
 ------------------------------------------------------------------ 
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)        -0.91603   0.010297  -88.96  0.0e+00 ***
to_plot$Pseudotime -0.04808   0.002431  -19.78  1.7e-83 ***

------------------------------------------------------------------- 
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

---
null df: 4446; null deviance: 131.98;
residuals df: 4445; residuals deviance: 122.33;
# obs.: 4447; # non-zero weighted obs.: 4447;
AIC: NA; log Likelihood: NA;
RSS: 110.3; dispersion: 0.02482102; iterations: 4;
rank: 2; max tolerance: 1.81e-11; convergence: TRUE.
> glm_time
Generalized Linear Model of class 'speedglm':

Call:  speedglm::speedglm(formula = to_plot$MRPL15 ~ to_plot$Pseudotime,      family = stats::quasipoisson(), model = FALSE, y = FALSE,      acc = 0.001) 

Coefficients:
       (Intercept)  to_plot$Pseudotime  
          -0.91603            -0.04808

here it is marked as significant...not sure why thou.

any suggestion would be great, thank you

same problem here, have you ever figure it out?

Mar 30 '22 13:03 vicscott

monocle3
monocle3 copied to clipboard

all FAIL when running fit_model

monocle3 monocle3 copied to clipboard

all FAIL when running fit_model

monocle3
monocle3 copied to clipboard