
Problem step 8

Open ccruizm opened this issue 2 years ago • 4 comments

Good day!

Thanks for developing this wonderful tool. I am trying to use it on my own scRNA-seq data. A trial run worked nicely, but with the whole dataset and the relevant metadata included, the pipeline stops at step 8. Apparently there is a problem reading the column names of my annotation file when adding that information to the heatmap plot. It seems to be related to issue https://github.com/digitalcytometry/ecotyper/issues/2

Below is the error:

Step 8 (ecotype discovery)...
Step 8 (ecotype discovery) finished successfully!
Error in `[.data.frame`(top_annotation, , x) : undefined columns selected
Calls: heatmap_simple ... unlist -> lapply -> FUN -> unique -> [ -> [.data.frame

I have tested it with the latest version of EcoTyper (I downloaded the master repository today and gave it a try, but got the same error).

This is what my yml file looks like:

default :
  Input :    
    Discovery dataset name : discovery_scRNA
    Expression matrix : /home/cruiz2/subset.txt    
    Annotation file : /home/cruiz2/annotation_subset.txt 
    Annotation file column to scale by : NULL
    Annotation file column(s) to plot : ["Sample","Diagnosis","Tumor_type","Tumor_subtype","Source"]
    
  Output :
    Output folder : DiscoveryOutput_scRNA

  Pipeline settings :
    #Pipeline steps:
    #   step 1 (extract cell type specific genes)
    #   step 2 (cell state discovery on correlation matrices)
    #   step 3 (choosing the number of cell states)
    #   step 4 (extracting cell state information)
    #   step 5 (cell state re-discovery in expression matrices)
    #   step 6 (extracting information for re-discovered cell states)
    #   step 7 (cell state QC filter)
    #   step 8 (ecotype discovery)
    Pipeline steps to skip : [1,2,3,4,5,6,7] 
    Filter non cell type specific genes : True
    Number of threads : 20
    Number of NMF restarts : 50
    Maximum number of states per cell type : 20
    Cophenetic coefficient cutoff : 0.95
    #The p-value cutoff used for filtering non-significant overlaps in the jaccard matrix used for 
    #discovering ecotypes in step 8. Default: 1 (no filtering).
    Jaccard matrix p-value cutoff : 0.05

What do you think the issue might be?

Thanks in advance!

ccruizm, May 20 '22 08:05

An additional question, would you recommend including only protein-coding genes for the discovery of ecotypes? I am using a single-nuclei RNA dataset which has a higher content of non-coding genes and they seem to dominate some of the states. Any experience with that?

Thanks!

ccruizm, May 20 '22 12:05

Here is my issue.

  1. My analysis is based on tutorial-5 (scRNA discovery). git cloned around 01/2022. It worked!
  2. Shoot! I forgot to add extra columns to display the information in the heatmaps. I repeated the same job by adding the columns. Failed!
  3. I assigned a new 'discovery dataset name', but the run failed at step 8 with the same error message posted by ccruizm.
  4. I pulled the ecotyper as of 5/27/22 from github. Same issue!

Can you describe step 8 in detail (two scripts are involved, ecotypes_scRNA.R and ecotypes_assign_samples_scRNA.R), particularly in terms of input requirements?

cjhong, May 27 '22 18:05

Hi,

Thank you both for your interest in EcoTyper and for reporting the issue. We have been able to identify the issue and fix the crash. The problem arose because the ecotype discovery step in scRNA-seq data involves aggregating cell information within each sample. Therefore, the output is at sample-level, whereas the annotation is at cell-level. To keep things simple, EcoTyper will now plot any additional columns that have the same value across the single cells within each sample, and will ignore columns for which there are discordant values.
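As a rough illustration of the rule described above (this is not EcoTyper's actual R code), the Python sketch below checks which annotation columns hold a single value across all cells of each sample. The function name `sample_level_columns` and the toy rows are hypothetical; only the column names `Sample` and `Diagnosis` come from the config posted earlier in this thread.

```python
from collections import defaultdict

def sample_level_columns(rows, sample_col, candidate_cols):
    """Return the candidate columns that are constant within every sample."""
    # values_seen[col][sample] holds the distinct values observed for that pair.
    values_seen = {col: defaultdict(set) for col in candidate_cols}
    for row in rows:
        for col in candidate_cols:
            values_seen[col][row[sample_col]].add(row[col])
    # A column is usable at sample level only if no sample has discordant values.
    return [
        col for col in candidate_cols
        if all(len(vals) == 1 for vals in values_seen[col].values())
    ]

# Toy cell-level annotation (hypothetical values, one dict per cell).
cells = [
    {"ID": "cell1", "Sample": "S1", "Diagnosis": "D1", "CellType": "T"},
    {"ID": "cell2", "Sample": "S1", "Diagnosis": "D1", "CellType": "B"},
    {"ID": "cell3", "Sample": "S2", "Diagnosis": "D2", "CellType": "T"},
]
print(sample_level_columns(cells, "Sample", ["Sample", "Diagnosis", "CellType"]))
# ['Sample', 'Diagnosis']
```

Running a check like this on your own annotation file before step 8 should show which of the requested columns can be expected to appear in the sample-level heatmap.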

Regarding your other questions: @ccruizm: Yes, subsetting to protein-coding genes and re-CPM-ing the filtered data could help identify more biologically relevant states. We did use this approach for Carcinoma EcoTyper.
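A minimal Python sketch of what "subset and re-CPM" could look like, assuming the expression data is held as a gene-to-values mapping with one value per cell. The function `subset_and_recpm` and the gene names are hypothetical, and in practice the list of protein-coding genes would have to come from a gene annotation of your choice.

```python
def subset_and_recpm(expr, keep_genes):
    """Keep only keep_genes, then rescale every cell (column) to sum to 1e6.

    expr maps gene name -> list of expression values, one per cell.
    """
    kept = {gene: vals for gene, vals in expr.items() if gene in keep_genes}
    if not kept:
        return {}
    n_cells = len(next(iter(kept.values())))
    # Per-cell totals over the retained genes only.
    totals = [sum(vals[i] for vals in kept.values()) for i in range(n_cells)]
    return {
        gene: [v * 1e6 / t if t > 0 else 0.0 for v, t in zip(vals, totals)]
        for gene, vals in kept.items()
    }

# Toy matrix: two coding genes and one non-coding gene across two cells.
expr = {"GENE_A": [10, 0], "GENE_B": [30, 5], "LNC_X": [60, 95]}
cpm = subset_and_recpm(expr, keep_genes={"GENE_A", "GENE_B"})
print(cpm)
# {'GENE_A': [250000.0, 0.0], 'GENE_B': [750000.0, 1000000.0]}
```

The point of renormalizing after filtering is that the removed non-coding counts no longer take up part of each cell's total.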

@cjhong: Both scripts take as input the assignments of single cells to cell states. The former identifies ecotypes by studying the co-occurrence of cell states across samples, whereas the latter quantifies the ecotype presence across the samples in the discovery dataset.
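To make "co-occurrence of cell states across samples" concrete, here is a simplified Python sketch that computes a Jaccard index between cell states from per-cell (sample, state) assignments. It is an assumption-laden toy: it uses plain presence/absence of a state in a sample, whereas the real scripts may weight by abundance and apply the p-value cutoff from the config. The function `state_cooccurrence` and the state labels are hypothetical.

```python
from itertools import combinations

def state_cooccurrence(assignments):
    """Jaccard index between cell states, based on the samples they occur in.

    assignments is an iterable of (sample, state) pairs, one per cell.
    Returns {(state_a, state_b): jaccard} for each unordered pair of states.
    """
    samples_by_state = {}
    for sample, state in assignments:
        samples_by_state.setdefault(state, set()).add(sample)
    jaccard = {}
    for a, b in combinations(sorted(samples_by_state), 2):
        sa, sb = samples_by_state[a], samples_by_state[b]
        # Shared samples divided by all samples containing either state.
        jaccard[(a, b)] = len(sa & sb) / len(sa | sb)
    return jaccard

# Toy assignments of single cells to cell states (hypothetical labels).
cells = [
    ("S1", "T_S01"), ("S1", "B_S02"),
    ("S2", "T_S01"), ("S2", "B_S02"),
    ("S3", "T_S01"), ("S3", "Mac_S01"),
]
for pair, value in state_cooccurrence(cells).items():
    print(pair, round(value, 3))
```

States with a high index tend to appear in the same samples, which is the kind of signal that grouping states into ecotypes relies on.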

Please let us know if you encounter any additional issues.

Best, The EcoTyper team

BALuca, May 27 '22 22:05

Thank you very much for your reply and for clarifying the doubts I had.

I have tested the files you modified (EcoTyper_discovery_scRNA.R, ecotypes_assign_samples_scRNA.R, ecotypes_recovery_scRNA.R), and they do add the metadata info to the ecotype heatmap plot (all states together), but it does not work for the per-cell-type heatmaps.

I ran the pipeline only from step 8. Would I need to run it from the beginning to make it work?

Thanks again!

ccruizm, May 28 '22 09:05