ecotyper
Problem at step 8
Good day!
Thanks for developing this wonderful tool. I am trying to use it on my own scRNA-seq data. I did a trial run earlier and it worked nicely. However, with the whole dataset and the relevant metadata included, the pipeline stops at step 8. Apparently, there is a problem reading the column names in my annotation file when adding that information to the heatmap plot. This seems to be related to issue https://github.com/digitalcytometry/ecotyper/issues/2
Below is the error:
Step 8 (ecotype discovery)...
Step 8 (ecotype discovery) finished successfully!
Error in `[.data.frame`(top_annotation, , x) : undefined columns selected
Calls: heatmap_simple ... unlist -> lapply -> FUN -> unique -> [ -> [.data.frame
I have tested it with the latest version of EcoTyper (I downloaded the master repository today and gave it a try), but I get the same error.
This is what my yml file looks like:
default :
  Input :
    Discovery dataset name : discovery_scRNA
    Expression matrix : /home/cruiz2/subset.txt
    Annotation file : /home/cruiz2/annotation_subset.txt
    Annotation file column to scale by : NULL
    Annotation file column(s) to plot : ["Sample","Diagnosis","Tumor_type","Tumor_subtype","Source"]
  Output :
    Output folder : DiscoveryOutput_scRNA
  Pipeline settings :
    #Pipeline steps:
    #   step 1 (extract cell type specific genes)
    #   step 2 (cell state discovery on correrlation matrices)
    #   step 3 (choosing the number of cell states)
    #   step 4 (extracting cell state information)
    #   step 5 (cell state re-discovery in expression matrices)
    #   step 6 (extracting information for re-discovered cell states)
    #   step 7 (cell state QC filter)
    #   step 8 (ecotype discovery)
    Pipeline steps to skip : [1,2,3,4,5,6,7]
    Filter non cell type specific genes : True
    Number of threads : 20
    Number of NMF restarts : 50
    Maximum number of states per cell type : 20
    Cophenetic coefficient cutoff : 0.95
    #The p-value cutoff used for filtering non-significant overlaps in the jaccard matrix used for
    #discovering ecotypes in step 8. Default: 1 (no filtering).
    Jaccard matrix p-value cutoff : 0.05
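A quick way to rule out the simplest cause of `undefined columns selected` (a name in `Annotation file column(s) to plot` that does not exactly match the annotation file header) is to compare the two directly. The sketch below is a hypothetical Python helper, not part of EcoTyper, and it assumes the annotation file is tab-delimited with a header row. An empty result only rules out typos; it does not guarantee that step 8 will succeed.

```python
import csv
import io


def missing_plot_columns(lines, requested, delimiter="\t"):
    """Return the requested column names that are absent from the header row.

    `lines` is any iterable of text lines, such as an open file object.
    The default delimiter assumes a tab-separated annotation file.
    """
    header = next(csv.reader(lines, delimiter=delimiter))
    present = {name.strip() for name in header}
    return [col for col in requested if col not in present]


if __name__ == "__main__":
    # Hypothetical in-memory annotation; for a real file use open(path, newline="").
    demo = io.StringIO("ID\tCellType\tSample\tDiagnosis\nc1\tB.cells\ts1\tA\n")
    wanted = ["Sample", "Diagnosis", "Tumor_type", "Tumor_subtype", "Source"]
    print(missing_plot_columns(demo, wanted))
```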
What do you think the issue might be?
Thanks in advance!
An additional question: would you recommend including only protein-coding genes for ecotype discovery? I am using a single-nucleus RNA-seq dataset, which has a higher proportion of non-coding genes, and these seem to dominate some of the states. Any experience with that?
Thanks!
Here is my issue.
- My analysis is based on tutorial 5 (scRNA discovery); I cloned the repository around 01/2022. It worked!
- Shoot! I forgot to add extra columns to display information in the heatmaps. I reran the same job after adding the columns. Failed!
- I assigned a new 'Discovery dataset name', but it failed at step 8 with the same error message posted by ccruizm.
- I pulled EcoTyper from GitHub as of 5/27/22. Same issue!
Can you describe step 8 in detail in terms of its input requirements? Two scripts are involved: ecotypes_scRNA.R and ecotypes_assign_samples_scRNA.R.
Hi,
Thank you both for your interest in EcoTyper and for reporting the issue. We have been able to identify the issue and fix the crash. The problem arose because the ecotype discovery step in scRNA-seq data involves aggregating cell information within each sample. Therefore, the output is at the sample level, whereas the annotation is at the cell level. To keep things simple, EcoTyper will now plot any additional columns that have the same value across all single cells within each sample, and will ignore columns with discordant values.
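The collapsing rule described above can be illustrated with a minimal, hypothetical Python sketch; it is not EcoTyper's actual R code. It assumes each cell's annotation is a dict with a `Sample` key (as `csv.DictReader` would produce) and keeps a column only if every sample has exactly one distinct value for it.

```python
from collections import defaultdict


def sample_level_annotation(rows, sample_col="Sample"):
    """Collapse cell-level annotation rows to one row per sample.

    A column is kept only if its value is identical across all cells
    within every sample; columns with discordant values are dropped.
    """
    # sample -> column -> set of distinct values seen in that sample
    values = defaultdict(lambda: defaultdict(set))
    columns = set()
    for row in rows:
        sample = row[sample_col]
        for col, val in row.items():
            if col == sample_col:
                continue
            columns.add(col)
            values[sample][col].add(val)
    keep = sorted(
        col for col in columns
        if all(len(values[s][col]) == 1 for s in values)
    )
    return {s: {col: next(iter(values[s][col])) for col in keep} for s in values}
```

For example, per-cell columns such as a cell ID or cell type vary within a sample and are dropped, while a per-patient `Diagnosis` column survives.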
Regarding your other questions: @ccruizm: Yes, subsetting to protein-coding genes and re-CPM-ing the filtered data could help identify more biologically relevant states. We used this approach for Carcinoma EcoTyper.
@cjhong: Both scripts take as input the assignments of single cells to cell states. The former identifies ecotypes by studying the co-occurrence of cell states across samples, whereas the latter quantifies ecotype presence across the samples in the discovery dataset.
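Subsetting to protein-coding genes and re-CPM-ing can be sketched as follows. This is a hypothetical Python illustration, not necessarily the exact procedure used for Carcinoma EcoTyper. It assumes the matrix is held as a dict of gene name to per-cell values, and that the set of protein-coding gene names comes from an external annotation (for example, genes with `gene_type` "protein_coding" in a GENCODE GTF).

```python
def subset_and_recpm(counts, keep_genes):
    """Keep only genes in `keep_genes`, then rescale each cell to CPM.

    `counts` maps gene -> list of per-cell values (one entry per cell).
    After filtering, each cell's retained values are rescaled so they
    sum to one million; cells with no remaining signal are set to 0.
    """
    kept = {g: v for g, v in counts.items() if g in keep_genes}
    if not kept:
        return {}
    n_cells = len(next(iter(kept.values())))
    totals = [sum(v[j] for v in kept.values()) for j in range(n_cells)]
    return {
        g: [v[j] / totals[j] * 1e6 if totals[j] > 0 else 0.0 for j in range(n_cells)]
        for g, v in kept.items()
    }
```

Renormalizing after filtering matters because the removed non-coding genes would otherwise leave each cell summing to less than one million, making cells with many non-coding reads look uniformly lower.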
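One common way to quantify how often two cell states co-occur across samples is a Jaccard index over the samples in which each state is present. The Python sketch below is a generic illustration under that assumption; the state labels are hypothetical, and how EcoTyper decides that a state is present in a sample, as well as the significance filtering controlled by `Jaccard matrix p-value cutoff`, are not reproduced here.

```python
from itertools import combinations


def jaccard_cooccurrence(state_samples):
    """Pairwise Jaccard index between cell states.

    `state_samples` maps a state label to the set of samples in which
    that state is considered present. For each pair of states (A, B),
    returns |A & B| / |A | B|, or 0.0 if neither state occurs anywhere.
    """
    result = {}
    for a, b in combinations(sorted(state_samples), 2):
        sa, sb = state_samples[a], state_samples[b]
        union = sa | sb
        result[(a, b)] = len(sa & sb) / len(union) if union else 0.0
    return result
```

States that are present in largely the same samples get values near 1 and would tend to group together into the same ecotype.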
Please let us know if you encounter any additional issues.
Best, The EcoTyper team
Thank you very much for your reply and for clarifying my doubts.
I have tested the files you modified (EcoTyper_discovery_scRNA.R, ecotypes_assign_samples_scRNA.R, ecotypes_recovery_scRNA.R), and they do add the metadata to the ecotype heatmap (all states together), but not to the heatmaps for each individual cell type.
I ran the pipeline only from step 8. Would I need to run it from the beginning to make it work?
Thanks again!