
Tutorial 4 - Warning: Only 0 genes are available

Open VerenaPass opened this issue 2 years ago • 14 comments

Dear Ecotyper team,

I tried to run Tutorial 4 - de novo discovery of Cell State and Ecotypes in bulk expression data - with my own bulk RNAseq data. After setting up Ecotyper and the additional resources needed I did a test run with your example data and everything worked smoothly.

However, when I tried to use my own dataset I got the following warnings and error:

`Step 3 (cell state discovery): Preparing the NMF input...
Warning: Only 0 genes are available for 'Dendritic.cells'. At least 50 genes are required for cell state discovery. Skipping cell state discovery for this cell type!
Warning: Only 0 genes are available for 'B.cells'. At least 50 genes are required for cell state discovery. Skipping cell state discovery for this cell type!
Warning: Only 0 genes are available for 'Mast.cells'. At least 50 genes are required for cell state discovery. Skipping cell state discovery for this cell type!
Warning: Only 0 genes are available for 'Endothelial.cells'. At least 50 genes are required for cell state discovery. Skipping cell state discovery for this cell type!
Warning: Only 0 genes are available for 'Monocytes.and.Macrophages'. At least 50 genes are required for cell state discovery. Skipping cell state discovery for this cell type!
Warning: Only 0 genes are available for 'Neutrophils'. At least 50 genes are required for cell state discovery. Skipping cell state discovery for this cell type!
Warning: Only 0 genes are available for 'NK.cells'. At least 50 genes are required for cell state discovery. Skipping cell state discovery for this cell type!
Warning: Only 0 genes are available for 'Fibroblasts'. At least 50 genes are required for cell state discovery. Skipping cell state discovery for this cell type!
Warning: Only 0 genes are available for 'Plasma.cells'. At least 50 genes are required for cell state discovery. Skipping cell state discovery for this cell type!
Warning: Only 0 genes are available for 'T.cells.CD4'. At least 50 genes are required for cell state discovery. Skipping cell state discovery for this cell type!
Warning: Only 0 genes are available for 'T.cells.CD8'. At least 50 genes are required for cell state discovery. Skipping cell state discovery for this cell type!
Warning: Only 0 genes are available for 'Tregs'. At least 50 genes are required for cell state discovery. Skipping cell state discovery for this cell type!
Warning: Only 0 genes are available for 'T.cells.follicular.helper'. At least 50 genes are required for cell state discovery. Skipping cell state discovery for this cell type!
Step 3 (cell state discovery): Running NMF (Warning: This step might take a long time!)...
Step 3 (cell state discovery): Aggregating NMF results...
Step 3 (cell state discovery) finished successfully!

Step 4 (choosing the number of cell states)...
Error in split.default(all_data, as.character(all_data$CellType)) : first argument must be a vector
Calls: split -> split.default
Execution halted
Error in RunJobQueue() : EcoTyper failed. Please check the error message above!
Execution halted`

I used the same input datasets (expression matrix and annotations) to run Tutorial 1 and it worked without problems. So I am wondering whether there is a problem in my input data, or whether my data are simply not suitable for this analysis. Do you have any idea?

Thanks a lot,

Verena

VerenaPass, May 10 '22 08:05

Hi Verena,

Thank you for your interest in EcoTyper. How many samples do you have in your input data? It could be that there are too few samples for the CIBERSORTx High Resolution step, which imputes the cell-type gene expression profiles. It works best when there is a much larger number of samples than cell types to deconvolve. A rule of thumb is to have at least 3-4 fold more samples than cell types, although the more the better.
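The rule of thumb above can be checked with a few lines of Python before starting a long run. This is only a minimal sketch: the function names and the default threshold of 3 are illustrative choices, not part of EcoTyper or CIBERSORTx.

```python
def sample_to_cell_type_ratio(n_samples: int, n_cell_types: int) -> float:
    """Return how many samples are available per cell type."""
    if n_cell_types <= 0:
        raise ValueError("n_cell_types must be positive")
    return n_samples / n_cell_types


def meets_rule_of_thumb(n_samples: int, n_cell_types: int, min_fold: float = 3.0) -> bool:
    """True if there are at least `min_fold` times more samples than cell types.

    `min_fold` is a hypothetical parameter; the thread suggests 3-4 fold.
    """
    return sample_to_cell_type_ratio(n_samples, n_cell_types) >= min_fold


if __name__ == "__main__":
    # Illustrative numbers only: 30 samples, 13 cell types.
    print(meets_rule_of_thumb(30, 13))
```

Passing `min_fold=4.0` applies the stricter end of the suggested range.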

cbsteen, May 16 '22 20:05

Dear Chloé, thanks a lot for your reply and for your help. I have 134 samples in my input data and when I run CIBERSORTx (the Impute cell fraction module) in the web interface with this data it works without problems. Best, Verena

VerenaPass, May 17 '22 07:05

Dear Verena, thanks for the clarification. I agree that 134 samples should be enough. Another possibility is that CIBERSORTx crashes while performing the in-silico purification (step 2), so that no genes are found at step 3. Were there any errors or warnings during step 2? Also, what is the content of the folder CIBERSORTx/hires/<your_discovery_dataset>/*/? Best, The EcoTyper team

BALuca, May 17 '22 20:05

Dear EcoTyper team,

Regarding your first question, here is the screen printout for step 2. There seem to be no warnings or errors:

Step 2 (cell type expression purification): Running CIBERSORTxHiRes...
Reading fractions from folder: ../CIBERSORTx/fractions/discovery/Gallium_134pt/Lymphoma_Fractions

Running CIBERSORTx high-resolution GEP imputation...
[Options] username: xx
[Options] token: xx
[Options] mixture: /src/data/input.txt
[Options] sigmatrix: /src/data/classes.txt
[Options] classes: /src/data/classes.txt
[Options] cibresults: /src/data/cibresults.txt
[Options] heatmap: FALSE
[Options] threads: 1
Running CIBERSORTx high-resolution GEP imputation...
[Options] username: xx
[Options] token: xx
[Options] mixture: /src/data/input.txt
[Options] sigmatrix: /src/data/classes.txt
[Options] classes: /src/data/classes.txt
[Options] cibresults: /src/data/cibresults.txt
[Options] heatmap: FALSE
[Options] threads: 1
Running CIBERSORTx high-resolution GEP imputation...
[Options] username: xx
[Options] token: xx
[Options] mixture: /src/data/input.txt
[Options] sigmatrix: /src/data/classes.txt
[Options] classes: /src/data/classes.txt
[Options] cibresults: /src/data/cibresults.txt
[Options] heatmap: FALSE
[Options] threads: 1
Running CIBERSORTx high-resolution GEP imputation...
[Options] username: xx
[Options] token: xx
[Options] mixture: /src/data/input.txt
[Options] sigmatrix: /src/data/classes.txt
[Options] classes: /src/data/classes.txt
[Options] cibresults: /src/data/cibresults.txt
[Options] heatmap: FALSE
[Options] threads: 1
Loaded 134 mixture samples, 3665 genes, and 13 cell subsets...
Window size adaptively set to 52
Imputing high-resolution cell type GEPs...done.
Loaded 134 mixture samples, 3666 genes, and 13 cell subsets...
Window size adaptively set to 52
Imputing high-resolution cell type GEPs...done.
Loaded 134 mixture samples, 3665 genes, and 13 cell subsets...
Window size adaptively set to 52
Loaded 134 mixture samples, 3666 genes, and 13 cell subsets...
Window size adaptively set to 52
Imputing high-resolution cell type GEPs...done.
Imputing high-resolution cell type GEPs...done.
Writing output to disk ...done.
Running time (sec): 79
Writing output to disk ...done.
Running time (sec): 79
Writing output to disk ...done.
Running time (sec): 80
Writing output to disk ...done.
Running time (sec): 80
Step 2 (cell type expression purification): Aggregating CIBERSORTxHiRes results...
[1] "Preparing CIBERSORTxHiRes results for: Fibroblasts"
[1] "Preparing CIBERSORTxHiRes results for: Endothelial.cells"
[1] "Preparing CIBERSORTxHiRes results for: B.cells"
[1] "Preparing CIBERSORTxHiRes results for: Dendritic.cells"
[1] "Preparing CIBERSORTxHiRes results for: Mast.cells"
[1] "Preparing CIBERSORTxHiRes results for: Monocytes.and.Macrophages"
[1] "Preparing CIBERSORTxHiRes results for: NK.cells"
[1] "Preparing CIBERSORTxHiRes results for: Neutrophils"
[1] "Preparing CIBERSORTxHiRes results for: Plasma.cells"
[1] "Preparing CIBERSORTxHiRes results for: T.cells.CD4"
[1] "Preparing CIBERSORTxHiRes results for: T.cells.CD8"
[1] "Preparing CIBERSORTxHiRes results for: T.cells.follicular.helper"
[1] "Preparing CIBERSORTxHiRes results for: Tregs"
Step 2 (cell type expression purification) finished successfully!

Regarding your second question, this is the content of the folder CIBERSORTx/hires/MyData/Lymphoma_Fractions/:

B.cells.dimensions.txt
B.cells.txt
Dendritic.cells.dimensions.txt
Dendritic.cells.txt
Endothelial.cells.dimensions.txt
Endothelial.cells.txt
Fibroblasts.dimensions.txt
Fibroblasts.txt
Mast.cells.dimensions.txt
Mast.cells.txt
Monocytes.and.Macrophages.dimensions.txt
Monocytes.and.Macrophages.txt
NK.cells.dimensions.txt
NK.cells.txt
Neutrophils.dimensions.txt
Neutrophils.txt
Plasma.cells.dimensions.txt
Plasma.cells.txt
T.cells.CD4.dimensions.txt
T.cells.CD4.txt
T.cells.CD8.dimensions.txt
T.cells.CD8.txt
T.cells.follicular.helper.dimensions.txt
T.cells.follicular.helper.txt
Tregs.dimensions.txt
Tregs.txt
cibresults.txt
classes.txt
worker_1
worker_2
worker_3
worker_4

However, the majority of the files are empty. For example, the file "B.cells.txt" contains only the column names and nothing else.
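For anyone who wants to check quickly which of these output files are effectively empty, a small Python sketch along the following lines might help. It assumes that "empty" means a text file with at most one non-blank line (the header); the function name `header_only_files` is made up for this example and is not part of EcoTyper.

```python
from pathlib import Path


def header_only_files(folder):
    """Return the names of .txt files in `folder` with at most one non-blank line.

    Such files contain, at best, a header row and no data rows.
    """
    result = []
    for path in sorted(Path(folder).glob("*.txt")):
        with path.open() as handle:
            # Count non-blank lines without loading the whole file into memory.
            n_lines = sum(1 for line in handle if line.strip())
        if n_lines <= 1:
            result.append(path.name)
    return result


if __name__ == "__main__":
    # Example path taken from this thread; adjust it to your own run.
    for name in header_only_files("CIBERSORTx/hires/MyData/Lymphoma_Fractions"):
        print(name)
```

If the folder does not exist, the function simply returns an empty list.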

Let me know if you need additional details, I can also share the files with you. Thanks!

Verena

VerenaPass, May 18 '22 08:05

Hi Verena,

It looks like you are not providing the entire transcriptome as input (which should be ~20,000 genes or more), but rather only a subset of 3,665 genes. This could explain the results you are getting at the CIBERSORTx High Resolution step. If you provide the entire gene expression dataset, without pre-filtering of genes, that should probably solve the issue.

Best, The EcoTyper team

cbsteen, May 18 '22 12:05

Dear EcoTyper team,

thanks for your answer. That's weird because my input data contains almost 15,000 genes. Unfortunately I think that even without pre-filtering I won't have many more genes. So maybe 15,000 genes are not enough to run Ecotyper?

Best,

Verena

VerenaPass, May 18 '22 12:05

Hi Verena,

15,000 genes should definitely be enough, and judging by the output you pasted above, step 2 ran "successfully". However, it is puzzling that all cell types fail to deconvolve. I have never seen this behavior before. One possibility is that the cell fractions used by the CIBERSORTx high resolution module to impute cell-type specific expression are completely off. Is it biologically reasonable to expect the cell populations from Lymphoma EcoTyper to be present in your data? If yes, did you provide your own cell fraction estimates in the "Cell type fractions" field of the configuration file, or did you set it to "Lymphoma_Fractions"? If the former, I would do some sanity checks to make sure that the sample IDs did not get scrambled and that the cell fractions make biological sense. If the latter, we could have a look at this. Would it be possible to send the input data and the configuration file you used to [email protected]?
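One way to run the sample ID sanity check mentioned above is sketched below in Python. It assumes the sample IDs have already been read from the expression matrix header and from the fractions file; `compare_sample_ids` is a hypothetical helper, not an EcoTyper function.

```python
def compare_sample_ids(expression_ids, fraction_ids):
    """Compare sample IDs from the expression matrix and the fractions file.

    IDs are converted to strings and stripped of surrounding whitespace
    before comparison.
    """
    expr = [str(s).strip() for s in expression_ids]
    frac = [str(s).strip() for s in fraction_ids]
    return {
        # Samples with expression data but no fraction estimates.
        "missing_in_fractions": sorted(set(expr) - set(frac)),
        # Samples with fraction estimates but no expression data.
        "missing_in_expression": sorted(set(frac) - set(expr)),
        # True only if both lists contain the same IDs in the same order.
        "same_order": expr == frac,
    }


if __name__ == "__main__":
    # Illustrative IDs only.
    report = compare_sample_ids(["P1", "P2", "P3"], ["P1", "P3", "P4"])
    print(report)
```

Non-empty "missing" lists or a different order would be a hint that IDs were altered or scrambled somewhere between the two files.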

Best, The EcoTyper team

BALuca, May 27 '22 22:05

Dear Ecotyper team,

It’s been a while since the last message, but I finally found out what the problem with my data was. I think the tool didn’t like that the column names of my matrix were actually numbers (I had a numeric ID for my patients). After I changed that, everything ran smoothly.

The error message is somewhat misleading, so I am posting this here in case someone runs into the same issue at some point.

Best,

Verena

VerenaPass, Jul 19 '22 09:07

Dear Verena,

I was trying to run Tutorial 4 and fortunately found your post here. Could you tell me where you got the CIBERSORTx docker token that is required in the Tutorial 4 config file? I cannot find any way to request a token on the CIBERSORTx online Download page. I have the two docker images installed, but no tokens have been issued yet.

Any information would be helpful, thank you!

Cheng

johnsCheng, Jul 24 '22 14:07

Hi Cheng,

If you log in to the CIBERSORTx website and select Menu - Tokens, there should be a "Request Token" button. Once you get a token, you will find it there below the button.

Hope this helps!

Best,

Verena

VerenaPass, Jul 24 '22 14:07

Verena,

Thanks for your kind and timely answer. I followed the Sign in - Menu - Tokens instructions and reached the page (https://cibersortx.stanford.edu/getoken.php, 07/25/2022), but instead of a 'Request Token' button 🔘 it shows this highlighted message:

Token and instruction access pending approval. Please wait patiently as we consider your request.

It confused me a little, but I guess I just have to wait and see. Your answer did help, thank you!

Best regards, Cheng

johnsCheng, Jul 25 '22 06:07

Dear Ecotyper team,

Hello. Thank you for implementing the Ecotyper pipeline.

I am having a similar problem to the one posted by Verena. The difference is that in my results only the "CD4.T.cells.txt" file is affected: it contains the column names and nothing else. The other results contain enough data to complete step 8 (ecotype discovery), but without any CD4.T.cells columns.

I think this problem may have arisen when CIBERSORTx hires was applied.

Is there any reason, other than differences in input format, why an empty file would be created at that step?

Best,

So Takata

sotakata, Aug 29 '22 07:08

@VerenaPass I came across the same issue. What did your sample names look like after you changed them?

diyang1354, Sep 22 '22 10:09

@diyang1354 I just renamed my columns by adding a letter in front. After the change, they looked something like "P12345".
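A minimal Python sketch of this kind of renaming is shown below. The function name `prefix_numeric_ids` and the default prefix "P" are illustrative assumptions. The sketch prefixes every ID that starts with a digit and leaves all other IDs unchanged.

```python
def prefix_numeric_ids(sample_ids, prefix="P"):
    """Prefix sample IDs that start with a digit, e.g. "12345" -> "P12345".

    IDs that already start with a non-digit character are returned unchanged.
    """
    renamed = []
    for sample_id in sample_ids:
        text = str(sample_id)
        if text[:1].isdigit():
            renamed.append(prefix + text)
        else:
            renamed.append(text)
    return renamed


if __name__ == "__main__":
    # Illustrative IDs only.
    print(prefix_numeric_ids(["12345", "67890", "S7"]))
```

The same renaming would presumably need to be applied to the sample IDs in the annotation file, so that both inputs still match.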

VerenaPass, Sep 22 '22 11:09