KeyError in scanpy_cello(); most specific cell type not created

Open phjanssen opened this issue 2 years ago • 2 comments

Hi, thank you for developing this very useful tool! We encountered an error while trying to classify our cells with a pre-trained model. The prediction itself seems to work: with term_ids=True, the binary and probability outputs for the ontology terms are added to the adata object. However, the selection of the 'most specific cell type' fails with a KeyError, and the conversion to readable terms also does not work (with term_ids=False we get no output at all). Any ideas why this could happen and how to fix it? Thanks in advance, Laura and Philipp

The command:

cello.scanpy_cello(
    adata, 
    'clusters',
    cello_resource_loc, 
    model_file=f'{model_prefix}.model.dill',
    term_ids=True
)

Output:

Found CellO resources at '/data/home/EBgrant/scRNA_run1/analysis/Laura/CellO/resources'.

Variable names are not unique. To make them unique, call `.var_names_make_unique`.

Transforming with PCA...
done.
Making predictions for each classifier...
Running solver on item 1/19...
Running solver on item 2/19...
Running solver on item 3/19...
Running solver on item 4/19...
Running solver on item 5/19...
Running solver on item 6/19...
Running solver on item 7/19...
Running solver on item 8/19...
Running solver on item 9/19...
Running solver on item 10/19...
Running solver on item 11/19...
Running solver on item 12/19...
Running solver on item 13/19...
Running solver on item 14/19...
Running solver on item 15/19...
Running solver on item 16/19...
Running solver on item 17/19...
Running solver on item 18/19...
Running solver on item 19/19...
Checking if any pre-trained model is compatible with this input dataset...

/opt/anaconda3/envs/scRNAseq/lib/python3.6/site-packages/sklearn/base.py:315: UserWarning: Trying to unpickle estimator PCA from version 0.22.2.post1 when using version 0.24.2. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)
/opt/anaconda3/envs/scRNAseq/lib/python3.6/site-packages/sklearn/base.py:315: UserWarning: Trying to unpickle estimator LogisticRegression from version 0.22.2.post1 when using version 0.24.2. This might lead to breaking code or invalid results. Use at your own risk.
  UserWarning)

Of 24458 genes in the input file, 19107 were found in the training set of 58243 genes.
Of 24458 genes in the input file, 18496 were found in the training set of 31283 genes.
Using thresholds stored in /data/home/EBgrant/scRNA_run1/analysis/Laura/CellO/resources/trained_models/ir.10x_genes_thresholds.tsv
Binarizing classifications...
Mapping each sample to its predicted labels...
Computing the most-specific predicted labels...
Item 1 predicted to be "somatic cell (CL:0002371)"
Item 2 predicted to be "somatic cell (CL:0002371)"
Item 3 predicted to be "somatic cell (CL:0002371)"
Item 4 predicted to be "neuron associated cell (CL:0000095)"
Item 5 predicted to be "astrocyte (CL:0000127)"
Item 6 predicted to be "astrocyte (CL:0000127)"
Item 7 predicted to be "somatic cell (CL:0002371)"
Item 8 predicted to be "CNS neuron (sensu Vertebrata) (CL:0000117)"
Item 9 predicted to be "astrocyte (CL:0000127)"
Item 10 predicted to be "hepatocyte (CL:0000182)"
Item 11 predicted to be "neurecto-epithelial cell (CL:0000710)"
Item 12 predicted to be "neural cell (CL:0002319)"
Item 13 predicted to be "astrocyte (CL:0000127)"
Item 14 predicted to be "astrocyte (CL:0000127)"
Item 15 predicted to be "squamous epithelial cell (CL:0000076)"
Item 16 predicted to be "astrocyte (CL:0000127)"
Item 18 predicted to be "neuron associated cell (CL:0000095)"
Item 19 predicted to be "endothelial cell of umbilical vein (CL:0002618)"

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-28-1600827178e8> in <module>
      7     cello_resource_loc,
      8     model_file=f'{model_prefix}.model.dill',
----> 9     term_ids=True
     10 )

/opt/anaconda3/envs/scRNAseq/lib/python3.6/site-packages/cello/scanpy_cello.py in cello(adata, clust_key, rsrc_loc, algo, out_prefix, model_file, log_dir, term_ids, remove_anatomical_subterms)
    206         adata.obs['Most specific cell type'] = [
    207             ou.cell_ontology().id_to_term[c].name
--> 208             for c in ms_results_df['most_specific_cell_type']
    209         ]
    210     else:

/opt/anaconda3/envs/scRNAseq/lib/python3.6/site-packages/cello/scanpy_cello.py in <listcomp>(.0)
    206         adata.obs['Most specific cell type'] = [
    207             ou.cell_ontology().id_to_term[c].name
--> 208             for c in ms_results_df['most_specific_cell_type']
    209         ]
    210     else:

KeyError: ''
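
One observation on the log above: predictions are printed for items 1 to 16, 18 and 19, but not for item 17. That would be consistent with an empty entry in `most_specific_cell_type`, which then fails the `id_to_term[c]` lookup with `KeyError: ''`. This is only a guess from the output, not something confirmed in the CellO code. The sketch below illustrates the failure mode and a tolerant lookup using plain dictionaries. The names `id_to_name`, `most_specific`, `term_name` and the `"unassigned"` placeholder are hypothetical stand-ins, and the data is made up for illustration.

```python
# Hypothetical stand-in for ou.cell_ontology().id_to_term. In CellO the values
# are term objects with a .name attribute; plain strings are used here.
id_to_name = {
    "CL:0002371": "somatic cell",
    "CL:0000127": "astrocyte",
}

# Hypothetical stand-in for ms_results_df['most_specific_cell_type'],
# keyed by cluster, with one cluster that received no prediction.
most_specific = {
    "cluster_a": "CL:0000127",
    "cluster_b": "",
    "cluster_c": "CL:0002371",
}


def term_name(term_id, lookup, fallback="unassigned"):
    """Return the readable name for term_id, or fallback if it is unknown."""
    return lookup.get(term_id, fallback)


# Clusters whose ID cannot be resolved; a plain lookup[term_id] would raise
# KeyError for these.
unresolved = [k for k, v in most_specific.items() if v not in id_to_name]

# Readable names, with a placeholder instead of an exception.
names = {k: term_name(v, id_to_name) for k, v in most_specific.items()}

print(unresolved)
print(names)
```

Listing the unresolved clusters first makes it possible to check whether the empty entries line up with the item missing from the log before deciding how to handle them.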

phjanssen, Dec 01 '21 16:12