Running the CellTypist training function celltypist.train on a subset of genes

Open dkapadia612 opened this issue 1 year ago • 1 comment

I would like to train a CellTypist model to identify certain cell types using a specific gene set. I tried passing a list of genes through the 'genes' argument, but the model was still trained on all features. Besides keeping only the selected genes in adata.var, is there another way to make this work? Additionally, does training on fewer than 50 genes affect the prediction accuracy of the trained CellTypist model, or is there a minimum gene count below which you would not recommend training a model? I would appreciate any help you can provide!

dkapadia612, Feb 15 '24 16:02

@dkapadia612, you can train the model using any number of genes. There is no definitive relationship between model accuracy and the number of genes (for example, a dataset with clearly distinct cell types may rely on only a handful of genes). To train the model on a subset of genes, you can use model = celltypist.train(adata[:, a_subset_genes], check_expression = False, ...)

ChuanXu1, Feb 20 '24 12:02
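
The subsetting approach from the reply above can be sketched as follows. This is a minimal illustration, not an official recipe: `split_genes`, `train_on_gene_subset`, and the `label_key` default of "cell_type" are hypothetical names introduced here, and the sketch assumes `adata` is an AnnData object with log1p-normalized expression and cell type labels stored in a column of `adata.obs`. As far as I understand, `check_expression` validates that the input looks normalized to 10,000 counts per cell, which a gene subset would likely fail, hence it is turned off as the reply suggests.

```python
from typing import Iterable, List, Tuple


def split_genes(requested: Iterable[str], available: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (present, missing) for the requested genes, preserving request order."""
    available_set = set(available)
    present: List[str] = []
    missing: List[str] = []
    # dict.fromkeys drops duplicate gene names while keeping their first-seen order.
    for gene in dict.fromkeys(requested):
        (present if gene in available_set else missing).append(gene)
    return present, missing


def train_on_gene_subset(adata, requested_genes, label_key="cell_type", **train_kwargs):
    """Train a CellTypist model restricted to requested_genes.

    Assumption: adata is an AnnData object and adata.obs[label_key] holds the
    cell type labels (label_key is a hypothetical column name).
    """
    import celltypist  # imported lazily so split_genes works without it installed

    present, missing = split_genes(requested_genes, adata.var_names)
    if missing:
        print(f"Skipping {len(missing)} genes not found in adata.var_names: {missing}")
    if not present:
        raise ValueError("None of the requested genes are present in adata.var_names.")

    # Subset the columns (genes) before training, as suggested in the reply,
    # and disable the expression check for the reduced matrix.
    return celltypist.train(
        adata[:, present],
        labels=label_key,
        check_expression=False,
        **train_kwargs,
    )
```

Calling `train_on_gene_subset(adata, my_genes)` would then return the trained model. Checking for missing genes first avoids an indexing error when a requested gene is absent from `adata.var_names`.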