
celltypist before/after batch correction

Open malonzm1 opened this issue 9 months ago • 10 comments

Hi,

I perform batch correction using scVI, but I currently run the CellTypist prediction before batch correction. Is it better to run CellTypist after batch correction, or does it not matter?

Good day.

malonzm1, May 02 '24 00:05

@malonzm1, predicted_labels depends only on the gene expression matrix, but majority_voting will be influenced by the neighborhood graph if that graph is constructed from the scVI latent space.

ChuanXu1, May 03 '24 09:05

Thanks!

malonzm1, May 03 '24 23:05

Is majority_voting more reliable if CellTypist is run after batch correction?

malonzm1, May 08 '24 07:05

@malonzm1, it depends, but majority_voting is usually more readable.

ChuanXu1, May 08 '24 22:05

@ChuanXu1 Based on what you've described, it seems that batch effects will not impact predicted_labels, but they can influence the majority_voting results. Is that right? Also, after applying Harmony to remove batch effects, I got the message "Invalid expression matrix in .X, expect log1p normalized expression to 10000 counts per cell; will use .raw.X instead."

smallsmalltown, Jun 08 '24 23:06

@smallsmalltown, as far as I remember, Harmony does not change the expression values; it only produces a corrected latent space. To predict on your data with CellTypist, you need to provide normalized gene expression in either .X or .raw.X.

ChuanXu1, Jun 11 '24 20:06
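
For readers hitting the same message, here is a minimal sketch of what "log1p normalized expression to 10000 counts per cell" means, written in plain Python for a single cell. The helper names `normalize_log1p` and `looks_log1p_normalized` and the toy counts are hypothetical, and the check is only in the spirit of the message quoted above, not CellTypist's actual code. In a real workflow the same transformation is usually done with scanpy's `sc.pp.normalize_total(adata, target_sum=1e4)` followed by `sc.pp.log1p(adata)`.

```python
import math

def normalize_log1p(counts, target_sum=10_000):
    """Scale one cell's raw counts to `target_sum`, then apply log1p."""
    total = sum(counts)
    return [math.log1p(c / total * target_sum) for c in counts]

def looks_log1p_normalized(cell, target_sum=10_000, rel_tol=1e-3):
    """Undo log1p with expm1 and check that the cell sums to `target_sum`."""
    recovered = sum(math.expm1(v) for v in cell)
    return math.isclose(recovered, target_sum, rel_tol=rel_tol)

raw_cell = [3, 0, 12, 5]             # toy raw counts for one cell (made up)
norm_cell = normalize_log1p(raw_cell)

print(looks_log1p_normalized(norm_cell))  # normalized values pass the check
print(looks_log1p_normalized(raw_cell))   # raw counts do not
```

Because Harmony only adds a corrected embedding, the expression values in .X or .raw.X still need to be in this form regardless of whether integration has been run.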

@ChuanXu1 Can you explain more about the latent space idea and Harmony? If I integrate using Harmony in R, convert my object to h5ad, and then provide CellTypist with its normalized .X, which would be better: predicted_labels or majority voting? Will CellTypist use the latent space of the samples at all?

Flu09, Aug 08 '24 12:08

@Flu09, CellTypist does not use the latent space to predict cell types; that is, predicted_labels is independent of the latent space. majority_voting, however, may be affected by it, because the majority voting result relies on the clustering, which is influenced by the latent space.

ChuanXu1, Aug 08 '24 12:08
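
The distinction described above can be illustrated with a simplified sketch: per-cell labels stay fixed, and only the cluster assignments change. The function `majority_vote`, the toy labels, and the two cluster assignments are hypothetical; this is a plain majority rule for illustration and is not CellTypist's implementation.

```python
from collections import Counter

def majority_vote(predicted_labels, clusters):
    """Give every cell the most common predicted label within its cluster."""
    by_cluster = {}
    for label, cluster in zip(predicted_labels, clusters):
        by_cluster.setdefault(cluster, Counter())[label] += 1
    winner = {c: counts.most_common(1)[0][0] for c, counts in by_cluster.items()}
    return [winner[c] for c in clusters]

# Per-cell predictions: these depend only on expression, not on any clustering.
predicted = ["T cell", "T cell", "B cell", "B cell", "B cell", "T cell"]

# Two hypothetical clusterings of the same six cells, e.g. from graphs built
# on an uncorrected embedding and on a batch-corrected latent space.
clusters_uncorrected = [0, 0, 0, 1, 1, 1]
clusters_corrected = [0, 0, 1, 1, 1, 0]

print(majority_vote(predicted, clusters_uncorrected))
print(majority_vote(predicted, clusters_corrected))
```

The `predicted` list is identical in both calls, yet the voted labels differ, because cells 3 and 6 fall into different clusters under the two assignments.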

I see, thank you. But if I combine two studies and notice that the overall counts in one study are lower than in the other, should the CellTypist annotation be done on each study separately?

Flu09, Aug 17 '24 18:08

@Flu09, it's safer to do this for each dataset separately to ensure sufficient gene overlap between your data and the model used.

ChuanXu1, Aug 17 '24 23:08
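
One way to see why gene overlap matters is to compare each study's gene list against the model's genes before and after merging. The sketch below uses made-up gene lists and a hypothetical helper `gene_overlap_fraction`; it assumes the two studies are merged on their shared genes only, which is one common choice and may not match your pipeline.

```python
def gene_overlap_fraction(dataset_genes, model_genes):
    """Fraction of the model's genes that are present in the dataset."""
    model = set(model_genes)
    return len(model & set(dataset_genes)) / len(model)

# Hypothetical gene lists, for illustration only.
model_genes = ["CD3E", "CD19", "MS4A1", "NKG7"]
study_a = ["CD3E", "CD19", "MS4A1", "GAPDH"]
study_b = ["CD3E", "GAPDH"]

# Merging on shared genes keeps only what both studies measured.
merged = sorted(set(study_a) & set(study_b))

print(gene_overlap_fraction(study_a, model_genes))
print(gene_overlap_fraction(study_b, model_genes))
print(gene_overlap_fraction(merged, model_genes))
```

In this toy case the merged object covers no more of the model's genes than the sparser study does, so annotating study A on its own retains more of the features the model can use.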