ad_map data: var ('cell_type') added after the training in the merfish data frame, but it's 'NA'
Hi, I tried to run Tangram for cell type annotation, but I don't understand how Tangram projects cell types from the reference data. I checked ad_map: a var column ('cell_type') was added to the MERFISH data frame after training, but the values under this 'cell_type' column are 'NA'. How can I get the cell type annotation for each cluster of my MERFISH data? Thanks very much.
comb_adata: MERFISH data
adata_sc: reference

tg.pp_adatas(adata_sc, comb_adata, genes=None);
assert "training_genes" in adata_sc.uns
assert "training_genes" in comb_adata.uns
print(f"Number of training_genes: {len(adata_sc.uns['training_genes'])}");
ad_map = tg.map_cells_to_space(
    adata_sc,
    comb_adata,
    mode="cells",
    cluster_label='leiden',
    density_prior='rna_count_based',
    num_epochs=100,
    device='cpu',
);
tg.project_cell_annotations(ad_map, comb_adata, annotation="cell_type")
annotation_list = list(pd.unique(adata_sc.obs['cell_type']))
tg.plot_cell_annotation_sc(comb_adata, annotation_list, perc=0.02, spot_size=50);
Then check ad_map:
AnnData object with n_obs × n_vars = 79667 × 4515
    obs: 'age', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'n_genes', 'n_counts', 'clust_annot', 'organism_ontology_term_id', 'sex_ontology_term_id', 'suspension_type', 'cell_type_ontology_term_id', 'assay_ontology_term_id', 'tissue_ontology_term_id', 'disease_ontology_term_id', 'self_reported_ethnicity_ontology_term_id', 'development_stage_ontology_term_id', 'donor_id', 'is_primary_data', 'cell_type_annot', 'tissue_type', 'cell_type', 'assay', 'disease', 'organism', 'sex', 'tissue', 'self_reported_ethnicity', 'development_stage', 'observation_joinid', 'leiden'
    var: 'region', 'slide', 'cell_id', 'area', 'sample_id', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_10_genes', 'pct_counts_in_top_20_genes', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_150_genes', 'n_counts', 'leiden', 'uniform_density', 'rna_count_based_density', 'cell_type'
    uns: 'train_genes_df', 'training_history'
Your cell type annotations need to be in a column of adata_sc.obs (the single-cell data). Based on your code, I assume they are in the adata_sc.obs['leiden'] column.
You should not pass cluster_label='leiden' in your call to map_cells_to_space. It is only used with mode="clusters", not mode="cells". Instead, pass annotation="leiden" to project_cell_annotations. This adds your cell type annotations to comb_adata.obsm['tangram_ct_pred'].
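To make the projection step more concrete, here is a minimal sketch of the idea as I understand it: the reference labels are one-hot encoded and multiplied by the transposed mapping matrix, so the result has one row per spatial cell and one column per distinct label in the chosen annotation column. The matrix values and label names are made up for illustration, and this uses plain NumPy/pandas rather than Tangram itself.

```python
import numpy as np
import pandas as pd

# Toy stand-in for ad_map.X: rows are reference cells, columns are spatial cells.
# Each entry is the probability that a reference cell maps to a spatial location.
mapping = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.1, 0.1],
    [0.1, 0.7, 0.2],
    [0.0, 0.2, 0.8],
])

# Toy stand-in for the annotation column in adata_sc.obs (hypothetical labels).
sc_labels = pd.Series(["T cell", "T cell", "B cell", "Myeloid"], name="cell_type")

# One-hot encode the reference labels: one column per distinct label.
one_hot = pd.get_dummies(sc_labels).astype(float)

# Project onto space: (spatial cells x reference cells) @ (reference cells x labels).
ct_pred = pd.DataFrame(mapping.T @ one_hot.to_numpy(), columns=one_hot.columns)

print(ct_pred)
```

The columns of ct_pred are exactly the distinct values of the annotation column, which is why the choice of annotation decides whether you see cluster numbers or cell type names.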
Also, if you are analysing MERFISH data, you should be using density_prior = uniform instead of density_prior = rna_count_based.
You should revise your code to something closer to this (lines starting with - are removed, lines starting with + are added):
tg.pp_adatas(adata_sc, comb_adata, genes=None);
assert "training_genes" in adata_sc.uns
assert "training_genes" in comb_adata.uns
print(f"Number of training_genes: {len(adata_sc.uns['training_genes'])}");
ad_map = tg.map_cells_to_space(
adata_sc,
comb_adata,
mode="cells",
- cluster_label='leiden',
- density_prior='rna_count_based',
+ density_prior='uniform',
num_epochs=100,
device='cpu',
);
- tg.project_cell_annotations(ad_map, comb_adata, annotation="cell_type")
+ tg.project_cell_annotations(ad_map, comb_adata, annotation="leiden")
- annotation_list = list(pd.unique(adata_sc.obs['cell_type']))
+ annotation_list = list(pd.unique(adata_sc.obs['leiden']))
tg.plot_cell_annotation_sc(comb_adata, annotation_list, perc=0.02, spot_size=50);
See the Tangram Jupyter notebooks for more detailed information. FYI, 100 epochs will likely not be sufficient for convergence (probably try ~500 or so).
Thanks @wakelin-g. "if you are analysing MERFISH data, you should be using density_prior = uniform instead of density_prior = rna_count_based." Could you clarify: MERFISH/Xenium datasets should use density_prior = uniform, but what about sequencing-based spatial datasets like Visium (HD) or Slide-seq? Thanks very much! I will revise the script, re-run it, and keep you updated.
For technologies with single-cell resolution (MERFISH, Xenium, Visium HD, etc.), use uniform. For technologies where a single spatial element likely contains multiple cells (e.g., non-HD Visium), use rna_count_based.
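As a rough illustration of the difference between the two priors (a sketch of the idea with made-up counts, not Tangram's own code): uniform gives every spatial element the same expected share of cells, while rna_count_based makes the expected share proportional to the total RNA counts in that element.

```python
import numpy as np

# Hypothetical total RNA counts for four spatial elements (spots or cells).
rna_counts_per_spot = np.array([100.0, 300.0, 50.0, 50.0])
n_spots = rna_counts_per_spot.size

# Uniform prior: every spatial element is expected to hold the same share of cells.
uniform_density = np.ones(n_spots) / n_spots

# RNA-count-based prior: the expected share scales with the counts in each element.
rna_count_based_density = rna_counts_per_spot / rna_counts_per_spot.sum()

print(uniform_density)
print(rna_count_based_density)
```

With one cell per spatial element, there is no reason to expect more cells where counts are higher, which is the argument for uniform in that case.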
Again, you should take a look at the notebook which explains what density_prior actually does and what the different options mean.
Dear @wakelin-g, I ran a new test using the code from your guidance. If I use 'leiden', plot_cell_annotation_sc only shows the leiden cluster numbers (the figure below). Shouldn't it show the cell type labels directly? I think I should still use annotation="cell_type" in tg.project_cell_annotations? Thanks so much!
If I change back to 'cell_type', the result looks correct?
The predicted cell types are then in obsm['tangram_ct_pred'].
My original leiden clustering in adata1 has 24 clusters, but tangram_ct_pred contains only 12. Can I double-check with you: is this because the reference has only those 12 annotated cell types, so the result cannot match the original leiden clusters?
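For reference, a minimal sketch of one way to turn tangram_ct_pred into a label per leiden cluster, assuming its columns are the reference cell types and its rows are the MERFISH cells: take the highest-scoring type for each cell, then the most common type within each cluster. The scores, cell names, and cluster assignments below are made up; with real data, ct_pred would be comb_adata.obsm['tangram_ct_pred'] and leiden would be comb_adata.obs['leiden']. Majority voting is just one simple choice here, not something Tangram does for you.

```python
import pandas as pd

# Toy stand-in for comb_adata.obsm["tangram_ct_pred"]: one row per MERFISH cell,
# one column per reference cell type (hypothetical names and scores).
ct_pred = pd.DataFrame(
    {
        "T cell": [1.7, 1.5, 0.2, 0.1, 0.3, 1.2],
        "B cell": [0.1, 0.3, 0.9, 0.8, 0.2, 0.1],
        "Myeloid": [0.0, 0.1, 0.2, 0.3, 1.1, 0.4],
    },
    index=["c1", "c2", "c3", "c4", "c5", "c6"],
)

# Toy stand-in for comb_adata.obs["leiden"]: four clusters, but only three types.
leiden = pd.Series(["0", "0", "1", "1", "2", "3"], index=ct_pred.index, name="leiden")

# Per-cell label: the reference cell type with the highest projected score.
cell_label = ct_pred.idxmax(axis=1)

# Per-cluster label: the most common per-cell label within each leiden cluster.
cluster_label = cell_label.groupby(leiden).agg(lambda s: s.value_counts().idxmax())

print(cluster_label)
```

In this toy case clusters 0 and 3 both end up labelled as the same type, which is the expected outcome whenever there are more clusters than reference cell types.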