ad_map data: var ('cell_type') added after the training in the merfish data frame, but it's 'NA'

Open KunHHE opened this issue 11 months ago • 4 comments

Hi, I tried to run Tangram for cell type annotation, but I don't understand how Tangram projects cell types from the reference data. When I checked ad_map, a var column ('cell_type') had been added to the MERFISH data frame after training, but every value in that column is 'NA'. How can I get the cell type annotation for each cluster in my MERFISH data? Thanks very much.

comb_adata: MERFISH data
adata_sc: reference

tg.pp_adatas(adata_sc, comb_adata, genes=None);

assert "training_genes" in adata_sc.uns
assert "training_genes" in comb_adata.uns
print(f"Number of training_genes: {len(adata_sc.uns['training_genes'])}");

ad_map = tg.map_cells_to_space(
    adata_sc,
    comb_adata,
    mode="cells",
    cluster_label='leiden',
    density_prior='rna_count_based',
    num_epochs=100,
    device='cpu',
);

tg.project_cell_annotations(ad_map, comb_adata, annotation="cell_type")
annotation_list = list(pd.unique(adata_sc.obs['cell_type']))
tg.plot_cell_annotation_sc(comb_adata, annotation_list, perc=0.02, spot_size=50);

Then check ad_map:

AnnData object with n_obs × n_vars = 79667 × 4515
    obs: 'age', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'n_genes', 'n_counts', 'clust_annot', 'organism_ontology_term_id', 'sex_ontology_term_id', 'suspension_type', 'cell_type_ontology_term_id', 'assay_ontology_term_id', 'tissue_ontology_term_id', 'disease_ontology_term_id', 'self_reported_ethnicity_ontology_term_id', 'development_stage_ontology_term_id', 'donor_id', 'is_primary_data', 'cell_type_annot', 'tissue_type', 'cell_type', 'assay', 'disease', 'organism', 'sex', 'tissue', 'self_reported_ethnicity', 'development_stage', 'observation_joinid', 'leiden'
    var: 'region', 'slide', 'cell_id', 'area', 'sample_id', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_10_genes', 'pct_counts_in_top_20_genes', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_150_genes', 'n_counts', 'leiden', 'uniform_density', 'rna_count_based_density', 'cell_type'
    uns: 'train_genes_df', 'training_history'

KunHHE avatar Dec 31 '24 03:12 KunHHE

Your cell type annotations need to be in a column of adata_sc.obs (the single-cell data). Based on your code, I assume they are in the adata_sc.obs['leiden'] column.

You should not pass cluster_label='leiden' in your call to map_cells_to_space. That argument is only used with mode="clusters", not mode="cells". Instead, pass annotation="leiden" to project_cell_annotations. This will add your cell type annotations to comb_adata.obsm['tangram_ct_pred'].
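To make the projection step more concrete, here is a minimal toy sketch in plain Python of the idea as I understand it: each spatial cell receives, for every label, the summed mapping probability of the single cells carrying that label. The function name project_annotations and all numbers are made up for illustration; this is not Tangram's own code, and it assumes the mapping matrix has one row per single cell and one column per spatial cell, as in ad_map.X.

```python
# Toy sketch of annotation projection (hypothetical helper, not Tangram's API).
# Assumption: mapping[i][j] is the probability that single cell i maps to
# spatial cell j; labels[i] is the annotation of single cell i.

def project_annotations(mapping, labels):
    """Return (label_names, scores), where scores[j][k] is the summed mapping
    probability of all single cells with label k for spatial cell j."""
    label_names = sorted(set(labels))
    column = {name: k for k, name in enumerate(label_names)}
    n_spots = len(mapping[0])
    scores = [[0.0] * len(label_names) for _ in range(n_spots)]
    for cell_probs, label in zip(mapping, labels):
        for j, p in enumerate(cell_probs):
            scores[j][column[label]] += p
    return label_names, scores


# Four single cells (rows) mapped onto three spatial cells (columns).
mapping = [
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.0, 0.1, 0.9],
    [0.1, 0.1, 0.8],
]
labels = ["astrocyte", "neuron", "microglia", "microglia"]

label_names, scores = project_annotations(mapping, labels)
for j, row in enumerate(scores):
    print(j, dict(zip(label_names, row)))
```

The score table has one column per distinct label in the chosen annotation column, so whichever column you pass as annotation determines what ends up in tangram_ct_pred.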

Also, if you are analysing MERFISH data, you should be using density_prior='uniform' instead of density_prior='rna_count_based'.

You should revise your code to be something closer to this (lines starting with - are deleted, lines starting with + are added):

tg.pp_adatas(adata_sc, comb_adata, genes=None);

assert "training_genes" in adata_sc.uns
assert "training_genes" in comb_adata.uns
print(f"Number of training_genes: {len(adata_sc.uns['training_genes'])}");

ad_map = tg.map_cells_to_space(
    adata_sc,
    comb_adata,
    mode="cells",
-   cluster_label='leiden',
-   density_prior='rna_count_based',
+   density_prior='uniform',
    num_epochs=100,
    device='cpu',
);

- tg.project_cell_annotations(ad_map, comb_adata, annotation="cell_type")
+ tg.project_cell_annotations(ad_map, comb_adata, annotation="leiden")
- annotation_list = list(pd.unique(adata_sc.obs['cell_type']))
+ annotation_list = list(pd.unique(adata_sc.obs['leiden']))
tg.plot_cell_annotation_sc(comb_adata, annotation_list, perc=0.02, spot_size=50);

See the Tangram Jupyter notebooks for more detailed information. FYI, 100 epochs will likely not be sufficient for convergence (probably try ~500 or so).

wakelin-g avatar Jan 03 '25 19:01 wakelin-g

Thanks @wakelin-g. "if you are analysing MERFISH data, you should be using density_prior = uniform instead of density_prior = rna_count_based." Can you clarify this for me: MERFISH/Xenium datasets should use density_prior = uniform, but what about sequencing-based spatial datasets such as Visium (HD) and Slide-seq? Thanks very much! I will revise the script, re-run it, and keep you updated.

KunHHE avatar Jan 03 '25 20:01 KunHHE

For technologies that have single-cell resolution (MERFISH, Xenium, Visium HD, etc.), use uniform. For technologies where multiple cells are likely contained within a single spatial element (e.g., standard, non-HD Visium), use rna_count_based.
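As a rough illustration of the difference between the two priors, here is a small plain-Python sketch. It assumes, as I understand it, that uniform gives every spatial element the same weight while rna_count_based weights each element by its share of total RNA counts. The helper names (uniform_density, rna_count_based_density, choose_density_prior) and the counts are hypothetical, not part of Tangram.

```python
# Hypothetical helpers illustrating the two density priors (not Tangram's API).

def uniform_density(n_spots):
    """Every spatial element is expected to hold the same share of cells."""
    return [1.0 / n_spots] * n_spots


def rna_count_based_density(counts_per_spot):
    """Each spatial element's expected share is proportional to its RNA counts."""
    total = sum(counts_per_spot)
    return [c / total for c in counts_per_spot]


def choose_density_prior(single_cell_resolution):
    """Rule of thumb from this thread: uniform for single-cell resolution data."""
    return "uniform" if single_cell_resolution else "rna_count_based"


counts = [100, 300, 600]  # made-up total counts for three spatial elements
u = uniform_density(len(counts))
r = rna_count_based_density(counts)

print(choose_density_prior(True), u)
print(choose_density_prior(False), r)
```

With single-cell resolution data, each spatial element is one cell, so there is little reason to expect more mapped cells where counts are higher; with multi-cell spots, higher counts plausibly mean more cells in the spot.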

Again, you should take a look at the notebook which explains what density_prior actually does and what the different options mean.

wakelin-g avatar Jan 03 '25 21:01 wakelin-g

Dear @wakelin-g, I ran a new test using the code from your guidance. If I use 'leiden', plot_cell_annotation_sc only shows the leiden cluster numbers (the figure below). Shouldn't it show the cell type labels directly? I think I should still use annotation="cell_type" in tg.project_cell_annotations? Thanks so much!

[image]

If I change back to 'cell_type', it looks correct?

[image]

Then the predicted cell type is in obsm['tangram_ct_pred'].

[image]

One more question: my original data (adata1) has 24 leiden clusters, but tangram_ct_pred contains only 12. Can I double-check with you: is this because the reference only has those 12 annotated cell types, so it can't match the original leiden clusters?
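For reference, one way to get from a score table like tangram_ct_pred to a label per spatial cluster is sketched below in plain Python: take the highest-scoring label for each spatial cell, then take a majority vote within each leiden cluster. As far as I understand, the columns of the score table come from the labels in the reference's annotation column, so their number follows the reference rather than the spatial clustering. The helper names and all values are hypothetical toy data, not Tangram functions or real results.

```python
from collections import Counter

# Hypothetical post-processing of a projected score table (not Tangram's API).

def best_label(label_names, score_row):
    """Return the label with the highest projected score for one spatial cell."""
    k = max(range(len(score_row)), key=lambda i: score_row[i])
    return label_names[k]


def majority_label_per_cluster(clusters, labels):
    """Return the most common per-cell label within each spatial cluster."""
    votes = {}
    for cluster, label in zip(clusters, labels):
        votes.setdefault(cluster, Counter())[label] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in votes.items()}


# Toy score table: one row per spatial cell, one column per reference label.
label_names = ["astrocyte", "microglia", "neuron"]
scores = [
    [0.9, 0.1, 0.2],
    [0.8, 0.0, 0.3],
    [0.1, 0.2, 0.7],
    [0.0, 1.7, 0.1],
    [0.2, 0.1, 0.9],
]
spatial_leiden = ["0", "0", "1", "2", "1"]  # made-up spatial cluster ids

per_cell = [best_label(label_names, row) for row in scores]
per_cluster = majority_label_per_cluster(spatial_leiden, per_cell)
print(per_cell)
print(per_cluster)
```

Several spatial clusters can end up with the same majority label, which is expected when the spatial data is clustered more finely than the reference is annotated.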

KunHHE avatar Jan 04 '25 11:01 KunHHE