
Question about training score in reference dataset

Open rimelof opened this issue 1 year ago • 1 comments

Hello,

I am following the tutorial but using a single-nucleus dataset as the reference, and trying to map it to a Visium sample. The training scores on the reference dataset look very poor (see attached figure). Is there some way I could improve them?

Thank you for creating and maintaining the package.

[Figure: training_scores]

rimelof avatar Jul 11 '22 18:07 rimelof

Hi,

Thanks for your question. Judging from the training scores, Tangram is actually working well in this case. In Visium ST data, many genes suffer from measurement dropout, which makes the measured spatial distribution of those genes sparse.

In Panel 3 of this figure, you can see that the sparser a gene is (that is, the more likely its measurement was dropped out in some spots), the lower the score we observe. What Tangram actually does is help correct the spatial distribution of the genes that suffer from dropout.
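To make the sparsity effect concrete, here is a minimal sketch in plain Python. It assumes the per-gene score is a cosine similarity between the predicted and the measured expression across spots, and that sparsity is the fraction of spots with zero counts. The helper names `sparsity` and `cosine_similarity` and the toy numbers are made up for illustration and are not part of the Tangram API.

```python
import math


def sparsity(values):
    """Fraction of spots in which the gene has zero measured counts."""
    return sum(1 for v in values if v == 0) / len(values)


def cosine_similarity(a, b):
    """Cosine similarity between two per-spot expression vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)


# Toy example (hypothetical values): one gene over six spots.
predicted = [4.0, 3.0, 2.0, 1.0, 2.0, 3.0]  # smooth pattern projected from the reference
measured_dense = [5, 3, 2, 1, 2, 4]         # measurement without dropout
measured_sparse = [5, 0, 0, 0, 0, 4]        # same gene, dropped out in four spots

for name, measured in [("dense", measured_dense), ("sparse", measured_sparse)]:
    print(name, sparsity(measured), cosine_similarity(predicted, measured))
```

The prediction is identical in both cases; only the dropout in the measurement lowers the score, which is why a low score on a sparse gene does not by itself indicate a bad mapping.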

To get better training scores, you can train only on marker genes with a good spatial distribution (we normally choose several hundred to one or two thousand marker genes in total) and hold out the rest of the overlapping genes as a test set. The test scores will then show exactly the same trend as in Panel 3.
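The train/test split described above can be sketched roughly as follows. This is plain Python, not Tangram code: the function name `split_train_test_genes` and the toy gene lists are hypothetical, and how you pick the marker genes in the first place (for example, by differential expression between cell types) is up to you.

```python
def split_train_test_genes(sc_genes, st_genes, marker_genes):
    """Split the genes shared by both datasets into training and held-out test genes.

    sc_genes:     gene names in the single-cell / single-nucleus reference
    st_genes:     gene names in the spatial (e.g. Visium) dataset
    marker_genes: candidate marker genes chosen for training
    """
    st_set = set(st_genes)
    marker_set = set(marker_genes)
    # Only genes present in both datasets can be scored at all.
    overlap = [g for g in sc_genes if g in st_set]
    train = [g for g in overlap if g in marker_set]
    test = [g for g in overlap if g not in marker_set]
    return train, test


# Toy example with made-up gene lists.
sc_genes = ["Gad1", "Slc17a7", "Pvalb", "Sst", "Vip", "Xist"]
st_genes = ["Slc17a7", "Gad1", "Sst", "Vip", "Actb"]
marker_genes = ["Gad1", "Sst", "Pvalb"]  # Pvalb is missing from the spatial data

train_genes, test_genes = split_train_test_genes(sc_genes, st_genes, marker_genes)
print("train:", train_genes)
print("test:", test_genes)
```

The training list is what you would then hand to the mapping step; if I recall the tutorial correctly, that is the `genes` argument of `tg.pp_adatas`. The held-out genes are only used afterwards to compare predicted and measured expression.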

Hejin0701 avatar Aug 09 '22 16:08 Hejin0701