
SC data assumptions -- raw counts

Open johnyaku opened this issue 2 years ago • 3 comments

My understanding is that the assumption behind Tangram is that the same biological processes generated both the SC data and the ST data, and that ideally both data sets will come from the same sample.

Is Tangram expecting raw counts for the SC data? Or will it still work if the count data is normalized in some way?

More generally, how well can we expect Tangram to work if the SC reference data is from other, biologically similar samples? Or even a composite SC reference built by integrating multiple samples? In this last case we would normally use SCTransform / Harmony to integrate samples, which is a kind of normalization ...

I guess we can always try it and see, but interested in your thoughts.

Thanks :)

johnyaku, Mar 30 '22 04:03

Yes, data from SC and ST should be as similar as possible. So use raw counts for both, or normalized data for both.
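One quick way to check that the two matrices are on the same scale is to test whether each still looks like raw counts. This is a minimal sketch, assuming dense NumPy arrays and toy data; `looks_like_raw_counts` is a hypothetical helper, not part of Tangram:

```python
import numpy as np

def looks_like_raw_counts(X, tol=1e-8):
    """Heuristic: raw counts are non-negative and integer-valued."""
    X = np.asarray(X, dtype=float)
    return bool(np.all(X >= 0) and np.all(np.abs(X - np.round(X)) < tol))

# Toy matrices (cells/spots x genes); real data would come from your own objects.
sc_counts = np.array([[0, 3, 12], [5, 0, 1]])   # raw counts
st_counts = np.log1p(sc_counts)                 # log-transformed values

# True only when both matrices are raw or both are transformed.
same_scale = looks_like_raw_counts(sc_counts) == looks_like_raw_counts(st_counts)
print(same_scale)
```

The check is only a heuristic: it cannot tell two different normalizations apart, only raw counts from non-integer values.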

TBH, we found that with some loss functions (e.g. the classical Tangram cosine similarity loss) it also works if one is raw and the other is log-transformed (but I wouldn't do that as a rule)
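A partial intuition for this, offered as an illustration rather than a description of Tangram's internals: cosine similarity ignores overall scale, and a log transform preserves the ordering of values, so a raw profile and its log-transformed version can still point in broadly similar directions. A toy NumPy sketch with made-up numbers:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two expression vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

raw = np.array([0.0, 3.0, 12.0, 5.0, 1.0])   # hypothetical raw counts for one profile
scaled = 10 * raw                            # same profile at a different depth
logged = np.log1p(raw)                       # same profile, log-transformed

sim_scaled = cosine_similarity(raw, scaled)  # rescaling leaves the angle unchanged
sim_logged = cosine_similarity(raw, logged)  # log changes the angle, but not drastically here
print(sim_scaled, sim_logged)
```

How far the log transform moves the similarity depends on the data, which is one reason to keep both data sets on the same scale when possible.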

lewlin, May 15 '22 00:05

Thanks for the feedback. We are building an atlas of SC data from multiple patients, captured in different batches, etc. Ideally we'd like to use this atlas dataset as a reference for Tangram. Given a choice between leaving all counts raw (and disregarding patient/batch effects) or normalizing/integrating the SC data and then applying a similar normalization to the ST data, which do you recommend?

johnyaku, May 15 '22 21:05

I would use raw data and the cluster method for integration!
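This reply presumably refers to Tangram's cluster-level mapping (assumed here to be `mode="clusters"` in `tg.map_cells_to_space`; check the Tangram documentation). The underlying idea of pooling raw counts per annotated cluster, which may dampen cell-level patient and batch noise, can be sketched in plain NumPy. The helper name and toy data below are hypothetical, and this is not Tangram's own implementation:

```python
import numpy as np

def pool_counts_by_cluster(counts, labels):
    """Sum raw counts over all cells that share a cluster label."""
    counts = np.asarray(counts)
    labels = np.asarray(labels)
    clusters = np.unique(labels)  # sorted unique labels
    pooled = np.vstack([counts[labels == c].sum(axis=0) for c in clusters])
    return clusters, pooled

# Toy atlas: 4 cells x 3 genes, with one annotation label per cell.
counts = np.array([[1, 0, 4],
                   [2, 1, 0],
                   [0, 5, 3],
                   [1, 1, 1]])
labels = np.array(["T cell", "B cell", "T cell", "B cell"])

clusters, pooled = pool_counts_by_cluster(counts, labels)
print(clusters)
print(pooled)
```

Each row of `pooled` is one cluster-level profile, so cells from different patients contribute to a shared profile as long as they carry the same label.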

lewlin, Jun 18 '22 17:06