
Calling immune subtypes on new data: normalising expression matrix

Open u909090 opened this issue 6 years ago • 2 comments

Hi,

Not really an issue, but more a question.

In the walkthrough 'Call_immune_subtypes_on_new_data', is it robust enough to just join two independently normalised expression matrices (i.e. EBPlusPlus and the new data), then log-transform and median-center the joined matrix? Would it not be better to start from the read counts for both datasets and normalise the whole joined dataset the same way as was done for EBPlusPlus?
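
For concreteness, the join, log-transform and median-center steps in question can be sketched roughly as follows. This is only a minimal Python illustration, not the walkthrough's own code: it assumes each matrix maps a gene name to its per-sample values, a log2(x + 1) transform, and per-gene median centering across all joined samples. The function name, the pseudocount and the toy values are hypothetical.

```python
import math
from statistics import median


def join_log_median_center(reference, new_data, pseudocount=1.0):
    """Join two gene-keyed matrices on their shared genes, log2-transform,
    then subtract each gene's median across all joined samples.

    Assumption: both inputs map gene name -> list of per-sample values and
    were normalised independently beforehand.
    """
    shared_genes = sorted(set(reference) & set(new_data))
    joined = {}
    for gene in shared_genes:
        # Reference samples first, then the new samples.
        values = reference[gene] + new_data[gene]
        logged = [math.log2(v + pseudocount) for v in values]
        gene_median = median(logged)
        joined[gene] = [x - gene_median for x in logged]
    return joined


# Toy values only; genes missing from either matrix are dropped.
reference = {"CD8A": [3.0, 7.0], "GZMB": [0.0, 1.0], "ACTB": [1023.0, 511.0]}
new_data = {"CD8A": [15.0], "GZMB": [3.0], "TP53": [7.0]}
centered = join_log_median_center(reference, new_data)
```

Note that the medians here are computed over the joined samples, so any systematic scale difference between the two independently normalised inputs carries straight into the centered values, which is the concern raised above.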

u909090, Jan 11 '19 13:01

(1) Maybe. Just depends on the 'new data'.

(2) Probably would be. But I don't think read count level data is available for the EB++ TCGA expression set.

Gibbsdavidl, Jan 18 '19 22:01

The EBPlusPlus dataset was obtained from the Firehose portal, where MapSplice and RSEM were used to estimate the read counts (see details on this thread). The matrices for the different cancer types were then normalised by setting the upper quartile of gene expression per sample to 1,000.

Since the raw RSEM read counts are available from Firehose, maybe one could compute count estimates on the new data, merge them with the TCGA ones, then normalise the combined set.
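
A rough sketch of that idea, in Python for illustration: merge the per-sample count estimates, restrict to the genes present in every sample, then rescale each sample so its upper quartile equals 1,000. This is not the Firehose pipeline itself. The function names, sample IDs and counts are hypothetical, and the exact quartile definition (interpolation method, whether zero-count genes are excluded) is an assumption that may differ from what was used for EBPlusPlus.

```python
from statistics import quantiles


def upper_quartile_normalise(sample_counts, target=1000.0):
    """Scale one sample (gene -> count) so its 75th percentile equals target.

    Assumption: the quartile is taken over all genes, zeros included,
    using the 'inclusive' interpolation method.
    """
    q3 = quantiles(sample_counts.values(), n=4, method="inclusive")[2]
    if q3 <= 0:
        raise ValueError("upper quartile is zero; cannot rescale this sample")
    factor = target / q3
    return {gene: count * factor for gene, count in sample_counts.items()}


def merge_and_normalise(tcga, new_data, target=1000.0):
    """Merge two sample -> (gene -> count) tables on shared genes,
    then apply the same upper-quartile scaling to every sample."""
    samples = {**tcga, **new_data}
    shared = set.intersection(*(set(counts) for counts in samples.values()))
    return {
        name: upper_quartile_normalise(
            {gene: counts[gene] for gene in sorted(shared)}, target
        )
        for name, counts in samples.items()
    }


# Hypothetical sample IDs and counts, for illustration only.
tcga = {"TCGA-01": {"A": 0, "B": 100, "C": 200, "D": 300, "E": 400}}
new_data = {"NEW-01": {"A": 10, "B": 20, "C": 30, "D": 40, "E": 50, "F": 999}}
normalised = merge_and_normalise(tcga, new_data)
```

Because the scaling is computed per sample, the point of merging first is mainly that every sample is normalised over the same gene set and with the same procedure.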

u909090, Jan 22 '19 13:01