
Why does `integration_consistency` require raw counts?

Open ilan-gold opened this issue 2 years ago • 1 comments

See https://github.com/gao-lab/GLUE/blob/e20518b0ae7b39d47244087b825511e5c84f9b7e/scglue/data.py#L609 for the line where integration_consistency imposes this requirement. Why is this the case? I have floating-point data that I am trying to integrate.

ilan-gold, Jun 14 '22 10:06

Well, this situation is a bit awkward. integration_consistency involves computing feature-feature correlations. For the most common data types, RNA and ATAC, we need to preprocess the count data in a particular way (total count normalization + log transformation) to ensure that the correlations make sense.
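To illustrate why that preprocessing matters, here is a minimal NumPy sketch, not the scglue implementation. The helper name `normalize_log1p`, the `target_sum` of 1e4, and the toy matrix are all assumptions made up for this example. Two features whose relative abundance alternates between cells look strongly positively correlated on raw counts, simply because sequencing depth varies across cells; after total count normalization and log transformation the depth effect is removed and the correlation turns negative.

```python
import numpy as np

def normalize_log1p(counts, target_sum=1e4):
    """Total count normalization followed by log1p (hypothetical helper)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)  # per-cell library size
    totals[totals == 0] = 1.0                   # avoid division by zero for empty cells
    return np.log1p(counts / totals * target_sum)

# Toy data: 4 cells x 2 features. Depth grows 10x per cell, while the
# share of feature 0 alternates between 60% and 40%.
counts = np.array([
    [6, 4],
    [40, 60],
    [600, 400],
    [4000, 6000],
])

# Correlation between the two features on raw counts (driven by depth)
raw_r = np.corrcoef(counts, rowvar=False)[0, 1]

# Correlation after normalization + log transformation (depth removed)
norm_r = np.corrcoef(normalize_log1p(counts), rowvar=False)[0, 1]

print(f"raw: {raw_r:.3f}, normalized: {norm_r:.3f}")
```

For data that is already normalized or is not count-based, this particular transformation may be meaningless, which is why the right preprocessing depends on the data type.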

For other data types it is not as clear what preprocessing is necessary. I guess the preprocessing part should ideally be modularized so users can also insert their own preprocessing function. I'll try to add that in the next release!

Jeff1995, Jun 14 '22 14:06

Sorry for the late response!

Starting from v0.3.0, scglue.models.integration_consistency also works with data that are not raw counts. However, you may need to specify appropriate "metacell" aggregation and preprocessing functions via the arguments "agg_fns" and "prep_fns" (which are passed along to scglue.data.metacell_corr).

By default, the function retains the previous behavior of raw-count normalization if the datasets were configured with "NB" or "ZINB" models.
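As a rough picture of what an aggregation function and a preprocessing function do in this kind of pipeline, here is a plain NumPy sketch. It does not call scglue and is not its implementation: `aggregate_metacells`, `feature_corr`, the labels, and the toy matrix are hypothetical names and data for illustration. Check the scglue documentation for the exact form that "agg_fns" and "prep_fns" expect. The idea is that raw counts are naturally summed within a metacell and then normalized, whereas floating-point data might instead be averaged and left without further preprocessing.

```python
import numpy as np

def aggregate_metacells(X, labels, agg_fn=np.sum):
    """Collapse cells into metacells by applying agg_fn within each label group."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)  # sorted unique metacell labels
    return np.stack([agg_fn(X[labels == g], axis=0) for g in groups])

def feature_corr(X, labels, agg_fn, prep_fn=None):
    """Aggregate to metacells, optionally preprocess, then correlate features."""
    meta = aggregate_metacells(X, labels, agg_fn)
    if prep_fn is not None:
        meta = prep_fn(meta)
    return np.corrcoef(meta, rowvar=False)  # features are columns

# Toy floating-point data: 6 cells x 2 features, grouped into 3 metacells
X = np.array([
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 1.0],
    [7.0, 3.0],
    [2.0, 8.0],
    [4.0, 6.0],
])
labels = [0, 0, 1, 1, 2, 2]

# Already-normalized data: average within metacells, no extra preprocessing
meta = aggregate_metacells(X, labels, agg_fn=np.mean)
corr = feature_corr(X, labels, agg_fn=np.mean, prep_fn=None)
print(meta)
print(corr)
```

Swapping `agg_fn=np.sum` and a normalization function for `prep_fn` would mimic the count-based case described above.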

Jeff1995, Aug 27 '22 05:08