abagen
Perform donor normalization before probe selection?
The issue
In the current workflow for abagen.get_expression_data() there are a few calls to the probes.filter_probes() and probes.collapse_probes() functions, both of which use microarray expression data from donors to select "good" quality probes to be retained in future processing. These functions (though primarily the latter) operate on expression values from all tissue samples pooled across donors, performing numerical calculations such as averaging, variance estimation, and PCA. However, the expression data passed to these functions are "raw"; that is, no within-donor normalization has been performed. Donor-specific effects present in the raw microarray data could therefore bias the results of these functions.
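One way to see how donor effects can leak into pooled calculations is to split a single probe's pooled variance into a within-donor part and a between-donor part (the law of total variance). The sketch below assumes NumPy and a hypothetical helper, variance_components, that is not part of abagen; it only illustrates why a donor-specific offset can inflate a pooled statistic such as variance.

```python
import numpy as np


def variance_components(values, donors):
    """Split one probe's pooled variance into within-donor and between-donor parts.

    Hypothetical helper for illustration only (not an abagen function).
    Uses population variances (ddof=0) and weights each donor by its share of
    samples, so that pooled == within + between holds exactly.
    """
    values = np.asarray(values, dtype=float)
    donors = np.asarray(donors)
    pooled = values.var()
    grand_mean = values.mean()
    within = 0.0
    between = 0.0
    for donor in np.unique(donors):
        v = values[donors == donor]          # this donor's samples for the probe
        weight = v.size / values.size        # donor's share of all samples
        within += weight * v.var()           # spread across samples inside the donor
        between += weight * (v.mean() - grand_mean) ** 2  # donor offset from grand mean
    return pooled, within, between
```

For a toy probe that is flat within each donor but offset between them, e.g. variance_components([5, 5, 5, 9, 9, 9], [1, 1, 1, 2, 2, 2]), the pooled variance (4.0) comes entirely from the between-donor term, so a variance-based criterion would favor it even though it shows no variation across samples within any donor.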
My question is: should we perform within-donor normalization (i.e., normalizing each probe across all samples from a given donor) prior to providing the microarray data to the probe filtering/selection functions?
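For concreteness, within-donor normalization as described in the question could look like the following minimal sketch, which assumes a samples-by-probes NumPy array and a parallel array of donor labels. The z-score is only one possible choice, used here for illustration, and zscore_within_donor is a hypothetical helper rather than an abagen function.

```python
import numpy as np


def zscore_within_donor(expression, donors):
    """Z-score each probe (column) using only the samples from the same donor.

    Hypothetical helper for illustration only.
    expression: (n_samples, n_probes) array; donors: length-n_samples labels.
    Probes that are constant within a donor are set to 0 instead of dividing by zero.
    """
    expression = np.asarray(expression, dtype=float)
    donors = np.asarray(donors)
    normalized = np.zeros_like(expression)
    for donor in np.unique(donors):
        mask = donors == donor
        block = expression[mask]                  # this donor's samples only
        mean = block.mean(axis=0)                 # per-probe mean within the donor
        std = block.std(axis=0)                   # per-probe std within the donor
        safe_std = np.where(std > 0, std, 1.0)    # avoid division by zero
        normalized[mask] = (block - mean) / safe_std
    return normalized
```

Because the mean and standard deviation are computed per donor, a constant offset shared by all of one donor's samples is removed before any pooling happens.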
We would still use the "raw" microarray data when pooling tissue samples within a given brain region, and would still perform a separate within-donor normalization toward the end of the workflow, prior to aggregating across donors. This additional, pre-probe-selection donor normalization might help ensure that probe selection isn't biased by donor effects. The flip side is that normalizing before "noisy" probes have been removed might itself bias the expression values...
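The ordering proposed above (normalize a copy of the data for probe selection, but carry the raw values of the selected probes forward) can be sketched as a toy pipeline. Everything here is hypothetical: the helper names, the max-variance selection rule, and the use of per-donor mean-centering as a simplified stand-in for normalization. Centering is used instead of full z-scoring because z-scoring gives every non-constant probe the same unit variance within each donor, which would make a variance-based rule uninformative.

```python
import numpy as np


def center_within_donor(expression, donors):
    """Subtract each probe's mean within every donor (simplified stand-in for normalization)."""
    out = np.asarray(expression, dtype=float).copy()
    donors = np.asarray(donors)
    for donor in np.unique(donors):
        mask = donors == donor
        out[mask] -= out[mask].mean(axis=0)  # remove the donor-specific offset per probe
    return out


def select_probe(expression, donors, normalize_first):
    """Hypothetical rule: pick the probe (column) with the largest pooled variance."""
    if normalize_first:
        scored = center_within_donor(expression, donors)  # normalized copy, used only for scoring
    else:
        scored = np.asarray(expression, dtype=float)
    return int(np.argmax(scored.var(axis=0)))


def run_toy_pipeline(expression, donors, normalize_first=True):
    """Choose a probe, then return the *raw* values of that probe for downstream steps."""
    chosen = select_probe(expression, donors, normalize_first)
    raw = np.asarray(expression, dtype=float)
    return chosen, raw[:, chosen]
```

On a toy matrix whose columns are [1, 2, 3, 1, 2, 3] and [5, 5, 5, 9, 9, 9] for donors [1, 1, 1, 2, 2, 2], selection on raw data picks the second column (index 1), while selection on the centered copy picks the first (index 0); in both cases the returned values are raw. This matches the expectation that an extra normalization step before selection can change which probes are retained.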
Proposed solution
Unclear! I don't think this warrants yet another parameter for the function, so it would be good to make a decision and simply implement it one way or the other. It's worth noting that adding this donor normalization step will likely change which probes are selected, so its impact on the downstream expression matrix could be relatively large.