preNivolumabOnNivolumab
Question about negative control genes
Hello! Thank you for this very interesting re-analysis!
I have a question about the removal of unwanted variation. You chose your negative control genes empirically (as suggested by the RUV paper, I believe) by assuming that all genes with p > 0.1 are not different between the groups.
My concern is this: since a large p-value is not, in fact, evidence of a lack of effect, how do we prevent this method from washing out a lot of real between-group variation? If there are considerable between-group differences, wouldn't we expect a potentially large subset of the non-significant genes to reflect those differences, even if they are too noisy to reach a small p-value? And if we assume those genes represent the null condition, couldn't that erroneously normalize away the larger, clearer differences that do reach a small p-value?
In the absence of a spike-in control, is there any way to know that the "negative control" genes one has chosen are indeed negative controls, and not simply noisy-but-positive genes?
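For readers less familiar with RUV, the empirical control selection described in the question can be sketched with NumPy. This is a minimal, hypothetical sketch in the spirit of RUVg (the control-gene variant in RUVSeq), not the code used in this re-analysis: the function names, the simulated data, and the hard-coded first-pass p-values are all assumptions for illustration. Genes with p > 0.1 are treated as controls, and the factors of unwanted variation `W` are taken as the leading left singular vectors of the gene-centered control submatrix.

```python
import numpy as np


def select_empirical_controls(pvalues, threshold=0.1):
    """Treat every gene whose first-pass p-value exceeds `threshold`
    as an empirical negative control (the p > 0.1 rule)."""
    return np.asarray(pvalues) > threshold


def estimate_unwanted_factors(log_expr, control_mask, k=1):
    """Estimate k factors of unwanted variation from control genes only.

    log_expr: samples x genes array of log-scale expression.
    Returns W (samples x k): leading left singular vectors of the
    control-gene submatrix after centering each gene.
    """
    controls = log_expr[:, control_mask]
    centered = controls - controls.mean(axis=0, keepdims=True)
    u, _, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_samples, n_genes, n_de = 20, 500, 50
    group = np.repeat([0.0, 1.0], n_samples // 2)  # biological groups
    batch = np.tile([0.0, 1.0], n_samples // 2)    # hypothetical technical batch
    log_expr = rng.normal(0.0, 0.5, size=(n_samples, n_genes))
    log_expr += np.outer(batch, rng.normal(0.0, 1.0, n_genes))  # batch affects every gene
    log_expr[:, :n_de] += np.outer(group, np.full(n_de, 2.0))   # true DE genes
    # Hypothetical first-pass p-values: small for DE genes, large otherwise.
    pvalues = np.where(np.arange(n_genes) < n_de, 0.001, 0.5)
    W = estimate_unwanted_factors(log_expr, select_empirical_controls(pvalues), k=1)
    # Correlation of the estimated factor with the simulated batch (expected near 1).
    print(abs(np.corrcoef(W[:, 0], batch)[0, 1]))
```

The columns of `W` would then be added as covariates in the differential-expression model, so that variation shared with the controls is not attributed to the group comparison.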
Here is how I think about the impact of this decision:
- we are not claiming these genes are all null, only that this is the subset enriched for nulls
- true biological differences in this region (assuming adequate power) will have smaller effects than those in the other region, so even if we include some alternatives (H_A), that's OK because they won't dominate the low-rank variation
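The two points above can be checked with a small simulation. The sketch below is hypothetical (simulated data, illustrative effect sizes, and a balanced design in which batch and group are uncorrelated are all assumptions): it builds a control set in which some genes are actually differentially expressed and reports how strongly the top SVD factor correlates with the technical batch and with the biological group.

```python
import numpy as np


def leading_factor_correlations(n_contaminated, effect, seed=0,
                                n_samples=20, n_controls=500,
                                batch_sd=0.5, noise_sd=0.5):
    """Simulate a control-gene matrix where `n_contaminated` genes are truly
    DE between groups; return |corr| of the top factor with batch and group."""
    rng = np.random.default_rng(seed)
    group = np.repeat([0.0, 1.0], n_samples // 2)
    batch = np.tile([0.0, 1.0], n_samples // 2)  # balanced against group
    Y = rng.normal(0.0, noise_sd, size=(n_samples, n_controls))
    # Technical batch effect with a random loading on every control gene.
    Y += np.outer(batch, rng.normal(0.0, batch_sd, n_controls))
    # Contaminating "controls": real group differences with random sign.
    signs = rng.choice([-1.0, 1.0], size=n_contaminated)
    Y[:, :n_contaminated] += np.outer(group, effect * signs)
    centered = Y - Y.mean(axis=0, keepdims=True)
    u, _, _ = np.linalg.svd(centered, full_matrices=False)
    w = u[:, 0]  # leading factor of unwanted variation
    corr_batch = abs(np.corrcoef(w, batch)[0, 1])
    corr_group = abs(np.corrcoef(w, group)[0, 1])
    return corr_batch, corr_group


if __name__ == "__main__":
    print("few contaminated controls:", leading_factor_correlations(20, 1.0))
    print("many contaminated controls:", leading_factor_correlations(400, 1.0))
```

In this simulation, with a few modestly affected genes (`leading_factor_correlations(20, 1.0)`) the top factor follows the batch, as the reply argues; when most of the controls carry a group difference (`leading_factor_correlations(400, 1.0)`) it follows the group instead, which is the failure mode raised in the question.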
> If it is the case that there are considerable between-group differences
So you are describing an already underpowered situation, because large effects are often getting p > 0.1. In that case, yes, I think this procedure will soak up both technical variation and biological variation that we are underpowered to detect.
But we are much more concerned with the opposite situation: high power, but technical variation gets called as differential expression (DE) because we ignored it and pretended all the samples were i.i.d.
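That opposite situation can also be illustrated with a hypothetical sketch. The data are simulated with no true group differences at all, and the batch is assumed to be partly confounded with group (9 of 10 samples per group share a batch level); both are assumptions for illustration. It counts genes with |t| > 3 for the group coefficient, first from a model that ignores the batch and then from one that includes a single estimated factor `W` as a covariate. Because every simulated gene is null, all genes can serve as controls here.

```python
import numpy as np


def group_t_stats(log_expr, group, covariates=None):
    """Per-gene OLS t-statistic for the group coefficient in the model
    expression ~ intercept + group (+ optional covariates such as W)."""
    n = log_expr.shape[0]
    cols = [np.ones(n), group]
    if covariates is not None:
        cols.extend(np.asarray(covariates).reshape(n, -1).T)
    X = np.column_stack(cols)
    beta, _, _, _ = np.linalg.lstsq(X, log_expr, rcond=None)
    resid = log_expr - X @ beta
    sigma2 = (resid ** 2).sum(axis=0) / (n - X.shape[1])  # residual variance
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se


def false_positive_counts(seed=0, threshold=3.0, n_genes=500):
    """Return (naive, adjusted) counts of null genes with |t| > threshold."""
    rng = np.random.default_rng(seed)
    group = np.repeat([0.0, 1.0], 10)
    # Hypothetical batch, mostly aligned with group.
    batch = np.array([1.0] * 9 + [0.0] + [1.0] + [0.0] * 9)
    Y = rng.normal(0.0, 0.5, size=(20, n_genes))
    Y += np.outer(batch, rng.normal(0.0, 1.0, n_genes))  # technical effect only
    # One factor of unwanted variation from the (all-null) control genes.
    centered = Y - Y.mean(axis=0, keepdims=True)
    W = np.linalg.svd(centered, full_matrices=False)[0][:, :1]
    naive = np.sum(np.abs(group_t_stats(Y, group)) > threshold)
    adjusted = np.sum(np.abs(group_t_stats(Y, group, W)) > threshold)
    return int(naive), int(adjusted)


if __name__ == "__main__":
    print("naive, adjusted:", false_positive_counts())
```

In this setup the naive model is expected to flag many null genes, while the adjusted model should flag only a handful, close to the chance rate for |t| > 3.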
I was thinking more of a case where between-group differences are numerous rather than large, but I take your point that this is the more conservative approach regardless. Thanks for the response!
Agree.