
Question about negative control genes

Open JCSzamosi opened this issue 2 years ago • 3 comments

Hello! Thank you for this very interesting re-analysis!

I have a question about the removal of unwanted variation. You chose your negative control genes empirically (as suggested by the RUV paper, I believe) by assuming that all genes with p > 0.1 are not different between the groups.
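To make the selection step concrete, here is a minimal toy sketch of choosing "empirical" negative controls by a p-value threshold. It is not the code from the analysis under discussion: the simulated data, the known-variance z-test, and the helper names `two_sample_z_pvalue` and `pick_empirical_controls` are all assumptions made for illustration.

```python
import random
from statistics import NormalDist, mean

def two_sample_z_pvalue(a, b, sd=1.0):
    """Two-sided p-value for a difference in means, treating sd as known."""
    se = sd * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def pick_empirical_controls(group_a, group_b, threshold=0.1):
    """Return indices of genes whose first-pass p-value exceeds the threshold."""
    return [
        g for g, (a, b) in enumerate(zip(group_a, group_b))
        if two_sample_z_pvalue(a, b) > threshold
    ]

# Toy data (hypothetical): 200 genes, 5 samples per group,
# with the first 20 genes truly shifted by 3 sd in group B.
rng = random.Random(0)
n_genes, n_per_group, n_de, shift = 200, 5, 20, 3.0
group_a = [[rng.gauss(0, 1) for _ in range(n_per_group)] for _ in range(n_genes)]
group_b = [[rng.gauss(shift if g < n_de else 0, 1) for _ in range(n_per_group)]
           for g in range(n_genes)]

controls = pick_empirical_controls(group_a, group_b)
de_in_controls = sum(1 for g in controls if g < n_de)
print(len(controls), "genes kept as controls;", de_in_controls, "of them truly DE")
```

In a real pipeline the p-values would come from the first-pass differential expression fit rather than a z-test; the point here is only that the control set is defined by the threshold, not by any direct knowledge that the genes are null.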

My concern is this: since we know that a large p-value is not, in fact, evidence of a lack of effect, how do we prevent this method from washing out a lot of real between-group variation? If there are considerable between-group differences, wouldn't we expect a potentially large subset of the non-significant genes to reflect those differences, even if they are too noisy to achieve a small p-value? And if we assume those genes represent the null condition, couldn't that have the effect of erroneously normalizing away the larger/clearer differences that do achieve a small p-value?

In the absence of a spike-in control, is there any way to know that the "negative control" genes one has chosen are indeed negative controls, and not simply noisy-but-positive genes?

JCSzamosi, Mar 17 '23 16:03

Here is how I conceive of the impacts of this decision:

  • we are not affirming that these genes are all null, only that this is the subset that is enriched for nulls
  • true biological differences in this region (assuming adequate power) will be smaller in effect than the ones in the other region, so even if we include some alternatives (H_A) it's OK, because they won't dominate the low-rank variation
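As a rough illustration of the second point (a sketch under simplifying assumptions, not part of the original analysis), the chance that a gene with a given true effect ends up with p > 0.1 can be computed in closed form for a two-sample z-test with known standard deviation. The function name `prob_lands_in_control_set` and the choice of 10 samples per group are hypothetical.

```python
from statistics import NormalDist

def prob_lands_in_control_set(effect, n_per_group, sd=1.0, threshold=0.1):
    """P(two-sided z-test p-value > threshold) for a gene with a true mean shift."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - threshold / 2)       # critical |z| for the threshold
    mu = effect / (sd * (2 / n_per_group) ** 0.5)  # expected z under the true shift
    # The gene is kept as a "control" when |z| < z_crit, with z ~ Normal(mu, 1).
    return norm.cdf(z_crit - mu) - norm.cdf(-z_crit - mu)

for effect in (0.0, 0.25, 0.5, 1.0, 2.0):
    p = prob_lands_in_control_set(effect, n_per_group=10)
    print(f"effect = {effect:4.2f} sd  ->  P(p > 0.1) = {p:.3f}")
```

Under these assumptions the probability falls steadily as the effect grows, so whatever non-null genes slip into the control set are weighted toward small effects. With fewer samples per group the curve flattens, and larger effects begin to slip in as well.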

> If it is the case that there are considerable between-group differences

Then you are describing an already under-powered situation, because large effects are often getting p > 0.1. In that case, yes, I think this procedure will soak up both technical variation and biological variation that is under-powered for detection.

But what we are much more concerned with is the opposite situation: a high-powered analysis that calls technical variation DE because we ignored it and pretended all the samples were i.i.d.

mikelove, Mar 17 '23 17:03

I was thinking more of a case where between-group differences are numerous rather than large, but I take your point regardless that this is the more conservative approach. Thanks for the response!

JCSzamosi, Apr 12 '23 14:04

Agree.

mikelove, Apr 12 '23 15:04