STAARpipeline-Tutorial

Control/case counts inverted when using binary model

Open GACGAMA opened this issue 8 months ago • 7 comments

Hello!

I've finished analyzing a large cohort with case/control status coded numerically as 0 = control and 1 = case. I then went back to my VCF to count how the variants are distributed in each group. What I saw was the opposite of what I expected: the signals seemed enriched in controls rather than in cases. Is that the expected direction of enrichment?

When I compare the percentages between controls and cases, the results go both ways: most signals seem enriched in controls and some in cases, so I'm not sure how to interpret this.

Should I rerun the analysis with the coding inverted (0 = case, 1 = control) to find signals enriched in cases?

One example:

| gene_step | gene | Cases_Sum_nHet | Control_Sum_nHet | Total X more in controls |
| --- | --- | --- | --- | --- |
| MUC4_ODMS_WGS_WES_MUC4_synonymous | MUC4 | 261 | 2829 | 10.83908046 |
| METTL1_ODMS_WGS_METTL1_promoter_CAGE | METTL1 | 4 | 0 | 0 |

To count, I summed the genotype calls within each group (cases and controls) and summarized them for each gene-category pair. As you can see, there are many more heterozygous calls in controls in the first example, and more in cases in the second.
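
The counting described above can be sketched roughly as follows. This is a minimal Python illustration under assumed inputs, not the script actually used and not part of STAARpipeline: the function names, the per-variant dosage lists, and the `variant_sets` mapping are all hypothetical. The ratio is taken as controls divided by cases, which is consistent with the two rows in the table (2829 / 261 and 0 / 4).

```python
def het_counts_by_group(genotypes, status, variant_sets):
    """Sum heterozygous calls per gene-category pair, split by case/control.

    genotypes    : dict variant_id -> list of alt-allele counts per sample
                   (0, 1, 2, or None for a missing call)  [assumed layout]
    status       : list of 0/1 per sample, 0 = control, 1 = case
    variant_sets : dict gene_step -> list of variant ids  [assumed layout]
    """
    summary = {}
    for gene_step, variant_ids in variant_sets.items():
        counts = {"cases": 0, "controls": 0}
        for variant_id in variant_ids:
            for dosage, sample_status in zip(genotypes[variant_id], status):
                if dosage == 1:  # count heterozygous calls only
                    group = "cases" if sample_status == 1 else "controls"
                    counts[group] += 1
        summary[gene_step] = counts
    return summary


def fold_more_in_controls(cases_n_het, controls_n_het):
    """Controls / cases ratio of het counts; None when cases_n_het is 0."""
    if cases_n_het == 0:
        return None
    return controls_n_het / cases_n_het


# Toy data: 4 samples (2 controls, then 2 cases) and 2 variants in one set.
toy_status = [0, 0, 1, 1]
toy_genotypes = {"var1": [1, 1, 0, 1], "var2": [0, 1, None, 2]}
toy_sets = {"GENE_synonymous": ["var1", "var2"]}
toy_summary = het_counts_by_group(toy_genotypes, toy_status, toy_sets)
```

Note that these are raw counts, so the ratio also reflects how many samples each group contains.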

GACGAMA, Jun 20 '24 16:06