STAARpipeline-Tutorial
Control/case counts appear inverted when using a binary model
Hello!
I've finished analyzing a large cohort using numeric 0/1 values for control/case status, respectively. I then went back to my VCF to count how the variants are distributed in each group. What I saw was the opposite of what I expected: the hits seemed enriched in controls rather than in cases. Is that the expected direction of enrichment?
When I compare the percentages between controls and cases, the results go both ways: most seem enriched in controls and some in cases, so I'm not sure how to interpret this.
Should I rerun the analysis with the coding inverted (0/1 as case/control, respectively) to find things enriched in cases?
One example:
gene_step | gene | Cases_Sum_nHet | Control_Sum_nHet | Fold more in controls (Control_Sum_nHet / Cases_Sum_nHet) |
---|---|---|---|---|
MUC4_ODMS_WGS_WES_MUC4_synonymous | MUC4 | 261 | 2829 | 10.83908046 |
METTL1_ODMS_WGS_METTL1_promoter_CAGE | METTL1 | 4 | 0 | 0 |
To count, I summed the genotype calls for each group (cases and controls) and summarized them for each gene-category pair. As you can see, there are far more total heterozygous calls in controls in the first example and in cases in the second.
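The counting described above can be sketched roughly as follows. This is only a minimal illustration, not part of STAARpipeline: the function names, the `(sample_id, gene_step, n_het)` input layout, and the toy sample IDs are all hypothetical, and it assumes phenotype is coded 0 = control, 1 = case as in the analysis. The last column of the table is taken to be the control sum divided by the case sum, which matches the MUC4 row.

```python
from collections import defaultdict


def sum_het_by_group(het_calls, phenotype):
    """Sum heterozygous calls per gene_step, split by case/control.

    het_calls: iterable of (sample_id, gene_step, n_het) tuples (hypothetical layout).
    phenotype: dict mapping sample_id -> 0 (control) or 1 (case).
    """
    totals = defaultdict(lambda: {"case": 0, "control": 0})
    for sample_id, gene_step, n_het in het_calls:
        group = "case" if phenotype[sample_id] == 1 else "control"
        totals[gene_step][group] += n_het
    return dict(totals)


def control_to_case_ratio(case_sum, control_sum):
    """Return control_sum / case_sum, or None when there are no case calls."""
    if case_sum == 0:
        return None
    return control_sum / case_sum


# Toy data only; these are not the real cohort genotypes.
phenotype = {"s1": 1, "s2": 0, "s3": 0}
het_calls = [
    ("s1", "MUC4_synonymous", 1),
    ("s2", "MUC4_synonymous", 2),
    ("s3", "MUC4_synonymous", 1),
    ("s1", "METTL1_promoter_CAGE", 1),
]
totals = sum_het_by_group(het_calls, phenotype)
for gene_step, counts in totals.items():
    ratio = control_to_case_ratio(counts["case"], counts["control"])
    print(gene_step, counts["case"], counts["control"], ratio)

# Ratio for the MUC4 row reported in the table (261 case vs 2829 control calls).
muc4_ratio = control_to_case_ratio(261, 2829)
print(round(muc4_ratio, 8))
```

Note that these are raw sums, so they have not been divided by the number of samples in each group.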