FilterIntervals will get rid of Y chromosome intervals if there are >50% of female samples

Open NotAPoetButACriminal opened this issue 11 months ago • 2 comments

Bug Report

FilterIntervals

gatk FilterIntervals \
    -L ${OUTPUT}/bins.interval_list \
    --annotated-intervals ${OUTPUT}/bins_annotated.interval_list \
    -imr OVERLAPPING_ONLY \
    $INPUTHDF5S \
    -O ${OUTPUT}/bins_filtered.interval_list

Description

I've been running the gCNV pipeline on WES samples as per this article and have noticed that in some of my runs all of the Y chromosome intervals are being removed. This then interferes with sex estimation during ploidy determination, which in turn messes up the CNV calls on the sex chromosomes. Correct me if I'm wrong, but it seems that the low-count filter, i.e. "intervals with a count < 10 in > 50.0% of samples fail", will remove the Y chromosome from any batch in which more than half of the samples are female. Pushing the percentage up (e.g. 55%, 60%, etc.) until it reaches the percentage of samples that are female avoids the problem, but it also changes the interval filtering parameters for all other contigs. It seems that there should be special consideration for sex chromosomes, for example specifying "--allosomal-contig Y" as when using DetermineGermlineContigPloidy, or an "always keep these intervals" option, like the -XL flag but in reverse.
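To make the arithmetic concrete, here is a minimal Python sketch of the rule as I read it from the log message. It is not GATK's actual implementation: the function name `fails_low_count_filter` and the read counts are made up for illustration, and the strict "<" and ">" comparisons are taken from the wording of the message.

```python
def fails_low_count_filter(counts, count_threshold=10, percentage_of_samples=50.0):
    """Hypothetical re-statement of the low-count rule from the log message:
    an interval fails if its count is < count_threshold in more than
    percentage_of_samples percent of the samples."""
    n_low = sum(1 for c in counts if c < count_threshold)
    return 100.0 * n_low / len(counts) > percentage_of_samples


# Made-up counts for one chrY interval in a batch of 10 samples:
# the first 6 are female (essentially no Y coverage), the last 4 are male.
chr_y_counts = [0, 0, 1, 0, 2, 0, 85, 92, 78, 101]

# Made-up counts for one autosomal interval in the same batch.
autosomal_counts = [120, 98, 110, 105, 130, 99, 115, 102, 97, 125]

# 60% of samples are below 10 on chrY, so the interval fails at the 50% setting.
print(fails_low_count_filter(chr_y_counts))
# Raising the percentage to 60 keeps the interval, because 60% is not > 60%.
print(fails_low_count_filter(chr_y_counts, percentage_of_samples=60.0))
# The autosomal interval passes either way.
print(fails_low_count_filter(autosomal_counts))
```

With 6 of 10 samples female, every chrY interval behaves like `chr_y_counts` here, so the whole contig is dropped at the 50% setting even though the male samples have perfectly usable coverage.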

NotAPoetButACriminal, Nov 14 '24 10:11