gatk-sv

Outlier Sample Removal in Modules 2 and 3

Open hab45 opened this issue 3 years ago • 0 comments

We have found that certain outlier samples with significantly more variants can influence variant metrics (module 2) and subsequent filtering (module 3). The challenge is that some family-based studies have more tolerance for including these samples. We therefore suggest adding an option to filter out outliers for training purposes in modules 2 and 3 while still including them in the subsequent downstream steps. This could work as follows:

1. At the end of module 1, generate a list of outlier samples, akin to the one generated after module 3 though likely more stringent.
2. Exclude these samples when generating variant metrics in module 2, unless a variant is found only in outlier samples. In that case, generate variant metrics as normal.
3. Exclude variants found only in outlier samples from training in module 3, but include them when using the random forest for assessment.
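To make the proposal concrete, here is a minimal Python sketch of the three decisions involved. It is not taken from the gatk-sv code base: the function names are hypothetical, and the outlier rule (variant count above Q3 plus a multiple of the IQR) is an assumption chosen for illustration, since the issue only says the cutoff should be "more stringent" than the one used after module 3.

```python
from statistics import quantiles


def find_outlier_samples(variant_counts, iqr_multiplier=3.0):
    """End of module 1: flag samples with unusually many variants.

    ASSUMPTION: outliers are samples whose per-sample variant count
    exceeds Q3 + iqr_multiplier * IQR. The real cutoff may differ.
    """
    q1, _, q3 = quantiles(variant_counts.values(), n=4)
    cutoff = q3 + iqr_multiplier * (q3 - q1)
    return {sample for sample, n in variant_counts.items() if n > cutoff}


def samples_for_metrics(carriers, outliers):
    """Module 2: drop outlier carriers, unless only outliers carry the variant."""
    kept = carriers - outliers
    # Outlier-only variant: fall back to all carriers (metrics as normal).
    return kept if kept else set(carriers)


def usable_for_training(carriers, outliers):
    """Module 3: train only on variants with at least one non-outlier carrier."""
    return bool(carriers - outliers)
```

Under this sketch, a variant carried only by outlier samples still gets metrics in module 2 and is still scored by the random forest in module 3. It is only left out of the training set.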

hab45, Sep 08 '20 19:09