gatk
Add SVStratify and GroupedSVCluster tools
Implements two new tools and updates some methods for a revamp of the CombineBatches cross-batch integration module in gatk-sv.
- `SVStratify` - tool for splitting out a VCF by variant class. Users pass in a configuration table (see tool documentation for an example) specifying one or more stratification groups classified by SVTYPE, SVLEN range, and reference context(s). The latter are specified as a set of interval lists using the `--context-name` and `--context-intervals` arguments. All variants are matched with their respective group, which is annotated in the `STRAT` INFO field. Optionally, the output can be split into multiple VCFs by group, a useful capability that currently can't be done efficiently with common commands/toolkits.
- `GroupedSVCluster` - a hybrid tool combining functionality from `SVStratify` with `SVCluster` to perform intra-stratum clustering. This tool is critical for fine-tuned clustering of specific variant types within certain reference contexts. For example, small variants in simple repeats tend to have lower breakpoint accuracy and are typically "reclustered" during call set refinement with looser clustering criteria.
- `SVStratificationEngine` - new class for performing stratification.
- Updates to breakpoint refinement in `CanonicalSVCollapser` that should improve breakpoint accuracy, particularly in larger call sets. Raw evidence support and variant quality are now considered when choosing a representative breakpoint for a group of clustered SVs.
- Added `FlagFieldLogic` type for customizing how `BOTHSIDE_PASS` and `HIGH_SR_BACKGROUND` INFO flags are collapsed during clustering.
- `RD_CN` is now used as a backup if `CN` is not available when determining carrier status for sample overlap.
- Removed no-sort option in favor of spooled sorting.
- Bug fix: support for empty EVIDENCE info fields
- Bug fix in one of the JointGermlineCnvDefragmenter tests
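
For reviewers unfamiliar with the stratification idea, here is a rough Python sketch of matching a variant to groups by SVTYPE, SVLEN range, and reference context. It is illustrative only and is not the `SVStratificationEngine` implementation. The `Stratum` class, the `match_strata` function, the half-open size bounds, the single-contig coordinates, and the 0.5 overlap threshold are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single contig (assumption)


@dataclass(frozen=True)
class Stratum:
    """Hypothetical stand-in for one row of the stratification config table."""
    name: str
    svtype: str
    min_size: Optional[int]            # inclusive lower SVLEN bound; None = unbounded
    max_size: Optional[int]            # exclusive upper SVLEN bound; None = unbounded
    contexts: FrozenSet[str] = frozenset()  # context names; empty = any context


def overlap_fraction(start: int, end: int, intervals: List[Interval]) -> float:
    """Fraction of [start, end) covered by non-overlapping intervals."""
    length = end - start
    if length <= 0:
        return 0.0
    covered = sum(max(0, min(end, e) - max(start, s)) for s, e in intervals)
    return covered / length


def match_strata(
    svtype: str,
    start: int,
    end: int,
    svlen: int,
    strata: List[Stratum],
    context_intervals: Dict[str, List[Interval]],
    min_overlap: float = 0.5,  # hypothetical threshold, not a tool default
) -> List[str]:
    """Return the names of every stratum the variant satisfies."""
    matches = []
    for stratum in strata:
        if stratum.svtype != svtype:
            continue
        if stratum.min_size is not None and svlen < stratum.min_size:
            continue
        if stratum.max_size is not None and svlen >= stratum.max_size:
            continue
        # A stratum with contexts requires sufficient overlap with at least one.
        if stratum.contexts and not any(
            overlap_fraction(start, end, context_intervals[c]) >= min_overlap
            for c in stratum.contexts
        ):
            continue
        matches.append(stratum.name)
    return matches


if __name__ == "__main__":
    strata = [
        Stratum("DEL_small_SR", "DEL", None, 5000, frozenset({"SR"})),
        Stratum("DEL_large", "DEL", 5000, None),
    ]
    contexts = {"SR": [(1000, 2000)]}  # e.g. a simple-repeat interval list
    print(match_strata("DEL", 1100, 1400, 300, strata, contexts))
```

In this sketch a variant that matches no group gets an empty list. How the actual tool handles unmatched or multiply matched variants is described in the tool documentation rather than here.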