gatk-sv
Remove outliers across per-contig VCFs
Updates
New workflow to remove outlier samples.
- Uses `src/sv-pipeline/scripts/downstream_analysis_and_filtering/determine_svcount_outliers.R` for plotting and outlier determination; only SV types with a median of at least 100 SVs per sample are considered
- Takes per-contig VCFs as input
- Performs outlier determination using autosomes only
- Can be rerun with new inputs and settings to perform SV counting, outlier determination (e.g. at different thresholds), and filtering separately, without redoing earlier steps
- Includes a bcftools preprocessing step to restrict which SVs are considered during outlier determination
- Filters the sample list
- Accepts a list of additional samples to exclude (e.g. withdrawn samples), which are removed at the same time as the outliers
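The counting, outlier, and exclusion logic listed above can be illustrated with a minimal Python sketch. It is not the workflow's or `determine_svcount_outliers.R`'s actual implementation, and several parts are assumptions: the nested-dict count layout, the `chr1` to `chr22` autosome names, the IQR-based cutoff with a caller-chosen multiplier, and every function name are hypothetical. It also assumes the counts were already restricted to the SVs of interest (the role of the bcftools preprocessing step).

```python
import statistics
from typing import Dict, Iterable, List, Set

# Hypothetical layout: counts[contig][sample][svtype] = number of SVs.
Counts = Dict[str, Dict[str, Dict[str, int]]]

# Assumption: GRCh38-style "chr"-prefixed autosome names.
AUTOSOMES = {f"chr{i}" for i in range(1, 23)}


def autosomal_totals(counts: Counts) -> Dict[str, Dict[str, int]]:
    """Sum per-contig counts into per-sample, per-SV-type totals (autosomes only)."""
    totals: Dict[str, Dict[str, int]] = {}
    for contig, per_sample in counts.items():
        if contig not in AUTOSOMES:
            continue  # sex chromosomes and other contigs are ignored
        for sample, per_type in per_sample.items():
            sample_totals = totals.setdefault(sample, {})
            for svtype, n in per_type.items():
                sample_totals[svtype] = sample_totals.get(svtype, 0) + n
    return totals


def find_outliers(
    totals: Dict[str, Dict[str, int]], min_median: float, iqr_mult: float
) -> Set[str]:
    """Flag samples outside [Q1 - k*IQR, Q3 + k*IQR] for any sufficiently common SV type.

    Hypothetical cutoff rule; needs at least two samples for quartiles.
    """
    samples = sorted(totals)
    svtypes = sorted({t for per_type in totals.values() for t in per_type})
    outliers: Set[str] = set()
    for svtype in svtypes:
        # Samples with no SVs of this type count as 0.
        values = [totals[s].get(svtype, 0) for s in samples]
        if statistics.median(values) < min_median:
            continue  # skip sparse SV types (median SVs per sample below the minimum)
        q1, _, q3 = statistics.quantiles(values, n=4)
        iqr = q3 - q1
        lower, upper = q1 - iqr_mult * iqr, q3 + iqr_mult * iqr
        outliers.update(s for s, v in zip(samples, values) if v < lower or v > upper)
    return outliers


def samples_to_keep(
    sample_list: Iterable[str], outliers: Set[str], extra_exclude: Iterable[str]
) -> List[str]:
    """Drop outliers and any additionally excluded samples, preserving input order."""
    excluded = set(outliers) | set(extra_exclude)
    return [s for s in sample_list if s not in excluded]
```

Because each function takes the previous step's output, counts, outlier sets, and filtered sample lists can be recomputed independently, e.g. calling `find_outliers` again with a different `iqr_mult` without recounting. This mirrors the rerun behavior described above, though the real workflow's mechanism may differ.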
Testing
Tested on the 1KGP reference panel with different settings and inputs.
Marking as draft while development for Phase 2 is ongoing. Designed for Phase 2 usage, so it may need changes to be more generally applicable.