gatk-sv

Remove outliers across per-contig VCFs

Open epiercehoffman opened this issue 3 months ago • 0 comments

Updates

New workflow to remove outlier samples.

  • Uses src/sv-pipeline/scripts/downstream_analysis_and_filtering/determine_svcount_outliers.R for plotting and outlier determination; the script only considers SV types with a median of at least 100 SVs per sample
  • Takes per-contig VCFs as input
  • Performs outlier determination based on autosomes only
  • Can be rerun with new inputs and settings to perform SV counting, outlier determination at different thresholds, and filtering separately, without redoing previous steps
  • Includes a bcftools preprocessing step to restrict the SVs considered during outlier determination
  • Filters sample list
  • Can provide a list of additional (e.g., withdrawn) samples to exclude at the same time as outlier removal
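
For readers unfamiliar with the approach, the counting, outlier determination, and sample-list filtering steps above can be sketched in Python as follows. This is only an illustration, not the workflow's code: the real logic lives in determine_svcount_outliers.R, and the IQR-based cutoff rule, the default multiplier, the `chr1`..`chr22` contig names, and all function names here are assumptions. Only the minimum median of 100 SVs per sample, the autosome restriction, and the extra exclusion list come from the description above.

```python
from collections import Counter, defaultdict
from statistics import median, quantiles

MIN_MEDIAN_PER_SAMPLE = 100  # from the PR description
AUTOSOMES = {f"chr{i}" for i in range(1, 23)}  # assumed contig naming


def count_svs(records):
    """Count SVs per SV type and sample, autosomes only.

    records: iterable of (contig, svtype, carrier_samples) tuples
    (a hypothetical, already-parsed view of the per-contig VCFs).
    Samples carrying no SVs of a type do not appear in that type's counts.
    """
    counts = defaultdict(Counter)
    for contig, svtype, carriers in records:
        if contig not in AUTOSOMES:
            continue
        counts[svtype].update(carriers)
    return counts


def find_outliers(counts, iqr_multiplier=6.0):
    """Return samples whose count for any eligible SV type falls outside
    [Q1 - k*IQR, Q3 + k*IQR]. The IQR rule and k are assumptions."""
    outliers = set()
    for svtype, per_sample in counts.items():
        values = sorted(per_sample.values())
        # Skip SV types that are too rare to give a stable distribution.
        if len(values) < 2 or median(values) < MIN_MEDIAN_PER_SAMPLE:
            continue
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        low = q1 - iqr_multiplier * iqr
        high = q3 + iqr_multiplier * iqr
        outliers.update(s for s, n in per_sample.items() if n < low or n > high)
    return outliers


def filter_samples(samples, outliers, extra_exclusions=()):
    """Drop outliers and any additional (e.g., withdrawn) samples."""
    drop = set(outliers) | set(extra_exclusions)
    return [s for s in samples if s not in drop]
```

Because the three functions are independent, the counts can be computed once and `find_outliers` called again with a different `iqr_multiplier`, which mirrors the rerun-without-redoing-previous-steps behavior described above.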

Testing

Tested on the 1kgp reference panel with different settings and inputs.

Marking as draft while development for Phase 2 is ongoing. Designed for Phase 2 usage, so it may need changes to be more generally applicable.

epiercehoffman, Mar 11 '24 18:03