epiercehoffman

Results 9 issues of epiercehoffman

### Updates
Create a standalone WDL for NCR and reference artifact filters. Also removes sites with zero carriers. This script pulls code from the existing NCR calculation in apply_sl_filter.py but...

The CPX_TYPE for CTX records should be CTX_PP/QQ or CTX_PQ/QP. This is expected by downstream tools like ManualReview.wdl and SVAnnotate. In a recent callset, it was observed that the CPX_TYPE...
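As a rough illustration of the expectation above, the check below flags CTX records whose CPX_TYPE is not one of the two values named in the issue. It is only a sketch: the record layout (plain dicts with `id`, `svtype`, and `info` keys) and the function name are hypothetical, not part of gatk-sv, and the example records are made up.

```python
# Allowed CPX_TYPE values for CTX records, taken from the issue text.
VALID_CTX_CPX_TYPES = {"CTX_PP/QQ", "CTX_PQ/QP"}


def invalid_ctx_records(records):
    """Return the IDs of CTX records with a missing or unexpected CPX_TYPE.

    `records` is a hypothetical iterable of dicts with the keys
    "id", "svtype", and "info" (a dict of INFO fields).
    """
    bad_ids = []
    for rec in records:
        if rec["svtype"] != "CTX":
            continue  # only CTX records are checked
        if rec["info"].get("CPX_TYPE") not in VALID_CTX_CPX_TYPES:
            bad_ids.append(rec["id"])
    return bad_ids


# Made-up example records for illustration only.
example_records = [
    {"id": "ctx_1", "svtype": "CTX", "info": {"CPX_TYPE": "CTX_PP/QQ"}},
    {"id": "ctx_2", "svtype": "CTX", "info": {"CPX_TYPE": "CTX_PQ/QP"}},
    {"id": "ctx_3", "svtype": "CTX", "info": {}},
    {"id": "del_1", "svtype": "DEL", "info": {}},
]
flagged = invalid_ctx_records(example_records)
print(flagged)
```

A check like this could run before tools that depend on the field, so that malformed CTX records are reported rather than silently mishandled.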

We want to add GD region overlap to the INFO field. It seems to make sense to do this in AnnotateVcf, probably within SVAnnotate. We could add a separate reference...

When svtk vcfcluster merges records, it creates lists of INFO values from the member records, even when the Number for the INFO key defined in the header is 1 ([link](https://github.com/broadinstitute/gatk-sv/blob/d37e453038e425acba9683da503218fe3d4b1033/src/svtk/svtk/svfile.py#L302))....
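To make the mismatch concrete, here is a minimal sketch of a merge that respects the header's Number. It is not svtk's code and not necessarily the fix the issue proposes: `merge_info`, its arguments, and the policy of keeping the first value seen for Number=1 keys are all assumptions chosen for illustration.

```python
def merge_info(member_infos, header_numbers):
    """Merge INFO dicts from the member records of one cluster.

    member_infos:   list of dicts mapping INFO key -> value or list of values.
    header_numbers: dict mapping INFO key -> the header's Number as a string.

    Keys declared with Number=1 collapse to a single value (the first one
    seen, an arbitrary illustrative policy). All other keys keep a
    de-duplicated list in first-seen order.
    """
    collected = {}
    for info in member_infos:
        for key, value in info.items():
            values = value if isinstance(value, (list, tuple)) else [value]
            bucket = collected.setdefault(key, [])
            for v in values:
                if v not in bucket:
                    bucket.append(v)

    merged = {}
    for key, values in collected.items():
        if header_numbers.get(key) == "1":
            merged[key] = values[0]  # scalar, consistent with Number=1
        else:
            merged[key] = values
    return merged


# Made-up member records for illustration only.
merged_example = merge_info(
    [
        {"SVLEN": 100, "ALGORITHMS": ["manta"]},
        {"SVLEN": 120, "ALGORITHMS": ["manta", "wham"]},
    ],
    {"SVLEN": "1", "ALGORITHMS": "."},
)
print(merged_example)
```

Which value to keep for a Number=1 key (first, median, most common) is a design decision this sketch does not try to settle.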

### Updates
New workflow to remove outlier samples.

* Uses `src/sv-pipeline/scripts/downstream_analysis_and_filtering/determine_svcount_outliers.R` for plotting and outlier determination, which only considers SV types with a median SVs per sample of at least...

svtk vcf2bed uses the ALT field to produce the `svtype` column in the output BED file. This means that the `svtype` column includes BND alt alleles and values like INS:ME...
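The difference can be sketched as follows. Both helper functions are hypothetical and do not reproduce svtk's implementation; they only contrast an ALT-derived label with one read from the INFO `SVTYPE` field, using made-up ALT strings.

```python
def svtype_from_alt(alt):
    """Derive a type label from ALT by stripping symbolic-allele brackets.

    Illustrative only: breakend alleles are not symbolic, so they pass
    through unchanged, and subtyped alleles keep their subtype suffix.
    """
    return alt.strip("<>")


def svtype_from_info(info, alt):
    """Prefer INFO/SVTYPE when present, falling back to the ALT-derived label."""
    return info.get("SVTYPE", svtype_from_alt(alt))


# Made-up (ALT, INFO) pairs for illustration only.
examples = [
    ("<DEL>", {"SVTYPE": "DEL"}),
    ("<INS:ME>", {"SVTYPE": "INS"}),
    ("N]chr2:1000]", {"SVTYPE": "BND"}),
]
alt_labels = [svtype_from_alt(alt) for alt, _ in examples]
info_labels = [svtype_from_info(info, alt) for alt, info in examples]
print(alt_labels)
print(info_labels)
```

With the ALT-derived approach, the second and third labels are `INS:ME` and the raw breakend string, whereas the INFO-based approach yields `INS` and `BND`.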

A recent large cohort had more CPX events > 1Mb than expected. The RD plots for the large CNVs within these complex SVs don't show visible changes in read depth....

Vapor produces a large amount of output (~1.5 GiB per sample), 99% of which is plots. These plots do not need to be stored for every sample, so to reduce...

Problem statement: We need more standardized and productionized sample QC and filtering prior to running the raw SV callers in GatherSampleEvidence. Sample QC is performed in EvidenceQC,...