Filtering multisample with duphold

Open karynne7 opened this issue 3 years ago • 1 comments

Brent, do you have any suggestions for filtering a multisample smoove VCF on the duphold annotations? Filtering on just the first sample will wrongly exclude sites that pass in other samples, while keeping any site where at least one sample passes the metric (FMT/DHFFC[*], e.g.) might be too inclusive. Just wondering if you have a better solution. Thanks!

karynne7, Sep 21 '20 22:09

Yes. (I should start an FAQ for this...) You can use slivar for this, depending on what you want to do. If you want to call high-quality de novos in trios, you can use e.g.:

--trio 'hq_dn_del:kid.het && mom.hom_ref && dad.hom_ref && kid.DHFFC < 0.7 && mom.DHFFC > 0.8 && dad.DHFFC > 0.8 && INFO.SVTYPE == "DEL"'

likewise, for SVTYPE == DUP, you can use kid.DHFFC > 1.2 (and parents < 1.1 or something).
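To make the DEL and DUP thresholds above easier to compare side by side, here is a minimal Python sketch of the same logic. It is only an illustration: slivar itself evaluates the expression per trio, and the function name `is_hq_denovo` and the dict layout (`gt`, `DHFFC`) are hypothetical. The DUP parent cutoff of 1.1 is the "or something" value from above, so treat it as a starting point.

```python
def is_hq_denovo(svtype, kid, mom, dad):
    """Return True if a trio looks like a high-quality de novo DEL or DUP.

    Each of kid/mom/dad is a dict (hypothetical layout) with:
      'gt'    -> genotype class such as 'het' or 'hom_ref'
      'DHFFC' -> duphold fold-change value for that sample
    """
    # Genotype pattern for a de novo: kid het, both parents hom-ref.
    if not (kid["gt"] == "het" and mom["gt"] == "hom_ref" and dad["gt"] == "hom_ref"):
        return False
    if svtype == "DEL":
        # Kid shows a depth drop, parents do not.
        return kid["DHFFC"] < 0.7 and mom["DHFFC"] > 0.8 and dad["DHFFC"] > 0.8
    if svtype == "DUP":
        # Kid shows a depth gain, parents do not (1.1 is an approximate cutoff).
        return kid["DHFFC"] > 1.2 and mom["DHFFC"] < 1.1 and dad["DHFFC"] < 1.1
    return False
```

For example, `is_hq_denovo("DEL", {"gt": "het", "DHFFC": 0.5}, {"gt": "hom_ref", "DHFFC": 1.0}, {"gt": "hom_ref", "DHFFC": 0.95})` passes, while the same trio with the kid at DHFFC 0.9 does not.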

If you just want to see which samples have high-quality calls, you can use:

--sample-expr "hq_del_samples:sample.het && sample.DHFFC < 0.7 && INFO.SVTYPE == 'DEL'"

It's often useful to know whether many samples are low quality, in this case called as deletions but without a change in DHFFC:

--sample-expr "lowq_del_samples:sample.het && sample.DHFFC > 0.75 && INFO.SVTYPE == 'DEL'"
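As a rough illustration of what the two per-sample expressions above select at a single DEL site, the following Python sketch applies the same thresholds to a plain dict of samples. The function name `label_del_samples` and the input layout are hypothetical, and this is not how slivar is implemented. Note that a het sample with DHFFC between 0.7 and 0.75 lands in neither list.

```python
def label_del_samples(samples):
    """Split het-called samples at a DEL site into high- and low-quality lists.

    `samples` maps a sample name to a dict (hypothetical layout) with
    'gt' (e.g. 'het', 'hom_ref') and 'DHFFC' (duphold fold-change).
    """
    hq, lowq = [], []
    for name, call in samples.items():
        if call["gt"] != "het":
            continue  # both expressions only consider het calls
        if call["DHFFC"] < 0.7:
            hq.append(name)    # matches the hq_del_samples expression
        elif call["DHFFC"] > 0.75:
            lowq.append(name)  # matches the lowq_del_samples expression
    return hq, lowq
```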

So you can combine these annotations to find variants that are, for example, high quality in one or more samples and low quality in zero or few samples.
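One way to do that combination is a small post-filter over the annotated VCF. The sketch below assumes that the labels `hq_del_samples` and `lowq_del_samples` end up in the INFO column as comma-separated lists of sample names; check your output VCF header to confirm. The helper names and the default cutoffs (`min_hq=1`, `max_lowq=2`) are hypothetical and should be tuned to your cohort size.

```python
def parse_info(info):
    """Parse a VCF INFO string into a dict; flag-only entries map to True."""
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def sample_count(fields, key):
    """Count sample names in a comma-separated INFO value (0 if absent)."""
    value = fields.get(key)
    if value in (None, True, "", "."):
        return 0
    return len(value.split(","))


def keep_variant(info, min_hq=1, max_lowq=2):
    """Keep sites that are high quality in >= min_hq samples and
    low quality in <= max_lowq samples (cutoffs are assumptions)."""
    fields = parse_info(info)
    return (sample_count(fields, "hq_del_samples") >= min_hq
            and sample_count(fields, "lowq_del_samples") <= max_lowq)
```

To use it, read the VCF line by line, skip header lines starting with `#`, and pass the eighth tab-separated column (INFO) to `keep_variant`.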

That should get you started; let me know if you need any other clarifications. (All of this is written as text, so there might be small typos, but it should be fairly close.)

brentp, Sep 21 '20 22:09