smoove
Filtering multisample with duphold
Brent, do you have any suggestions for filtering a multisample smoove VCF on the duphold annotations? Filtering on only the first sample will wrongly exclude sites where other samples pass, while keeping any site where at least one sample passes the metric (e.g. FMT/DHFFC[*]) might be too inclusive. Just wondering if you have a better solution. Thanks!
Yes. (I should start an FAQ for this...) You can use slivar for this, depending on what you want to do. If you want to call high-quality de novos in trios, you can use e.g.:
--trio 'hq_dn_del:kid.het && mom.hom_ref && dad.hom_ref && kid.DHFFC < 0.7 && mom.DHFFC > 0.8 && dad.DHFFC > 0.8 && INFO.SVTYPE == "DEL"'
Likewise, for SVTYPE == DUP, you can use kid.DHFFC > 1.2 (and parents < 1.1 or something).
If you just want to see which samples have high-quality calls, you can use:
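To make the trio logic explicit, here is a minimal Python sketch of the same checks outside slivar. It is only an illustration: the `Call` class and `is_hq_denovo` function are hypothetical names, and the DUP thresholds (kid > 1.2, parents < 1.1) are the rough values suggested above, not tuned cutoffs.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """One sample's genotype at a site (hypothetical container for illustration)."""
    alts: int      # number of alternate alleles: 0 = hom-ref, 1 = het, 2 = hom-alt
    dhffc: float   # duphold DHFFC value for this sample


def is_hq_denovo(svtype: str, kid: Call, mom: Call, dad: Call) -> bool:
    """Return True if the site looks like a high-quality de novo in the kid."""
    # Genotype pattern: kid het, both parents hom-ref.
    if not (kid.alts == 1 and mom.alts == 0 and dad.alts == 0):
        return False
    if svtype == "DEL":
        # Deletion: coverage drops in the kid, stays near normal in the parents.
        return kid.dhffc < 0.7 and mom.dhffc > 0.8 and dad.dhffc > 0.8
    if svtype == "DUP":
        # Duplication: coverage rises in the kid; parent cutoff is approximate.
        return kid.dhffc > 1.2 and mom.dhffc < 1.1 and dad.dhffc < 1.1
    # Other SV types are not covered by these depth checks.
    return False
```

For example, `is_hq_denovo("DEL", Call(1, 0.5), Call(0, 1.0), Call(0, 0.95))` passes, while the same site with a kid DHFFC of 0.9 does not.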
--sample-expr "hq_del_samples:sample.het && sample.DHFFC < 0.7 && INFO.SVTYPE == 'DEL'"
It's often useful to know whether many samples are low quality, in this case called as deletions but without the expected drop in DHFFC:
--sample-expr "lowq_del_samples:sample.het && sample.DHFFC > 0.75 && INFO.SVTYPE == 'DEL'"
So you can combine these annotations to find variants that are, for example, high quality in one or more samples and low quality in zero or few samples.
That should get you started; let me know if you need any other clarifications. (All of this is written from memory as text, so there might be small typos, but it should be fairly close.)
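As a rough illustration of combining the two annotations, the Python sketch below sorts het deletion calls into high-quality and low-quality sample lists using the same DHFFC cutoffs, then keeps a site only if enough samples are high quality and few are low quality. The function names and the `min_hq` / `max_lowq` defaults are assumptions for this example, not slivar options.

```python
def classify_del_samples(svtype, samples, hq_max=0.7, lowq_min=0.75):
    """Split het deletion calls into high- and low-quality sample names.

    samples maps sample name -> (number of alt alleles, DHFFC).
    """
    hq, lowq = [], []
    if svtype != "DEL":
        return hq, lowq
    for name, (alts, dhffc) in samples.items():
        if alts != 1:
            continue  # only het calls are considered, as in the expressions above
        if dhffc < hq_max:
            hq.append(name)      # called DEL and coverage actually drops
        elif dhffc > lowq_min:
            lowq.append(name)    # called DEL but coverage looks unchanged
        # values between the two cutoffs are left unclassified
    return hq, lowq


def keep_site(hq, lowq, min_hq=1, max_lowq=0):
    """Keep a site with at least min_hq good samples and at most max_lowq bad ones."""
    return len(hq) >= min_hq and len(lowq) <= max_lowq
```

Raising `max_lowq` corresponds to the "zero or few" tolerance: a site with one convincing carrier and one unconvincing carrier is dropped at the default of 0 but kept with `max_lowq=1`.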