compound heterozygote tooling

Open brentp opened this issue 1 year ago • 3 comments

This is to track a new tool for more exhaustive support for compound heterozygotes. The current tool supports probably 90% of use-cases.

Other uses include:

  1. detecting non-standard compound variants (not heterozygotes) #138
  2. annotating with information from co-occurring variants from gnomAD pairs
  3. accepting a SNP and SV (or STR) VCF to find heterozygotes in SNP and SV or SNP and STR

Other use-cases and feedback welcome.

Plan

The plan is to build a new tool, slivar ch.

slivar expr \
   --sample-expr 'parent_ch:sample.GQ > 20 && (sample.hom_ref || (sample.het && sample.AB > 0.2 && sample.AB < 0.8)) && INFO.impactful && sample.kids.length > 0' \
   --sample-expr 'kid_ch:sample.GQ > 20 && sample.het && INFO.impactful'

# annotate worthy SVs
slivar expr -v $other_SV_vcf -o $other_ch_vcf \
   --sample-expr 'parent_ch:...' \
   --sample-expr 'kid_ch:...' 

slivar ch \
    --ped $ped \
    --parent parent_ch \
    --kid kid_ch \
    --groupby 'CSQ/gene' \
    -v $input \
    --other_vcf $other_ch_vcf \
    -o $output

slivar ch extracts only variants tagged kid_ch and, within each group defined by --groupby, checks that parent_ch is present in exactly one parent. This requires the SV VCF to carry a CSQ annotation that matches the one produced by bcftools, snpEff, or VEP.

This can cover use-cases in #138 as the user can specify a --sample-expr 'parent_ch:...' that allows for homozygous variants in the parent.

For the parents, slivar ch must only check that one variant is hom-ref and the other is not (it should not check for het).
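
A minimal Python sketch of the pairing check described above, under one possible reading of it: within each group, a pair of kid_ch variants qualifies when each parent is hom-ref at exactly one of the two variants and the parents are hom-ref at different ones. The record layout (`id`, `gene`, `kid_ch`, `hom_ref`) and the function name `find_pairs` are hypothetical and only for illustration; this is not slivar's implementation.

```python
from collections import defaultdict
from itertools import combinations


def find_pairs(variants, kid, mom, dad):
    """Return (gene, id_a, id_b) for candidate pairs in one trio.

    Each variant is a dict with hypothetical keys:
      id      - variant identifier
      gene    - the grouping key (what --groupby would extract)
      kid_ch  - set of samples tagged by the kid_ch expression
      hom_ref - set of samples that are hom-ref at this variant
    """
    # Keep only variants tagged kid_ch for this kid, grouped by gene.
    by_gene = defaultdict(list)
    for v in variants:
        if kid in v["kid_ch"]:
            by_gene[v["gene"]].append(v)

    pairs = []
    for gene, group in by_gene.items():
        for a, b in combinations(group, 2):
            mom_a, mom_b = mom in a["hom_ref"], mom in b["hom_ref"]
            dad_a, dad_b = dad in a["hom_ref"], dad in b["hom_ref"]
            # Each parent is hom-ref at exactly one variant of the pair,
            # and the two parents are hom-ref at different variants.
            # Non-hom-ref genotypes are not required to be het.
            if mom_a != mom_b and dad_a != dad_b and mom_a != dad_a:
                pairs.append((gene, a["id"], b["id"]))
    return pairs


example = [
    {"id": "v1", "gene": "G1", "kid_ch": {"kid"}, "hom_ref": {"dad"}},
    {"id": "v2", "gene": "G1", "kid_ch": {"kid"}, "hom_ref": {"mom"}},
    {"id": "v3", "gene": "G1", "kid_ch": {"kid"}, "hom_ref": {"dad"}},
    {"id": "v4", "gene": "G2", "kid_ch": {"kid"}, "hom_ref": {"mom"}},
    {"id": "v5", "gene": "G1", "kid_ch": set(), "hom_ref": {"mom"}},
]
result = find_pairs(example, "kid", "mom", "dad")
```

Because the check only asks whether a parent is hom-ref or not, a parent who is homozygous for the alternate allele at one variant still passes, which is how this reading would accommodate the cases in #138.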

brentp avatar Sep 13 '22 18:09 brentp

For 3. consider two SVs/STRs in the same gene as well.

hdashnow avatar Sep 13 '22 19:09 hdashnow

I believe current slivar comphet strategies rely on both of the variants being "damaging" for inclusion, but when dealing with SVs/STRs this might need to be relaxed since many VCFs of these variants do not include such impact predictions. Plus, as an example, while likely not categorized as "damaging", an intronic SV combined with a "damaging" SNV might be a combination worth considering.

fakedrtom avatar Sep 13 '22 19:09 fakedrtom

Current slivar compound-hets is actually somewhere between agnostic and lenient about impact: you can set --skip to empty to include all possible types, and then you could include pairs of intronic variants if you so choose. The logic for SVs will be such that any included SV present in the sample will be considered.
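
To make the filtering behaviour concrete, here is a rough Python sketch, assuming the skip list is a comma-delimited string of impact names and that SVs bypass the impact filter entirely. The helper names `parse_skip` and `keep`, and the `is_sv`/`impact` keys, are hypothetical and do not reflect slivar's code.

```python
def parse_skip(arg):
    """Turn a comma-delimited skip string into a set; "" gives an empty set."""
    return {name for name in arg.split(",") if name}


def keep(variant, skip):
    """Decide whether a variant is eligible for pairing."""
    if variant["is_sv"]:
        # Assumption: any included SV is considered, whatever its impact.
        return True
    return variant["impact"] not in skip


intronic_snv = {"is_sv": False, "impact": "intron"}
unannotated_sv = {"is_sv": True, "impact": None}

kept_with_empty_skip = keep(intronic_snv, parse_skip(""))
kept_with_intron_skip = keep(intronic_snv, parse_skip("intron,upstream_gene"))
sv_kept = keep(unannotated_sv, parse_skip("intron,upstream_gene"))
```

With an empty skip list the intronic SNV is retained, so an intronic variant paired with a damaging SNV would stay in play; the SV is retained even though it carries no impact prediction.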

brentp avatar Sep 13 '22 20:09 brentp