brentp/rare-disease-wf: (WIP) best-practices workflow for rare disease

For rare disease, the best practices and the expected number of candidate variants for each inheritance mode are known. The actual filtering is easily done with a tool like slivar. This is a necessary first step, but it has the following limitations:

It leaves an analyst or clinician with choices on how to prioritize the 10-15 candidate variants, or ~100 for autosomal (non de novo) dominant.
- This is quite a small number, but the prioritization after this is highly variable across tools and analysts.
It is limited to text/spreadsheet output.
It assumes a high-quality, jointly-called VCF is already available.
It leaves the analyst with the chore of getting IGV set up and browsing each candidate for each family.

Quickstart

Note, it is early days for the project. It will produce high-quality SNP/indel candidates but you may need experience with nextflow to run it easily.

This project currently has a workflow that can be run as:

# The per-option notes are kept above the command so it can be pasted as-is.
# -config:     a starting config is included in this repo. adjust from there.
# --xams:      NOTE that this is a quoted string glob
# --ped:       see: https://gatk.broadinstitute.org/hc/en-us/articles/360035531972-PED-Pedigree-format
# --gff:       e.g. from: ftp://ftp.ensembl.org/pub/current_gff3/homo_sapiens/
# --slivarzip: from: https://github.com/brentp/slivar#gnotation-files
nextflow run -resume -profile slurm rare-disease.nf \
    -config nextflow.config \
    --xams "/path/to/*/*.cram" \
    --ped "$pedigree_file" \
    --fasta "$reference_fasta" \
    --gff "$gff" \
    --slivarzip gnomad.hg38.zip \
    --cohort_name my_rare_disease
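
The --ped option takes a six-column pedigree file (family, sample, father, mother, sex, phenotype). As a minimal sketch of what such a file could look like for one trio, the Python below formats three rows as tab-separated text. All family and sample IDs are hypothetical, and the helper name format_ped is made up for illustration; in a real run the sample IDs would presumably need to match the sample names in the alignment files.

```python
import csv
import io

# PED columns: family, sample, father, mother, sex, phenotype.
# Sex: 1 = male, 2 = female. Phenotype: 1 = unaffected, 2 = affected.
# A parent ID of 0 means the parent is not in the pedigree.
# All IDs below are hypothetical.
TRIO = [
    ("fam1", "kid1", "dad1", "mom1", "1", "2"),  # affected male proband
    ("fam1", "dad1", "0", "0", "1", "1"),        # unaffected father
    ("fam1", "mom1", "0", "0", "2", "1"),        # unaffected mother
]


def format_ped(rows):
    """Return the rows as tab-separated PED text, one sample per line."""
    for row in rows:
        if len(row) != 6:
            raise ValueError(f"expected 6 PED columns, got {len(row)}: {row}")
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
    return buffer.getvalue()


ped_text = format_ped(TRIO)
print(ped_text, end="")
```

Writing ped_text to a file and passing that path as $pedigree_file would complete the example.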

Output

See this wiki page for more information about how to use the output.

This does:

Run DeepVariant and GLNexus (we have shown these tools to give higher quality results for trios) in an efficient nextflow workflow that can be easily run in the cloud or on a cluster.
Decompose and normalize variants.
Annotate with bcftools csq and snpEff
Annotate with allele frequency and inheritance modes using slivar
Annotate with gene-based annotations:
- clinvar-gene-phenotype
- loss-of-function intolerance
Output high-quality calls from slivar for recessive, dominant, x-linked, compound-het and other inheritance modes.
Generate and link pre-made, standalone igv.js/jigv outputs for each candidate.
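
To make the "decompose and normalize" step above more concrete, here is a small Python sketch of the idea: a multi-allelic record is split into one record per alternate allele, and bases shared by the reference and alternate alleles are trimmed. This is only an illustration, not the workflow's implementation; the function names are hypothetical, and full normalization also left-aligns indels against the reference, which this sketch omits.

```python
def trim_alleles(pos, ref, alt):
    """Trim bases shared by ref and alt, keeping at least one base in each."""
    # Trim the shared suffix first; the position does not change.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Then trim the shared prefix, moving the position forward.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


def decompose(chrom, pos, ref, alts):
    """Split a multi-allelic record into trimmed biallelic records."""
    return [(chrom, *trim_alleles(pos, ref, alt)) for alt in alts]


# Hypothetical record: chr1:100 CAG > CA,TAG
records = decompose("chr1", 100, "CAG", ["CA", "TAG"])
for record in records:
    print(record)
# ('chr1', 101, 'AG', 'A')   a 1-bp deletion
# ('chr1', 100, 'C', 'T')    a SNP
```

After this step, each variant has a single representation, which is what allows allele-frequency annotations to be matched by position and alleles.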

The key output will be in results-rare-disease/${cohort_name}.slivar.candidates.tsv, which can be viewed in Excel or other spreadsheet software. In addition, the workflow will create results-rare-disease/${cohort_name}.jigv.html and results-rare-disease/jigv_plots/*, which together provide an HTML table and interactive igv.js views of each variant and its associated alignments that do not rely on the original alignment files.
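
Besides a spreadsheet, the candidates file can be summarized with a few lines of Python. The sketch below counts candidates per inheritance mode. It assumes the file is tab-separated with a header row and that the inheritance mode is in a column named #mode; that column name and the example rows are assumptions for illustration, so check them against the header of your own file.

```python
import csv
import io
from collections import Counter


def count_by_mode(handle, mode_column="#mode"):
    """Count candidate rows per inheritance mode in a tab-separated file."""
    reader = csv.DictReader(handle, delimiter="\t")
    if mode_column not in (reader.fieldnames or []):
        raise KeyError(f"column {mode_column!r} not found in {reader.fieldnames}")
    return Counter(row[mode_column] for row in reader)


# Made-up rows standing in for the real candidates file.
EXAMPLE = (
    "#mode\tfamily_id\tsample_id\tchr:pos:ref:alt\n"
    "denovo\tfam1\tkid1\tchr1:100:C:T\n"
    "recessive\tfam1\tkid1\tchr2:200:G:A\n"
    "recessive\tfam2\tkid2\tchr3:300:AG:A\n"
)

counts = count_by_mode(io.StringIO(EXAMPLE))
for mode, n in sorted(counts.items()):
    print(mode, n)
```

For a real run, replace io.StringIO(EXAMPLE) with open("results-rare-disease/my_rare_disease.slivar.candidates.tsv", newline=""), using the cohort name passed to the workflow.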

In coming releases, this will:

Output QC with somalier and other tools to be shown in multiQC
Output high-quality SVs (using manta -> graphtyper)

Octopus

Currently, octopus is included as a separate workflow. This octopus.nf pipeline will detect trios and families and run them together, and then iteratively merge across families using the n+1 schema described in the octopus docs. Finally, the workflow will do the forest filtering as recommended by the octopus documentation. We plan to integrate the octopus and DeepVariant calls in the future.
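
As an illustration of what "detect trios and families" can mean in practice, the Python sketch below groups pedigree rows by family and reports every child whose father and mother are both present in the same family. It is not taken from octopus.nf; the function name find_trios and all IDs are hypothetical.

```python
from collections import defaultdict


def find_trios(ped_rows):
    """Map family ID to a list of (child, father, mother) complete trios."""
    # Sample IDs are only assumed to be unique within a family.
    present = {(fam, sample) for fam, sample, *_ in ped_rows}
    trios = defaultdict(list)
    for fam, sample, father, mother, *_ in ped_rows:
        if (fam, father) in present and (fam, mother) in present:
            trios[fam].append((sample, father, mother))
    return dict(trios)


# Hypothetical pedigree: one complete trio and one singleton.
PED_ROWS = [
    ("fam1", "kid1", "dad1", "mom1", "1", "2"),
    ("fam1", "dad1", "0", "0", "1", "1"),
    ("fam1", "mom1", "0", "0", "2", "1"),
    ("fam2", "kid2", "0", "0", "2", "2"),
]

trios = find_trios(PED_ROWS)
print(trios)
```

Families without a complete trio, such as fam2 here, do not appear in the result and would need to be handled separately.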

Future Development

Development and research are underway so that the workflow will:

Add a high-quality set of SV/CNVs
- Manta + SVchannels and duphold filtering
Add some prioritization of variants
- For example, give lower priority to variants filtered in gnomAD
Integrate SV/CNV calls with the SNPs/indels to find, for example, compound heterozygotes with a SNP:SV pair.
Evaluate use of octopus to find large indels (and/or SNPs and indels).
Use GTEx + phenotypes to further prioritize variants in a family- and phenotype-specific way, such that, for example, variants in genes that are not expressed in relevant tissues are down-weighted.
Provide a graphical user interface so that sorting, filtering, note-taking, and sharing are simplified.

Software Used

DeepVariant Variant Calling with Deep Learning. https://doi.org/10.1038/nbt.4235
GLNexus Joint variant calling. http://dx.doi.org/10.1101/343970
octopus haplotype-based mutation caller. https://doi.org/10.1038/s41587-021-00861-3
bcftools BCF/VCF manipulation. https://doi.org/10.1093/gigascience/giab008
bcftools csq variant consequence annotation. https://doi.org/10.1093/bioinformatics/btx100
htslib C library for genomics data. https://doi.org/10.1093/gigascience/giab007
slivar variant filtering and annotation. https://doi.org/10.1101/2020.08.13.249532
igv.js JavaScript genomics viewer. https://doi.org/10.1101/2020.05.03.075499
nextflow scientific workflows. https://doi.org/10.1038/nbt.3820
manta structural variant caller. https://doi.org/10.1093/bioinformatics/btv710
dysgu structural variant caller. https://doi.org/10.1101/2021.05.28.446147
paragraph structural variant genotyper. https://doi.org/10.1186/s13059-019-1909-7
jasmine structural variant merging. https://doi.org/10.1101/2021.05.27.445886
duphold structural variant depth annotation. https://doi.org/10.1093/gigascience/giz040
snpEff variant consequence annotation. https://doi.org/10.4161/fly.19695
svpack structural variant annotation.
