autopvs1
autopvs1 copied to clipboard
An automatic classification tool for PVS1 interpretation of null variants
AutoPVS1
An automatic classification tool for PVS1 interpretation of null variants.
A web version for AutoPVS1 is also provided: http://autopvs1.genetics.bgi.com
:art: AutoPVS1 is now compatible with hg19/GRCh37 and hg38/GRCh38.
PREREQUISITE
1. Variant Effect Predictor (VEP)
AutoPVS1 use VEP to determine the effect of variants (SNVs, insertions, deletions, CNVs) on genes, transcripts, and protein sequence. To get HGVS name for the variant, indexed_vep_cache (homo_sapiens_refseq 104_GRCh37 and 104_GRCh38) and fasta files are required.
VEP Installation
git clone https://github.com/Ensembl/ensembl-vep.git
cd ensembl-vep
git pull
git checkout release/104
perl INSTALL.pl
VEP cache and faste files
VEP cache and faste files can be automatically downloaded and configured using INSTALL.pl. You can also download and set up them manually:
r=104
FTP='ftp://ftp.ensembl.org/pub/'
# indexed vep cache
cd $HOME/.vep
wget $FTP/release-${r}/variation/indexed_vep_cache/homo_sapiens_refseq_vep_${r}_GRCh38.tar.gz
wget $FTP/release-${r}/variation/indexed_vep_cache/homo_sapiens_refseq_vep_${r}_GRCh37.tar.gz
tar xzf homo_sapiens_vep_${r}_GRCh37.tar.gz
tar xzf homo_sapiens_vep_${r}_GRCh38.tar.gz
# fasta
cd $HOME/.vep/homo_sapiens_refseq/${r}_GRCh37/
wget $FTP/grch37/current/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.dna.primary_assembly.fa.gz
tar xzf Homo_sapiens.GRCh37.dna.primary_assembly.fa.gz
cd $HOME/.vep/homo_sapiens_refseq/${r}_GRCh38/
wget $FTP/current_fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
tar xzf Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
2. pyfaidx
Samtools provides a function “faidx” (FAsta InDeX), which creates a small flat index file “.fai” allowing for fast random access to any subsequence in the indexed FASTA file, while loading a minimal amount of the file in to memory.
pyfaidx module implements pure Python classes for indexing, retrieval, and in-place modification of FASTA files using a samtools compatible index.
3. maxentpy
maxentpy is a python wrapper for MaxEntScan to calculate splice site strength. It contains two functions. score5 is adapt from MaxEntScan::score5ss to score 5' splice sites. score3 is adapt from MaxEntScan::score3ss to score 3' splice sites.
maxentpy is already included in the autopvs1.
4. pyhgvs
pyhgvs provides a simple Python API for parsing, formatting, and normalizing HGVS names. But it only supports python2, I modified it to support python3 and added some other features. It is also included in the autopvs1.
5. Configuration
autopvs1/config.ini
[DEFAULT]
vep_cache = $HOME/.vep
pvs1levels = data/PVS1.level
gene_alias = data/hgnc.symbol.previous.tsv
gene_trans = data/clinvar_trans_stats.tsv
[HG19]
genome = data/hg19.fa
transcript = data/ncbiRefSeq_hg19.gpe
domain = data/functional_domains_hg19.bed
hotspot = data/mutational_hotspots_hg19.bed
curated_region = data/expert_curated_domains_hg19.bed
exon_lof_popmax = data/exon_lof_popmax_hg19.bed
pathogenic_site = data/clinvar_pathogenic_GRCh37.vcf
[HG38]
genome = data/hg38.fa
transcript = data/ncbiRefSeq_hg38.gpe
domain = data/functional_domains_hg38.bed
hotspot = data/mutational_hotspots_hg38.bed
curated_region = data/expert_curated_domains_hg38.bed
exon_lof_popmax = data/exon_lof_popmax_hg38.bed
pathogenic_site = data/clinvar_pathogenic_GRCh38.vcf
You can specify the vep cache directory to use, default is $HOME/.vep/
hg19.fa is downloaded from UCSC hg19.fa.gz and indexed with samtools faidx
hg38.fa is downloaded from NCBI GRCh38_no_alt_analysis_set.fna.gz and indexed with samtools faidx
Note: the chromesome name in fasta files should have chr
prefix
USAGE
from autopvs1 import AutoPVS1
demo = AutoPVS1('13-113803407-G-A', 'hg19')
demo2 = AutoPVS1('13-113149093-G-A', 'hg38')
if demo.islof:
print(demo.hgvs_c, demo.hgvs_p, demo.consequence, demo.pvs1.criterion,
demo.pvs1.strength_raw, demo.pvs1.strength)
# GRCh37 and GRCh38 is also supported
demo = AutoPVS1('13-113803407-G-A', 'GRCh37')
demo2 = AutoPVS1('13-113149093-G-A', 'GRCh38')
FAQ
Please see https://autopvs1.genetics.bgi.com/faq/
TERM OF USE
Users may freely use the AutoPVS1 for non-commercial purposes as long as they properly cite it.
This resource is intended for research purposes only. For clinical or medical use, please consult professionals.
:memo:citation: Jiale Xiang, Jiguang Peng, Samantha Baxter, Zhiyu Peng. (2020). AutoPVS1: An automatic classification tool for PVS1 interpretation of null variants. Hum Mutat 41, 1488-1498. (Editor's choice and cover article)