vt
vt cannot retrieve sequences from my reference sequence file
I deleted the reference index file and downloaded the reference genome from the UCSC database, but I still get the same error:
vt normalize HF.final.vcf.gz -r ref/hg19.fasta -o normalizeHF.final.vcf.gz
[variant_manip.cpp:72 is_not_ref_consistent] failure to extract base from fasta file: AL123456:24697-24715 FAQ: http://genome.sph.umich.edu/wiki/Vt#1._vt_cannot_retrieve_sequences_from_my_reference_sequence_file
I believe AL123456 is not a sequence name in hg19.fasta. You should use the reference that was used to generate your VCF file.
After downloading the reference genome from NCBI, I still get the same error:
wget ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/GRCh37_latest/refseq_identifiers/GRCh37_latest_genomic.fna.gz
vt normalize HF.final.vcf.gz -r ref/GRCh37_latest_genomic.fna.gz -o normalizeHF.final.vcf.gz
[variant_manip.cpp:72 is_not_ref_consistent] failure to extract base from fasta file: 1:866510-866510 FAQ: http://genome.sph.umich.edu/wiki/Vt#1._vt_cannot_retrieve_sequences_from_my_reference_sequence_file
@alrafaykhan
The sequences in GRCh37_latest_genomic.fna.gz are not named consistently with the chromosomes in your VCF file. Their names start with NC_, NW_ or NT_, whereas your VCF refers to the chromosome as 1.
You need to use a reference file where the sequences have a name consistent with what is found in your VCF file.
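One way to check this before running vt is to compare the CHROM values in the VCF against the sequence names in the FASTA headers. The following is a minimal sketch, not part of vt; the function names (`open_text`, `fasta_names`, `vcf_contigs`, `missing_contigs`) and the sample lines are made up for illustration, and it assumes the sequence name is the first whitespace-delimited token after `>`.

```python
import gzip


def open_text(path):
    """Open a plain or gzip/bgzip-compressed file as text."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def fasta_names(lines):
    """Collect sequence names: the first token after '>' on each header line."""
    names = set()
    for line in lines:
        if line.startswith(">") and line[1:].strip():
            names.add(line[1:].split()[0])
    return names


def vcf_contigs(lines):
    """Collect the CHROM column (first tab-separated field) of every record."""
    contigs = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        contigs.add(line.split("\t", 1)[0].strip())
    return contigs


def missing_contigs(vcf_lines, fasta_lines):
    """Return the VCF contig names that have no matching FASTA sequence."""
    return sorted(vcf_contigs(vcf_lines) - fasta_names(fasta_lines))


# Illustrative in-memory data mirroring the mismatch described above.
fasta = [">NC_000001.10 illustrative header\n", "NNNNNNNN\n"]
vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t866510\t.\tC\tT\t.\t.\t.\n",
]
print(missing_contigs(vcf, fasta))  # prints ['1']
```

For real files, something like `missing_contigs(open_text("HF.final.vcf.gz"), open_text("ref/GRCh37_latest_genomic.fna.gz"))` should list every contig that vt would fail to look up; an empty list means the names agree.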
@atks I had a similar issue. Below is my error:
[variant_manip.cpp:96 is_not_ref_consistent] reference bases not consistent: chr10:4868478-4868486 TGCGGGGCG(REF) vs Tgcggggcg(FASTA)
When I used the -n flag, vt skipped several variants.
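In the error above, the REF allele and the FASTA bases differ only in letter case, which is what a soft-masked reference (lowercase repeat regions) looks like. If that is the only difference, one possible workaround is to uppercase the sequence lines of the reference. This is a sketch under that assumption, not something confirmed in this thread; `uppercase_fasta` is a hypothetical helper name.

```python
def uppercase_fasta(lines):
    """Yield FASTA lines with sequence bases uppercased; headers are unchanged."""
    for line in lines:
        if line.startswith(">"):
            yield line  # keep the sequence name exactly as it is
        else:
            yield line.upper()


# Illustrative data taken from the error message above.
masked = [">chr10\n", "Tgcggggcg\n"]
print(list(uppercase_fasta(masked)))  # prints ['>chr10\n', 'TGCGGGGCG\n']
```

To apply it to a file, write the result to a new FASTA, for example `out.writelines(uppercase_fasta(src))` with both files opened in text mode, and index the new file before pointing `-r` at it.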