vt cannot retrieve sequences from my reference sequence file

Open alrafaykhan opened this issue 6 years ago • 4 comments

I deleted the reference index file and downloaded the reference genome from the UCSC database, but I still get the same error:

vt normalize HF.final.vcf.gz -r ref/hg19.fasta -o normalizeHF.final.vcf.gz

[variant_manip.cpp:72 is_not_ref_consistent] failure to extract base from fasta file: AL123456:24697-24715
FAQ: http://genome.sph.umich.edu/wiki/Vt#1._vt_cannot_retrieve_sequences_from_my_reference_sequence_file

alrafaykhan avatar Mar 21 '18 16:03 alrafaykhan

AL123456 is not a chromosome in hg19.fasta I believe. You should use the reference that was used to generate your VCF file.
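One way to confirm this kind of mismatch before running vt is to list the sequence names in the VCF that have no matching record in the FASTA. The following is only a minimal sketch: the helper names (`open_text`, `fasta_names`, `vcf_contigs`, `missing_contigs`) are hypothetical, and it assumes the FASTA name is the first whitespace-delimited token after `>` and that the VCF is tab-delimited text, optionally gzip or bgzip compressed.

```python
import gzip


def open_text(path):
    """Open a plain or gzip/bgzip-compressed file as text (hypothetical helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def fasta_names(path):
    """Collect sequence names: the first token after '>' on each header line."""
    names = set()
    with open_text(path) as fh:
        for line in fh:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def vcf_contigs(path):
    """Collect the distinct CHROM values from the VCF data lines."""
    contigs = set()
    with open_text(path) as fh:
        for line in fh:
            if line.startswith("#") or not line.strip():
                continue  # skip header lines and blank lines
            contigs.add(line.split("\t", 1)[0])
    return contigs


def missing_contigs(vcf_path, fasta_path):
    """Return VCF sequence names that have no record in the FASTA, sorted."""
    return sorted(vcf_contigs(vcf_path) - fasta_names(fasta_path))
```

For the files in this thread, the call would look like `missing_contigs("HF.final.vcf.gz", "ref/hg19.fasta")`. A non-empty result means the reference does not match the VCF, at least by name.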

atks avatar Mar 21 '18 18:03 atks

After downloading the reference genome from NCBI, I am still getting the same error:

wget ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/GRCh37_latest/refseq_identifiers/GRCh37_latest_genomic.fna.gz

vt normalize HF.final.vcf.gz -r ref/GRCh37_latest_genomic.fna.gz -o normalizeHF.final.vcf.gz

[variant_manip.cpp:72 is_not_ref_consistent] failure to extract base from fasta file: 1:866510-866510
FAQ: http://genome.sph.umich.edu/wiki/Vt#1._vt_cannot_retrieve_sequences_from_my_reference_sequence_file

alrafaykhan avatar Mar 22 '18 15:03 alrafaykhan

@alrafaykhan

The sequences in GRCh37_latest_genomic.fna.gz are not named consistently with the chromosomes in your VCF file. Their names begin with NC_, NW_ and NT_, whereas your VCF uses names such as 1.

You need to use a reference file where the sequences have a name consistent with what is found in your VCF file.
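If obtaining a differently named reference is not practical, the headers could in principle be rewritten instead. The sketch below is an illustration under stated assumptions, not something vt provides: `refseq_to_chrom` and `rename_fasta_lines` are hypothetical names, and the mapping assumes the usual RefSeq convention that NC_000001 to NC_000022 are chromosomes 1 to 22, NC_000023 is X, NC_000024 is Y and NC_012920 is the mitochondrial genome. NW_ and NT_ scaffolds are left unchanged, and the target names (`1`, `X`, `MT`) are assumed to be what the VCF uses.

```python
import re

# Primary chromosome accessions look like NC_0000NN.V (for example NC_000001.10).
_NC_CHROM = re.compile(r"^NC_0000(\d{2})\.\d+$")


def refseq_to_chrom(accession):
    """Map a RefSeq chromosome accession to a bare name, or None if unmapped."""
    if accession.startswith("NC_012920."):
        return "MT"  # assumption: the VCF calls the mitochondrial genome MT
    match = _NC_CHROM.match(accession)
    if match is None:
        return None  # NW_/NT_ scaffolds and anything else are not mapped
    number = int(match.group(1))
    if 1 <= number <= 22:
        return str(number)
    return {23: "X", 24: "Y"}.get(number)


def rename_fasta_lines(lines):
    """Yield FASTA lines with mappable header names replaced; others unchanged."""
    for line in lines:
        if line.startswith(">"):
            name, _, rest = line[1:].rstrip("\n").partition(" ")
            chrom = refseq_to_chrom(name)
            if chrom is not None:
                line = ">" + chrom + (" " + rest if rest else "") + "\n"
        yield line
```

The rewritten FASTA would need a fresh index afterwards, and any variants on unmapped scaffolds would still fail to resolve.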

atks avatar Mar 23 '18 04:03 atks

@atks I had a similar issue. Below is my error:

[variant_manip.cpp:96 is_not_ref_consistent] reference bases not consistent: chr10:4868478-4868486 TGCGGGGCG(REF) vs Tgcggggcg(FASTA)

When I used the -n flag, vt skipped several variants.
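The two strings in the error above differ only in letter case, which is what a soft-masked reference (lowercase bases in repeat regions) would produce. One possible workaround, offered as an untested assumption rather than a confirmed fix, is to normalize against an uppercased copy of the FASTA. The function names below are hypothetical.

```python
def differs_only_by_case(ref, fasta):
    """True when two base strings are unequal but match once case is ignored."""
    return ref != fasta and ref.upper() == fasta.upper()


def uppercase_fasta_lines(lines):
    """Yield FASTA lines with sequence lines uppercased and headers untouched."""
    for line in lines:
        yield line if line.startswith(">") else line.upper()


def uppercase_fasta(src_path, dst_path):
    """Write an uppercased copy of an uncompressed FASTA file."""
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        dst.writelines(uppercase_fasta_lines(src))
```

Uppercasing discards the soft-masking information, so it is safer to write a separate copy and index that copy than to overwrite the original reference.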

vappiah avatar Jun 11 '20 09:06 vappiah