bcftools
unresolved sites after using bcftools +fixref
Hi, I need to fix the REF allele in my .vcf file to meet the Sanger Imputation Server's requirements. My .vcf happens to contain dbSNP reference identifiers, so I followed the command given here: http://samtools.github.io/bcftools/howtos/plugin.fixref.html. However, bcftools gives me the following output:
[yh362@cbsubscb10 MS_ALL]$ bcftools +fixref MS_ALL_illumina_QC2.vcf -Ov -o fixref.vcf -- -d -f /bscb/data/human_reference/GRCh37/human_g1k_v37/human_g1k_v37.fasta -i dbsnp_138.b37.vcf.gz
Warning: corrected position(s) results in unsorted VCF, for example 2:95398090 comes after 2:96007302
The standard unix `sort` or `vcf-sort` from vcftools can be used to fix the order.
# SC, guessed strand convention
SC TOP-compatible 0
SC BOT-compatible 0
# ST, substitution types
ST A>C 38564 8.4%
ST A>G 172187 37.6%
ST A>T 0 0.0%
ST C>A 43618 9.5%
ST C>G 0 0.0%
ST C>T 0 0.0%
ST G>A 203611 44.5%
ST G>C 0 0.0%
ST G>T 0 0.0%
ST T>A 0 0.0%
ST T>C 0 0.0%
ST T>G 0 0.0%
# NS, Number of sites:
NS total 457980
NS ref match 158794 34.7%
NS ref mismatch 299186 65.3%
NS flipped 0 0.0%
NS swapped 69948 15.3%
NS flip+swap 0 0.0%
NS unresolved 229317 50.1%
NS fixed pos 5 0.0%
NS skipped 0
NS non-ACGT 0
NS non-SNP 0
NS non-biallelic 0
So it appears that half of the sites remain unresolved. What might be the reason for this? Thanks!
It is probably because many IDs in the VCF are missing or do not match the IDs in the dbSNP file.
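One way to check this guess is to count how many records in the study VCF have no ID, and how many have an ID that never appears in the dbSNP file. Below is a minimal Python sketch, not part of bcftools. It assumes both files are ordinary tab-separated VCF text with the ID in the third column, and the function names (`open_vcf`, `read_ids`, `ids_found_in_reference`, `summarize_ids`) are made up for illustration. It only compares IDs and does not look at positions or alleles, so it cannot reproduce the plugin's own counts.

```python
import gzip
from typing import Dict, Iterable, Set


def open_vcf(path: str):
    """Open a plain or gzip/bgzip-compressed VCF as text (hypothetical helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def read_ids(lines: Iterable[str]) -> Set[str]:
    """Collect all non-missing IDs (third column) from VCF data lines."""
    ids: Set[str] = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        for vid in fields[2].split(";"):  # the ID column may hold several IDs
            if vid and vid != ".":
                ids.add(vid)
    return ids


def ids_found_in_reference(study_ids: Set[str],
                           reference_lines: Iterable[str]) -> Set[str]:
    """Stream the reference VCF and keep only the IDs that the study also uses.

    Streaming avoids holding every dbSNP ID in memory at once.
    """
    found: Set[str] = set()
    for line in reference_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.split("\t", 3)
        if len(fields) < 3:
            continue
        for vid in fields[2].split(";"):
            if vid in study_ids:
                found.add(vid)
    return found


def summarize_ids(study_lines: Iterable[str], found: Set[str]) -> Dict[str, int]:
    """Count study records by ID status: missing, present in reference, or absent."""
    counts = {"total": 0, "missing_id": 0,
              "id_in_reference": 0, "id_not_in_reference": 0}
    for line in study_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        counts["total"] += 1
        ids = [v for v in (fields[2].split(";") if len(fields) > 2 else [])
               if v and v != "."]
        if not ids:
            counts["missing_id"] += 1
        elif any(v in found for v in ids):
            counts["id_in_reference"] += 1
        else:
            counts["id_not_in_reference"] += 1
    return counts
```

To run it on the files from the command above, something like the following should work, reading the study VCF twice:

```python
with open_vcf("MS_ALL_illumina_QC2.vcf") as f:
    study_ids = read_ids(f)
with open_vcf("dbsnp_138.b37.vcf.gz") as f:
    found = ids_found_in_reference(study_ids, f)
with open_vcf("MS_ALL_illumina_QC2.vcf") as f:
    print(summarize_ids(f, found))
```

If `missing_id` plus `id_not_in_reference` comes out close to the unresolved count, that would support the explanation above. If not, the cause is likely something else.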
@hyl317: are you using the affy genomewide5 array? I'm getting similar numbers.