bcftools icon indicating copy to clipboard operation
bcftools copied to clipboard

Should changing the haplotype setting result in a different number of sequences in the final consensus?

Open JamieAlfieri opened this issue 2 years ago • 1 comments

I ran similar commands with the same data, only changing the -H setting.

bcf consensus -H 1 -f reference.fa subject.vcf.gz > consensus_H1.fasta bcf consensus -H 2 -f reference.fa subject.vcf.gz > consensus_H2.fasta bcf consensus -H A -f reference.fa subject.vcf.gz > consensus_HA.fasta

subject_H1.fasta contains 8063 sequences (same as the reference) subject_H2.fasta contains 1946 sequences. subject_HA.fasta contains 1946 sequences.

I thought the consensus sequence will contain the reference allele for homozygous genotypes (if there is not an alternate allele (-H A) or if there is not a second allele (-H 2)). But if this were the case, the number of sequences should be equal.

JamieAlfieri avatar Feb 17 '22 17:02 JamieAlfieri

Hmm, I don't think it should and can't think of a good reason where the bug would come from. Are there many haploid genotypes in the VCF? Any chance you could provide a small test case to reproduce the problem?

Importantly, what version of bcftools are you using?

pd3 avatar Feb 18 '22 08:02 pd3