Sniffles
snfs2vcf is different from merged vcf by bcftools
Hello,
I am dealing with 12 very large aligned BAM files, so I had to split each of them into 11 chromosomes, unplaced scaffolds, and unmapped reads. For one sample, there are therefore 13 .snf files (11 chromosomes + 1 unplaced scaffolds + 1 unmapped reads) generated by Sniffles.
Then I picked 2 samples to test Sniffles' population calling function; in other words, I used 26 .snf files to generate one .vcf file.
At the same time, I also ran a comparison. First, one .vcf per sample was generated by Sniffles from that sample's 13 .snf files. Second, I merged the two VCF files of the 2 picked samples into one VCF using bcftools.
Next, I used bcftools view to check whether the two results differ. The answer is yes!
The number of SVs is 1793617 vs 2689114. The higher number comes from the bcftools merge.
Why is there such a big difference?
Many thanks, Maxine
I am honestly not sure I follow all the split BAM logic. One reason, for sure, is that bcftools doesn't allow for differences in the start/stop coordinates, whereas allowing them is good practice and we implemented that. Thus you will probably see that the bcftools results are all rare variants across the samples, and that in the Sniffles2 merge output more SVs are supported across samples. Hope that helps, Fritz
@fritzsedlazeck Hi, Fritz. Thank you so much for the response! Can I understand your explanation this way: the bcftools merge gives me more variants, including some rare variants that exist in only one or a few individuals, whereas the Sniffles2 merge tends to give me variants that are common across samples? If so, more variants mean more information, I guess? Maxine
bcftools merge expects exact matches for the coordinates, while sniffles/SURVIVOR/jasmine (tools specific for SVs) allow some wobble around those breakpoints.
Say that sample 1 has a 500 bp deletion at chr1:23456-23956 and sample 2 has a 499 bp deletion at chr1:23457-23956. bcftools will then treat these as two variants, although it is highly likely that the two deletions are actually the same variant.
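To make the difference concrete, here is a minimal Python sketch of the two merging strategies applied to the deletion example above. It is only an illustration of the idea, not the actual algorithm used by bcftools, Sniffles2, SURVIVOR, or Jasmine; the `SV` class, both function names, and the `max_dist=50` tolerance are hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """A simplified structural variant call (hypothetical record type)."""
    sample: str
    chrom: str
    start: int
    end: int
    svtype: str


def merge_exact(calls):
    """Group calls only when chrom, start, end and type are identical."""
    groups = {}
    for c in calls:
        key = (c.chrom, c.start, c.end, c.svtype)
        groups.setdefault(key, []).append(c)
    return list(groups.values())


def merge_with_tolerance(calls, max_dist=50):
    """Greedy grouping that allows breakpoints to differ by up to max_dist bp.

    max_dist is an assumed value for illustration, not a tool default.
    """
    groups = []
    ordered = sorted(calls, key=lambda c: (c.chrom, c.svtype, c.start, c.end))
    for c in ordered:
        for g in groups:
            rep = g[0]  # compare against the first call placed in the group
            if (rep.chrom == c.chrom and rep.svtype == c.svtype
                    and abs(rep.start - c.start) <= max_dist
                    and abs(rep.end - c.end) <= max_dist):
                g.append(c)
                break
        else:
            # no existing group was close enough, so start a new one
            groups.append([c])
    return groups


calls = [
    SV("sample1", "chr1", 23456, 23956, "DEL"),
    SV("sample2", "chr1", 23457, 23956, "DEL"),
]

exact = merge_exact(calls)
tolerant = merge_with_tolerance(calls)
print(len(exact), "variants with exact matching")
print(len(tolerant), "variant(s) with breakpoint tolerance")
```

With exact matching, the two deletions stay separate records, each supported by a single sample; with a tolerance they collapse into one record supported by both samples. This is why an exact-coordinate merge tends to report more, apparently rarer, variants for the same underlying data.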