Different pairwise SNP distances from snippy-core depending on inputs
Hi,
I am getting strange results from snippy-core: I ran contigs from 100 isolates using snippy-multi (therefore all against the same reference) and then generated a distance matrix from the core.aln file using snp-dists. So far so good. However, I then wanted to construct a phylogenomic tree, so I used snippy on some extra reference strains and outgroups individually, with the same reference as before. I then ran everything through snippy-core again and found that it gave me different pairwise SNP differences to the previous run, so I played around with some other combinations.
As an example:
1. in my original run, there were 124 SNPs between Sample021 and Sample043.
2. when I added the reference strains and outgroups there were no SNPs between the same two strains.
3. when I repeated 1) but changed the --ref from 'Sample001/ref.fa' to my original ref.gbk file, the SNP distance between Sample021 and Sample043 was 25.
4. when I only used four of my isolates, including 021 and 043, the number of SNPs was 119.
I know that running different groups of samples will change what is defined as 'core' but why do I get different results by using a subset of the original group and when using the original ref.gbk as opposed to snippy's .fa version of that file?
@nthomson50 running snippy with different sets of samples with the same reference will very likely change the core, and therefore the number of SNPs (as you suggest).
ref.fa is just a conversion of the ref.gbk to FASTA. So, I am not sure how number 3 could happen without a change in the samples. I think we would need more information to troubleshoot. Can you confirm that the sequence data in Sample001/ref.fa is in fact the same as the sequence in ref.gbk?
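One way to check this is sketched below in Python, using only the standard library. It assumes ref.gbk is a standard GenBank flat file (sequence between `ORIGIN` and `//`) and that a difference in letter case should be ignored; the function names are made up for this example and are not part of snippy.

```python
import hashlib


def fasta_sequences(text):
    """Return the sequences in FASTA text, one string per record, uppercased."""
    seqs, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            current = []
            seqs.append(current)
        elif line and current is not None:
            current.append(line.upper())
    return ["".join(parts) for parts in seqs]


def genbank_sequences(text):
    """Return the sequences in GenBank text (ORIGIN ... //), uppercased."""
    seqs, current = [], None
    for line in text.splitlines():
        if line.startswith("ORIGIN"):
            current = []
        elif line.startswith("//"):
            if current is not None:
                seqs.append("".join(current))
            current = None
        elif current is not None:
            # Sequence lines look like "        1 acgtacgtac ..."; keep letters only.
            current.append("".join(ch for ch in line if ch.isalpha()).upper())
    return seqs


def digest(seqs):
    """MD5 of each sequence, so long contigs can be compared at a glance."""
    return [hashlib.md5(s.encode()).hexdigest() for s in seqs]


def same_reference(fasta_path, genbank_path):
    """True if both files hold the same sequences in the same order."""
    with open(fasta_path) as fa, open(genbank_path) as gb:
        return digest(fasta_sequences(fa.read())) == digest(genbank_sequences(gb.read()))
```

Calling `same_reference("Sample001/ref.fa", "ref.gbk")` from the directory that holds both should return `True` if the two files really contain the same reference; if it returns `False`, comparing the per-contig digests should show which record differs.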
As to number 2, I suspect one or more of your outgroup samples are too distant, which would mean the core genome is effectively zero.
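To illustrate why the sample set matters, here is a toy Python sketch. It assumes a "core" column is one where every sample has an unambiguous A/C/G/T call; that is a simplification for illustration, not snippy-core's actual code, and the sequences and the `Outgroup` name are invented.

```python
def core_columns(aln):
    """Indices of columns where every sample has an A/C/G/T call."""
    names = list(aln)
    length = len(aln[names[0]])
    return [i for i in range(length) if all(aln[n][i] in "ACGT" for n in names)]


def pairwise_distance(aln, a, b):
    """Count differences between samples a and b over the core columns only."""
    return sum(1 for i in core_columns(aln) if aln[a][i] != aln[b][i])


# Invented toy alignment: the two isolates differ at two positions.
isolates = {
    "Sample021": "ACGTACGTAC",
    "Sample043": "ACGAACGTTC",
}

# The same isolates plus a distant sample that is unaligned ('-') at exactly
# those two positions, so both columns drop out of the core.
with_outgroup = dict(isolates, Outgroup="ACG-ACGT-C")

print(pairwise_distance(isolates, "Sample021", "Sample043"))       # 2
print(pairwise_distance(with_outgroup, "Sample021", "Sample043"))  # 0
```

The two isolates have not changed between the two calls; only the set of columns counted as core has. The same effect, in the other direction, is why a subset of four isolates can give a different distance from the full set of 100.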