
Generate a variant table for dN/dS calculation

Open smb20200615 opened this issue 2 years ago • 4 comments

Hello,

I was wondering if there is an easy way to generate a genotype-by-isolate table for subsequent dN/dS calculation. I previously used the core.tab file from snippy-core, but I think it omits variants that are not found in all genomes.

Many thanks for your guidance,

smb20200615 avatar Dec 09 '21 16:12 smb20200615

Hi @smb20200615,

you can use bcftools merge to merge the VCF files of your isolates (bcftools merge expects bgzip-compressed, indexed inputs):

bcftools merge -m none -0 -O z <iso1/snps.vcf.gz> <iso2/snps.vcf.gz> ... > merged.vcf.gz

You can then either use the merged VCF file as input for your tool or convert it to a tabular format using packages such as vcfR (in R) or scikit-allel (in Python).
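As a rough illustration of the tabular conversion, here is a minimal sketch that uses only the Python standard library rather than vcfR or scikit-allel. The function name `vcf_to_table` and the toy two-isolate VCF are hypothetical, and the sketch assumes an uncompressed, tab-delimited VCF with a GT field and one ALT allele per record (as produced with -m none). Missing genotypes are written as N.

```python
import io


def vcf_to_table(handle):
    """Parse VCF text into (sample_names, rows).

    Each row is (chrom, pos, ref, alt, calls), where calls holds one base
    per isolate and 'N' marks a missing genotype.
    """
    samples, rows = [], []
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns start after FORMAT
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        alleles = [ref] + alt.split(",")
        gt_index = fields[8].split(":").index("GT")
        calls = []
        for sample_field in fields[9:]:
            gt = sample_field.split(":")[gt_index]
            # Use the first allele; isolates are treated as haploid here.
            first = gt.replace("|", "/").split("/")[0]
            calls.append("N" if first == "." else alleles[int(first)])
        rows.append((chrom, int(pos), ref, alt, calls))
    return samples, rows


# Toy input standing in for a merged VCF (hypothetical data).
example = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tiso1\tiso2",
    "chr\t10\t.\tA\tG\t.\t.\t.\tGT\t1/1\t0/0",
    "chr\t25\t.\tC\tT\t.\t.\t.\tGT\t./.\t1/1",
])
samples, rows = vcf_to_table(io.StringIO(example))
for chrom, pos, ref, alt, calls in rows:
    print(chrom, pos, ref, alt, *calls, sep="\t")
```

For a real merged.vcf.gz, the same function could be fed a handle from `gzip.open(path, "rt")`.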

stefanogg avatar Dec 20 '21 22:12 stefanogg

@stefanogg thank you so much for your guidance. I had tried that, but the issue with merging the variants is that we won't get the null calls: the approach assumes that areas with no variant are the same as the reference, which will not be true if we have an N at the site. Can snippy provide info on these null sites?

smb20200615 avatar Dec 28 '21 02:12 smb20200615

No, snippy-core doesn't provide details of the null sites, because it uses a strict definition of core genome (0% gaps). You can use bcftools merge without the option -0 (--missing-to-ref) so that Ns are coded as missing instead of 0/0.

You can also use goalign or trimal to gather information on N-sites and then filter the vcf files.
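One way to picture the N-site filtering step is the sketch below. It does not use goalign or trimal; it is plain Python under stated assumptions: each isolate has a reference-length sequence on a single contig in which uncalled positions appear as N or -, and the genotype table is a list of (position, calls) pairs. The names `uncalled_positions` and `mask_calls` and the toy sequences are hypothetical.

```python
def uncalled_positions(sequence, missing="Nn-"):
    """Return the 1-based positions where the isolate has no confident base."""
    return {i for i, base in enumerate(sequence, start=1) if base in missing}


def mask_calls(rows, samples, sequences):
    """Replace a call with 'N' wherever that isolate's sequence is uncalled.

    rows      -- list of (pos, calls), calls ordered like samples
    samples   -- isolate names
    sequences -- dict of isolate name -> reference-length sequence
    """
    masked_sites = {name: uncalled_positions(sequences[name]) for name in samples}
    masked = []
    for pos, calls in rows:
        masked.append((pos, [
            "N" if pos in masked_sites[name] else call
            for name, call in zip(samples, calls)
        ]))
    return masked


# Toy data: iso1 is uncalled at position 5, iso2 at position 8.
samples = ["iso1", "iso2"]
sequences = {"iso1": "ACGTNACGT", "iso2": "ACGTAAC-T"}
rows = [(5, ["A", "T"]), (8, ["G", "G"])]
print(mask_calls(rows, samples, sequences))
```

With several contigs, the position sets would need to be keyed by contig name as well.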

stefanogg avatar Jan 05 '22 04:01 stefanogg

How would bcftools know what is missing/ambiguous (N) if Snippy does not record such positions in its VCF file in the first place? Aren't such sites purposefully left out (i.e. --minfrac parameter) of the final set of SNP calls due to their low confidence?

cizydorczyk avatar Feb 11 '22 22:02 cizydorczyk