SNPGenie

Need help to determine method for inference of convergent evolution

Open qianxuans opened this issue 2 years ago • 1 comment

Hi, I am running an analysis to infer convergent evolution of bacteria in a longitudinal study. Several clones of the same bacterium are being studied to determine whether they show within-host convergent evolution. For each clone, samples were collected at different time points. The idea is similar to the research discussed in #62. If I want to test whether convergent evolution occurs among several clones, what is the best method to use?

  1. Should I call SNPs and use the original SNPGenie, or should I use the within-group version with an MSA? If I use an MSA instead of VCF files, would it be overkill, as in the situation described in #44?
  2. Will VCFGenie be helpful in this case?

Thank you so much for your help!

qianxuans, Jan 30 '23 17:01

Greetings, @qianxuans!

To me, the question 'is there convergent evolution?' could simply mean 'does the same mutation arise independently in different lineages?' Alternatively, it could mean 'does the same mutation arise independently and also increase in frequency to >50% in different lineages?' In the first case, it might be enough to determine whether the variant is present in multiple clones. In the second, it might be even simpler: check whether the same variant is present in the consensus sequence of multiple clones at the end of the study. If you do find such a variant, you'd probably want to deep sequence the original/source sample to see whether the variant was already present at low levels, or whether it arose de novo in multiple lineages.
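As a rough illustration of those two checks, here is a minimal Python sketch. It assumes you have already extracted, for each clone, the variants seen at the final time point along with their allele frequencies; the `final_freqs` dictionary, the clone names, and the `shared_variants` helper are all hypothetical and not part of SNPGenie or VCFgenie.

```python
from collections import defaultdict

# Hypothetical input (made-up values): clone name -> {(position, ref, alt): allele
# frequency at the final time point}. In practice these would come from your VCF files.
final_freqs = {
    "clone_A": {(1021, "C", "T"): 0.82, (4410, "G", "A"): 0.07},
    "clone_B": {(1021, "C", "T"): 0.61, (2250, "A", "G"): 0.55},
    "clone_C": {(1021, "C", "T"): 0.12, (4410, "G", "A"): 0.03},
}


def shared_variants(freqs_by_clone, min_freq=0.0, min_clones=2):
    """Return {variant: sorted clone names} for variants whose frequency
    exceeds min_freq in at least min_clones clones."""
    clones_by_variant = defaultdict(list)
    for clone, freqs in freqs_by_clone.items():
        for variant, freq in freqs.items():
            if freq > min_freq:
                clones_by_variant[variant].append(clone)
    return {
        variant: sorted(clones)
        for variant, clones in clones_by_variant.items()
        if len(clones) >= min_clones
    }


# Question 1: is the same variant present at all in multiple clones?
present = shared_variants(final_freqs)

# Question 2: does the same variant exceed 50% (i.e., reach the consensus) in multiple clones?
majority = shared_variants(final_freqs, min_freq=0.5)

print(present)
print(majority)
```

Note that sharing alone does not show independent origin: a variant flagged here could still have been present at low frequency in the source sample, which is why deep sequencing that sample matters.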

I'm not sure what to advise, because the best approach will depend on the specific question you have. VCFgenie is up and running, and would be useful for quality filtering VCF files to help determine which variants are real (not sequencing error). SNPGenie can use those VCF files to estimate natural selection, if that's part of your goal. If you choose the MSA version, you'd probably be comparing consensus sequences from different time points, which is a different approach from within-timepoint variant analysis.

Let me know if that helps! Chase

singing-scientist, Feb 03 '23 16:02