Roary
core gene alignment looks like it aligns paralogs by accident
Hi, we are analyzing Mycobacteria isolated from cystic fibrosis patients, and two of the patients (twins) are known to have infected each other. For the two genomes isolated from these patients we observe ~760 SNP differences (snpdist run on core_gene_alignment). If I visualize the positions of the variation in the core genome alignment (custom script in R), I see strong vertical lines. On close inspection I found that these were very different sequences that had been forced to align (with similarity at the beginning of the gene). Outside this single gene, the two genomes are 99.99% identical (11 SNPs in ~4 megabases).
So these are probably paralogs that are being treated as orthologs by accident(?)
I checked more samples and this pattern of variation in the core_gene_alignment is not unusual.
Is there a way for me to force roary to be very stringent when detecting orthologs? I have a feeling that it is a matter of command line options to roary.
I have attached a plot that shows many of these "lines" from a roary run on 29 gff files (generated with prokka). It also shows a really weird pattern across codon positions (defined as alignment position %% 3).
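For anyone who wants to reproduce this kind of check without the original R script, here is a minimal Python sketch. It is not the script used for the plot; the function names `variable_columns` and `codon_position_counts` are made up for illustration. It assumes the two sequences have already been read from the core gene alignment as equal-length strings, uses 0-based columns (so the phase differs by one from a 1-based `%% 3` in R), and assumes the phase is only meaningful where the concatenated genes stay in frame.

```python
from collections import Counter


def variable_columns(seq_a, seq_b, skip_chars="-N"):
    """Return 0-based alignment columns where two aligned sequences differ.

    Columns where either sequence has a gap or an N are skipped, so only
    base-to-base mismatches are reported (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    columns = []
    for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper())):
        if a in skip_chars or b in skip_chars:
            continue
        if a != b:
            columns.append(i)
    return columns


def codon_position_counts(columns):
    """Count variable columns per phase (column index modulo 3)."""
    counts = Counter(col % 3 for col in columns)
    return {phase: counts.get(phase, 0) for phase in range(3)}


# Toy example with two short aligned sequences.
cols = variable_columns("ATGCCC", "ATGTCA")
print(cols, codon_position_counts(cols))
```

Real orthologs that differ by a handful of SNPs usually show an excess of changes in one phase (third codon positions), whereas a misaligned paralog tends to differ at all three phases roughly equally.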
Useful parameters are:
-i minimum percentage identity for blastp [95]
-iv STR Change the MCL inflation value [1.5]
Increasing inflation will increase granularity, that is it will produce smaller clusters.
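As a rough sketch of how these two options could be combined in a scripted run: the helper `roary_command` below is hypothetical, and the values 98 and 2.0 are only illustrative, not recommended settings. The `-e -n` flags request the core gene alignment built with MAFFT; everything else is left at the defaults.

```python
import shlex


def roary_command(gff_files, identity=98, inflation=2.0):
    """Build a roary argument list with stricter clustering settings.

    identity  -> passed to -i  (minimum blastp percentage identity, default 95)
    inflation -> passed to -iv (MCL inflation value, default 1.5)
    The values used here are illustrative assumptions, not tuned settings.
    """
    cmd = ["roary", "-e", "-n", "-i", str(identity), "-iv", str(inflation)]
    cmd.extend(gff_files)
    return cmd


# Print the command instead of running it, so it can be inspected first.
print(shlex.join(roary_command(["sample1.gff", "sample2.gff"])))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once the settings look right. It is probably worth comparing the number of core genes before and after, since stricter settings can also split true orthologs.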
Thanks a lot for quick reply - will try that.
I developed a simple visualization of the entire alignment to check this.
The SNP/indel columns are scaled by density, so the "peaks" fall in regions of high density and it is easy to spot regions where paralogs were not split.
I don't know if this would be of general interest to users of roary(?)
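The density idea can be sketched in a few lines of Python. This is not the code behind the plot, just an illustration of binning variable columns into fixed windows and flagging the dense ones; `window_counts`, `hotspots`, the 1000-column window and the threshold are all assumptions chosen for the example.

```python
def window_counts(columns, alignment_length, window=1000):
    """Count variable columns (0-based) in consecutive fixed-size windows."""
    n_windows = (alignment_length + window - 1) // window  # ceiling division
    counts = [0] * n_windows
    for col in columns:
        if not 0 <= col < alignment_length:
            raise ValueError(f"column {col} is outside the alignment")
        counts[col // window] += 1
    return counts


def hotspots(counts, window=1000, min_snps=20):
    """Return (window start, count) for windows with at least min_snps."""
    return [(i * window, c) for i, c in enumerate(counts) if c >= min_snps]


# Toy example: three differences cluster in the first window.
counts = window_counts([1, 2, 3, 1500, 2999], alignment_length=3000)
print(counts, hotspots(counts, min_snps=3))
```

Windows that stand far above the background rate are candidates for unsplit paralogs and are worth inspecting in the alignment directly.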
Hi @pallevillesen, it would be great if you could share your script for generating those nice alignment visualisations.
Hi @fconstancias. I rarely use git (sorry), but I actually went all the way and set it up.
https://github.com/pallevillesen/plot_roary