
core gene alignment looks like it aligns paralogs by accident

Open pallevillesen opened this issue 4 years ago • 4 comments

Hi, we are analyzing Mycobacteria isolated from cystic fibrosis patients, and two patients (twins) are known to have infected each other. For the two genomes isolated from these patients we observe ~760 SNP differences (snpdist run on core_gene_alignment). If I visualize the positions of variation in the core genome alignment (custom script in R), I see strong vertical lines. On close inspection I found that it was very different sequences that were forced to align (with similarity only at the beginning of the gene). Outside this single gene, the two genomes are 99.99% identical (11 SNPs in ~4 megabases).

So it is probably paralogs that are treated as orthologs by accident(?)

I checked more samples and this pattern of variation in the core_gene_alignment is not unusual.

Is there a way for me to force roary to be very stringent when detecting orthologs? I have a feeling that it is a matter of command line options to roary.

I have attached a plot that shows many of these "lines" from a roary run of 29 gff files (generated with prokka). It also shows a really weird pattern by codon position (defined as alignment position %% 3).

figure alignment alleles

pallevillesen avatar Oct 23 '19 13:10 pallevillesen

Useful parameters are:

         -i        minimum percentage identity for blastp [95]
         -iv STR   Change the MCL inflation value [1.5]

Increasing the inflation value increases granularity; that is, it produces smaller clusters.
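A minimal sketch of how the two options above might be combined into one stricter run. The values 98 and 2.0 and the file names are illustrative assumptions, not recommendations from this thread, and the helper `build_roary_command` is hypothetical. Any other options from your original run would need to be added to the list.

```python
import shlex
from typing import List


def build_roary_command(
    gff_files: List[str],
    identity: int = 98,      # assumed value; roary's default is 95
    inflation: float = 2.0,  # assumed value; roary's default is 1.5
) -> List[str]:
    """Build an argument list for roary with a stricter blastp identity (-i)
    and a higher MCL inflation value (-iv)."""
    return ["roary", "-i", str(identity), "-iv", str(inflation), *gff_files]


# Hypothetical input files, used only to show the resulting command line.
cmd = build_roary_command(["sample1.gff", "sample2.gff"])
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Comparing the SNP distance between the two twin isolates before and after such a change is one way to judge whether the stricter settings split the suspect cluster.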

tseemann avatar Oct 24 '19 02:10 tseemann

Thanks a lot for quick reply - will try that.

I developed a simple visualization of the entire alignment to check this.

The SNP/indel columns are scaled by density, so the "peaks" mark regions of high density, which makes it easy to spot regions where paralogs were not split.
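The density idea can be sketched roughly as follows. This is not the R script mentioned in this thread, only an illustration under stated assumptions: the two sequences are already aligned and held in memory as strings, columns with a gap or N are skipped, and the window size and threshold are arbitrary. All function names here are hypothetical.

```python
from typing import List


def snp_positions(seq_a: str, seq_b: str) -> List[int]:
    """Return 0-based alignment columns where two aligned sequences differ.

    Columns with a gap ('-') or 'N' in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    skip = {"-", "N"}
    return [
        i
        for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()))
        if a != b and a not in skip and b not in skip
    ]


def window_counts(positions: List[int], aln_len: int, window: int) -> List[int]:
    """Count differences in consecutive, non-overlapping windows."""
    n_windows = (aln_len + window - 1) // window  # ceiling division
    counts = [0] * n_windows
    for p in positions:
        counts[p // window] += 1
    return counts


def dense_windows(counts: List[int], threshold: int) -> List[int]:
    """Return indices of windows holding at least `threshold` differences."""
    return [i for i, c in enumerate(counts) if c >= threshold]


# Toy alignment: the differences are clustered in the third window.
a = "ACGTACGTACGTACGTACGT"
b = "ACGTACGTTTTTACGTACGT"
pos = snp_positions(a, b)
counts = window_counts(pos, len(a), window=4)
print(pos, counts, dense_windows(counts, threshold=2))
```

On a real core gene alignment, windows whose count is far above the rest are candidates for the kind of forced alignment described in the first post.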

I don't know if this would be of general interest to users of roary(?)

figure alignment alleles

pallevillesen avatar Oct 24 '19 09:10 pallevillesen

Hi @pallevillesen, it would be great if you could share your script for generating those nice alignment visualisations.

fconstancias avatar Sep 10 '20 10:09 fconstancias

Hi @fconstancias. I rarely use git (sorry), but I actually went all the way and set it up.

https://github.com/pallevillesen/plot_roary

pallevillesen avatar Sep 17 '20 08:09 pallevillesen