Roary grouping nearby dissimilar genes into the same group
Hi, I am trying to build the pan genome for ~200 environmental E. coli isolates, with MG1655 included in the data set as a reference. I found that Roary tends to cluster nearby genes into the same group, probably because it uses synteny information. The problem is that, in many cases, those genes are very dissimilar to each other.
As shown in the gene_presence_absence.csv file, for MG1655 (U00096.3), JKMANJED_00365 and JKMANJED_00366 are both placed in the tauC group.
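To check how often this happens across the whole data set instead of by eye, one can scan gene_presence_absence.csv for groups in which a single isolate contributes more than one locus tag. Below is a minimal Python sketch. It assumes the file has a `Gene` column and one column per isolate, and that several locus tags in one cell are separated by whitespace; the column name `MG1655` and the tag `HYPOTHETICAL_00001` are made up for the toy input.

```python
import csv
import io


def groups_with_multiple_genes(csv_text: str, isolate: str) -> list[tuple[str, list[str]]]:
    """Return (group, locus tags) for each group where `isolate` has 2+ tags."""
    hits = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: several locus tags in one cell are whitespace-separated.
        tags = (row.get(isolate) or "").split()
        if len(tags) > 1:
            hits.append((row["Gene"], tags))
    return hits


# Toy input: the tauC row mirrors the case above; HYPOTHETICAL_00001 is made up.
toy = (
    "Gene,No. isolates,No. sequences,MG1655\n"
    'tauC,1,2,"JKMANJED_00365\tJKMANJED_00366"\n'
    "otherGene,1,1,HYPOTHETICAL_00001\n"
)
print(groups_with_multiple_genes(toy, "MG1655"))
# [('tauC', ['JKMANJED_00365', 'JKMANJED_00366'])]
```

For the real file, read it with `open("gene_presence_absence.csv", newline="").read()` and pass the actual header of the MG1655 column.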
However, the GFF file from Prokka shows that JKMANJED_00366 is annotated as tauD.
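The Prokka annotation for the locus tags involved can also be checked programmatically by parsing the ninth (attributes) column of the GFF. A minimal Python sketch, assuming Prokka's usual `locus_tag=` and `gene=` attribute keys and the `##FASTA` section it appends after the features; features without a `gene=` attribute (e.g. hypothetical proteins) are skipped. The contig name and coordinates in the toy GFF are made up.

```python
from urllib.parse import unquote


def gene_names_from_gff(gff_text: str) -> dict[str, str]:
    """Map locus_tag -> gene attribute for features in a Prokka-style GFF3 text."""
    names = {}
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break  # sequences follow; no more feature lines
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue
        attrs = {}
        for field in cols[8].split(";"):
            if "=" in field:
                key, value = field.split("=", 1)
                attrs[key] = unquote(value)  # GFF3 percent-encodes special characters
        if "locus_tag" in attrs and "gene" in attrs:
            names[attrs["locus_tag"]] = attrs["gene"]
    return names


# Toy GFF: contig name and coordinates are made up.
toy_gff = (
    "##gff-version 3\n"
    "contig_1\tProdigal\tCDS\t100\t951\t.\t+\t0\t"
    "ID=JKMANJED_00366;gene=tauD;locus_tag=JKMANJED_00366\n"
    "##FASTA\n"
    ">contig_1\n"
    "ACGT\n"
)
print(gene_names_from_gff(toy_gff))  # {'JKMANJED_00366': 'tauD'}
```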
Their sequences are also very different from each other when aligned. The minimum percentage identity for blastp was set to 90% in Roary, so it should be able to split these two entries.
In addition, the tauC/tauD pair is clearly separated in most of the other strains (only 2 of them are shown here).
The tauD genes in MG1655 (JKMANJED_00366) and ERR3062275 (DIFMKJME_03605) align very well, with 96.8% identity.
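For reference, here is a sketch of one common definition of percent identity that a threshold such as 90% is compared against: identical columns divided by alignment length, with gaps counted as mismatches. It takes an already aligned pair of sequences. Tools differ in how they count gaps and over which region they measure (BLAST reports identity per local alignment), so this is illustrative only and not Roary's internal computation. The toy alignment is made up.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical columns / alignment columns * 100; gaps ('-') count as mismatches."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns that are gaps in both sequences (possible in multiple alignments).
    cols = [(x, y) for x, y in zip(aligned_a, aligned_b) if not (x == "-" and y == "-")]
    if not cols:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for x, y in cols if x == y)
    return 100.0 * matches / len(cols)


pid = percent_identity("MKT-LV", "MKTALV")  # made-up toy alignment
print(round(pid, 1), pid >= 90.0)  # 83.3 False
```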
Interestingly, when I use only 10 samples plus MG1655, Roary does separate tauD and tauC. Is there any way to fix this issue, or any idea what might cause it? I really appreciate the help.
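To narrow down when the merge appears, it may help to rerun Roary on reproducible subsets of increasing size and note at which point tauC and tauD collapse into one group. A minimal Python sketch for drawing such a subset; the file names and seed are hypothetical, and the returned paths would be passed to roary in place of the full list.

```python
import random


def pick_subset(gff_paths: list[str], reference: str, k: int, seed: int = 0) -> list[str]:
    """Return the reference GFF plus k other GFFs drawn at random, reproducibly."""
    others = sorted(p for p in gff_paths if p != reference)
    if k > len(others):
        raise ValueError("k is larger than the number of non-reference GFFs")
    rng = random.Random(seed)  # fixed seed so the same subset can be rebuilt
    return [reference] + rng.sample(others, k)


# Hypothetical file names standing in for ~200 isolates plus MG1655.
paths = [f"isolate_{i:03d}.gff" for i in range(200)] + ["MG1655.gff"]
subset = pick_subset(paths, "MG1655.gff", k=10, seed=1)
print(len(subset), subset[0])  # 11 MG1655.gff
```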
I have run into the same problem T_T. Is it possible to find out what causes this?