Core genome alignment failure

Open • Steven-Kemp opened this issue 3 years ago • 1 comment

Hi @tseemann, I see you posting often on this GitHub, so I thought I'd ask!

I'm having some issues getting a core genome alignment of around 300 full-length E. coli sequences with Roary.

The program runs as it should and outputs all of the expected files. However, the core genome alignment is often blank, and the run shows this warning:

--------------------- WARNING ---------------------
MSG: Got a sequence without letters. Could not guess alphabet
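One way to narrow this down, assuming the warning is triggered by a FASTA record whose sequence is empty or contains only gaps, is to scan the input and intermediate FASTA files for such records. The sketch below is a hypothetical helper (`empty_fasta_records` is an illustrative name, not part of Roary or BioPerl). It treats a record as empty when it contains no alphabetic characters, which may not match BioPerl's exact test.

```python
import sys


def empty_fasta_records(lines):
    """Yield the header of every FASTA record whose sequence has no letters.

    `lines` is any iterable of text lines (e.g. an open file handle).
    Assumption: a record counts as empty if it has no alphabetic characters,
    so gap-only records ('---') are reported too.
    """
    header = None
    letters = 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # Close out the previous record before starting a new one.
            if header is not None and letters == 0:
                yield header
            header = line[1:]
            letters = 0
        elif header is not None:
            # Count alphabetic characters only; '-', '.', '*' do not count.
            letters += sum(ch.isalpha() for ch in line)
    # The last record has no following '>' line, so check it here.
    if header is not None and letters == 0:
        yield header


if __name__ == "__main__":
    # Usage: python find_empty.py file1.fa [file2.fa ...]
    for name in sys.argv[1:]:
        with open(name) as handle:
            for record in empty_fasta_records(handle):
                print(f"{name}\t{record}")
```

Any file that prints a header here would be a reasonable place to start looking.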

The summary statistics show:

Core genes (99% <= strains <= 100%): 0
Soft core genes (95% <= strains < 99%): 69
Shell genes (15% <= strains < 95%): 11410
Cloud genes (0% <= strains < 15%): 99008
Total genes (0% <= strains <= 100%): 110487
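The categories are fixed fractions of the isolates, so the counts can be recomputed from gene_presence_absence.Rtab as a sanity check. This is a minimal sketch, assuming the Rtab is tab-separated with a header row of isolate names, gene names in the first column and 0/1 cells. The function names are hypothetical, and the rounding at the boundaries may differ from Roary's own.

```python
import csv
import sys
from collections import Counter


def category(present, total):
    """Return the summary band for a gene found in `present` of `total` isolates.

    Integer arithmetic avoids floating-point surprises exactly at a threshold.
    """
    scaled = present * 100
    if scaled >= 99 * total:
        return "core"        # 99% <= strains <= 100%
    if scaled >= 95 * total:
        return "soft_core"   # 95% <= strains < 99%
    if scaled >= 15 * total:
        return "shell"       # 15% <= strains < 95%
    return "cloud"           # 0% <= strains < 15%


def summarise_rtab(lines):
    """Count genes per band from gene_presence_absence.Rtab-style lines.

    Assumed layout: tab-separated, first row = column names, first column =
    gene names, remaining cells = 0 (absent) or 1 (present).
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    n_isolates = len(header) - 1
    counts = Counter()
    for row in reader:
        if not row:
            continue  # skip blank lines
        present = sum(int(cell) for cell in row[1:])
        counts[category(present, n_isolates)] += 1
    return counts, n_isolates


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python rtab_summary.py gene_presence_absence.Rtab
    with open(sys.argv[1], newline="") as handle:
        counts, n = summarise_rtab(handle)
    print(f"isolates: {n}")
    for band in ("core", "soft_core", "shell", "cloud"):
        print(f"{band}\t{counts[band]}")
    print(f"total\t{sum(counts.values())}")
```

With 300 isolates, the 99% core threshold means a gene must be present in at least 297 of them.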

I've checked the gene_presence_absence.Rtab file and it looks OK to me, and I can see no obvious contamination.
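Because a gene needs to be present in at least 99% of isolates to count as core, a few incomplete assemblies can be enough to push many genes out of the core band. The hypothetical per-isolate check below uses the same Rtab-format assumption. It counts how many widely shared genes each isolate lacks, so isolates near the top of the list may be worth re-examining.

```python
import csv
import sys


def missing_common_genes(lines, min_percent=95):
    """Count, per isolate, the widely shared genes it lacks.

    A gene is treated as widely shared if it is present in at least
    `min_percent`% of isolates. Input is assumed to be
    gene_presence_absence.Rtab-style tab-separated lines (header row of
    isolate names, first column gene names, 0/1 cells).
    Returns (isolate, n_missing) pairs, worst isolate first.
    """
    rows = [row for row in csv.reader(lines, delimiter="\t") if row]
    isolates = rows[0][1:]
    n = len(isolates)
    missing = [0] * n
    for row in rows[1:]:
        calls = [int(cell) for cell in row[1:]]
        # Integer form of: present / n >= min_percent / 100
        if sum(calls) * 100 >= min_percent * n:
            for i, present in enumerate(calls):
                if not present:
                    missing[i] += 1
    # sorted() is stable, so isolates with equal counts keep file order
    return sorted(zip(isolates, missing), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python missing_per_isolate.py gene_presence_absence.Rtab
    with open(sys.argv[1], newline="") as handle:
        ranking = missing_common_genes(handle)
    for isolate, n_missing in ranking[:20]:
        print(f"{isolate}\t{n_missing}")
```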

Could you speculate on what the issue might be?

Best wishes, Steve

Steven-Kemp, Aug 24 '21 08:08

Small update: I reran this after QC'ing the input files much more carefully.

Now:

Core genes (99% <= strains <= 100%): 126
Soft core genes (95% <= strains < 99%): 231
Shell genes (15% <= strains < 95%): 3745
Cloud genes (0% <= strains < 15%): 16947
Total genes (0% <= strains <= 100%): 21049

But I still get the same warning:

--------------------- WARNING ---------------------
MSG: Got a sequence without letters. Could not guess alphabet

Steven-Kemp, Aug 31 '21 14:08