Different MLST result when re-running SRST2 after adding in new alleles

Open swlong opened this issue 8 years ago • 3 comments

Howdy all,

So I'm running into an odd issue. I am running a few K. pneumoniae samples against an MLST database using SRST2 0.2.0, and on the first run, it generated the following best match:

image

It says the rpoB_135 allele hit has 1 SNP, and these are the coverage stats:

image

So I saved my new consensus fastas, and then added them back to my MLST database to allow for calling against these "new" alleles. When rerunning the same FASTQ file, I then got this result:

258 3 3 1 1 1 1 79

With these stats:

42.48 0.171428571429

Any ideas? The alleles for 258 should've been present in the earlier database, so not sure why I would get a 135* call on first run, with overall less depth of coverage (~22x) vs 42x on the repeat, with a clean hit against all ST258 alleles.

Best, S. Wesley Long

swlong, Nov 15 '16 22:11

Hmmm, that's an interesting one. Our first hypothesis is that your reads are a mix of different genomes, as this has been the cause of weird SRST2 results in the past. So perhaps run some QC to see if you have a mixed sample?

It would be really informative if you could run SRST2 with --save_scores for your two databases (before and after the new allele was added). Then we could take a look at the .scores files. In particular, I'm curious how allele 1 scored in your first run and how allele 135 scored in your second run.
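For anyone wanting to make that comparison, here is a minimal Python sketch of pulling the rows for particular alleles out of two scores tables and putting them side by side. It assumes the .scores files are tab-delimited with a header row and the allele name in the first column; the function names, the file paths in the usage note, and the allele labels `rpoB_1` and `rpoB_135` are illustrative, so check them against your own output before relying on it.

```python
import csv
import io


def allele_rows(scores_text, alleles):
    """Return {allele_name: row_dict} for the requested alleles.

    Assumes a tab-delimited table with a header row, where the first
    column holds the allele name (an assumption about the file layout).
    """
    reader = csv.DictReader(io.StringIO(scores_text), delimiter="\t")
    if not reader.fieldnames:
        return {}
    name_col = reader.fieldnames[0]
    wanted = set(alleles)
    return {row[name_col]: row for row in reader if row[name_col] in wanted}


def compare_runs(before_text, after_text, alleles):
    """Pair each allele's row from the first run with its row from the second.

    A value of None means the allele was not listed in that run's table.
    """
    before = allele_rows(before_text, alleles)
    after = allele_rows(after_text, alleles)
    return {a: (before.get(a), after.get(a)) for a in alleles}


def compare_score_files(before_path, after_path, alleles):
    """Same as compare_runs, but reads the two tables from disk."""
    with open(before_path) as fh:
        before_text = fh.read()
    with open(after_path) as fh:
        after_text = fh.read()
    return compare_runs(before_text, after_text, alleles)
```

Calling something like `compare_score_files("run1.scores", "run2.scores", ["rpoB_1", "rpoB_135"])` (hypothetical file names) would then give one pair of rows per allele, which makes it easy to see how each allele fared before and after the database changed.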

rrwick, Nov 17 '16 02:11

Mixed sample is most likely. These are all samples from clinical specimens, so not unusual to have a "community" of organisms.

Fairly busy at the moment but I will try to get the scores run on this particular example and let you know what they say.

swlong, Nov 17 '16 15:11

(Accidental close, apologies)

swlong, Nov 17 '16 15:11