oases: Oases output depends on the order of the sequences to assemble in the input

Status: open. Opened by drosofff; 0 comments.

Progress has been made since issue #8:

A/ Shuffling the order of the sequences in the input file clearly gives rise to different Oases outputs, which is a problem for automated workflows.

Within Galaxy (https://mississippi.snv.jussieu.fr/u/drosofff/h/oases-order-test), I shuffled the input FASTA file (dataset 17) 20 times, giving a collection of 20 FASTA files that contain exactly the same sequences in different orders (dataset collection 45). I then ran Oases on this collection (#45) with k-mer sizes between 13 and 35, giving output collection 87. Browsing this collection readily shows that the number of contigs produced varies from one shuffled input to another.
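The shuffling experiment was done with Galaxy tools. For anyone wanting to reproduce it outside Galaxy, a minimal Python sketch of the same idea might look like the following: parse a FASTA file, write reproducible (seeded) shuffled copies, and count records in an assembly output. The helper names (`read_fasta`, `shuffled_copies`, `count_contigs`) are hypothetical and not part of Oases or Galaxy; the Oases run itself is not shown, and `count_contigs` assumes one `>` header line per contig.

```python
import random


def read_fasta(lines):
    """Parse FASTA lines into a list of (header, sequence) tuples."""
    records = []
    header, parts = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(parts)))
            header, parts = line[1:], []
        else:
            parts.append(line)
    if header is not None:
        records.append((header, "".join(parts)))
    return records


def write_fasta(records):
    """Serialize (header, sequence) tuples back into FASTA text."""
    return "".join(f">{h}\n{s}\n" for h, s in records)


def shuffled_copies(records, n_copies, seed=0):
    """Return n_copies random permutations of the same records (reproducible via seed)."""
    rng = random.Random(seed)
    copies = []
    for _ in range(n_copies):
        perm = list(records)
        rng.shuffle(perm)
        copies.append(perm)
    return copies


def count_contigs(fasta_text):
    """Count FASTA records, assuming one '>' header line per contig."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


if __name__ == "__main__":
    reads = read_fasta([">r1", "ACGTACGT", ">r2", "GGCCGGCC", ">r3", "TTAATTAA"])
    copies = shuffled_copies(reads, 20, seed=42)
    # Every copy holds exactly the same sequences; only the order differs.
    assert all(sorted(c) == sorted(reads) for c in copies)
    print(len(copies), count_contigs(write_fasta(reads)))  # 20 3
```

Each shuffled copy would then be written with `write_fasta`, assembled with the same k-mer range, and the per-run contig counts compared.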

B/ If you run Oases on the same input collection but with k-mer sizes between 11 and 35, you again get variable output (dataset collection 66), and in addition some inputs (depending on the shuffling) cause Oases to fail. You can see the error by clicking the info button (i) on the red datasets and then following the stderr and stdout links.

This issue is annoying because aligners such as Bowtie tend to return aligned sequences in slightly different orders depending on the number of threads/processors used for the job.
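One possible workflow-side mitigation, which is an assumption and not something proposed in this issue: put the reads into a canonical order before assembly, so that inputs differing only in order (for example Bowtie output produced with different thread counts) become identical. This removes input order as a variable; it does not make Oases itself order-independent. The sketch assumes the reads are already parsed into `(header, sequence)` pairs, and `canonical_order` is a hypothetical helper.

```python
def canonical_order(records):
    """Sort (header, sequence) records by sequence, then by header,
    so that any permutation of the same reads yields the same order."""
    return sorted(records, key=lambda rec: (rec[1], rec[0]))


if __name__ == "__main__":
    a = [("r2", "GGCC"), ("r1", "ACGT"), ("r3", "ACGT")]
    b = [("r3", "ACGT"), ("r2", "GGCC"), ("r1", "ACGT")]
    # Two differently ordered inputs collapse to the same canonical order.
    assert canonical_order(a) == canonical_order(b)
    print(canonical_order(a))  # [('r1', 'ACGT'), ('r3', 'ACGT'), ('r2', 'GGCC')]
```

Sorting on the sequence first and the header second gives a total order even when several reads share the same sequence.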

Additional info: our msp_oases wrappers use the latest Velvet commit (https://github.com/dzerbino/velvet/commit/9adf09f7ded7fedaf6b0e5e4edf9f46602e263d3) and the Oases commit https://github.com/dzerbino/oases/commit/7a32460a60929b510037952ae815bb6e29b68123 (which, I admit, should be upgraded).

Feb 13 '16 15:02 drosofff