Parsnp -d sometimes fails to recruit random files

Todd,

If I create a folder called 'fasta' with 20 small identical fasta files and run "parsnp -r '!' -d fasta", my resulting tree often has only 19 genomes in it, and other times 20. The 'missing' genome is somewhat random, and is missing from the RECRUITED GENOMES list. Running the command over and over again gives different results.

This bug has us confused, so I'm wondering whether it is a non-deterministic parallel race condition, even though I'm using the default -p 1.

Torsten

Jan 06 '15 05:01 tseemann
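To put numbers on the behaviour reported above, the run can be repeated and the genomes in each resulting tree counted. The following is only a minimal sketch: the function names are hypothetical, and it assumes that `-o` sets the output directory and that the tree is written there as `parsnp.tree`, which may differ between Parsnp versions.

```python
import re
import subprocess
from collections import Counter
from pathlib import Path


def newick_leaves(newick):
    """Return leaf labels from a Newick string (simple labels without spaces)."""
    # A leaf label directly follows "(" or ","; internal-node labels follow ")".
    labels = re.findall(r"[(,]\s*([^():,;\s]+)", newick)
    return {label.strip("'\"") for label in labels}


def run_once(fasta_dir, out_dir):
    """Run parsnp once and return the set of genomes present in the tree."""
    # Assumption: -o names the output directory and the tree lands in
    # <out_dir>/parsnp.tree; adjust for your Parsnp version.
    subprocess.run(["parsnp", "-r", "!", "-d", fasta_dir, "-o", out_dir], check=True)
    return newick_leaves((Path(out_dir) / "parsnp.tree").read_text())


def tally_genome_counts(fasta_dir, runs=10):
    """Repeat the run and count how often each tree size occurs."""
    sizes = Counter()
    for i in range(runs):
        sizes[len(run_once(fasta_dir, f"parsnp_run_{i}"))] += 1
    return sizes
```

Calling `tally_genome_counts("fasta")` on the folder described above would show how many of the runs produced 19 versus 20 genomes.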

Torsten,

Thanks for pointing this out. Any chance you could share these 20 small files so I can take a closer look? It sounds like an issue in the MUMi distribution cutoff calculation, not a race condition, but I will need to debug it. In the meantime, adding -c to the command-line parameters should serve as a workaround (forcing all 20 genomes to always be included).

Jan 06 '15 10:01 treangen

Hi! I have the same problem. It seems to be random, as Torsten points out, and it doesn't always exclude just one genome but sometimes more as well. The fasta files were all generated the same way, with the same format and fasta headers of the form "B128_contig1". There is no error message or anything, so I don't know how to describe it in more detail... Best wishes, Kaisa

Feb 03 '16 14:02 thorellk

hi Kaisa,

Thanks for pointing this out; it is a known issue & will be fixed/addressed in the new release (appearing shortly). In the meantime, please use the (-c) option as a workaround.

best,

Todd

Feb 05 '16 21:02 treangen

Sorry I never sent you any files. I look forward to the new version.

Feb 13 '16 09:02 tseemann

Hello, I recently started using parsnp and had the exact same problem, using either a gbk or a fasta file as reference. Any news on a new version addressing this soon? Cheers, JAC

May 11 '16 10:05 jacarrico

Thanks João,

I plan to post a new release that will address this issue. In the meantime, you could use '-c' as a workaround. Will keep you posted.

May 13 '16 15:05 treangen

Hello,

I am getting the same exact issues even when using the -c flag. Is there a new release that we should download to work around this issue?

Thank you!

Jan 27 '17 19:01 abenaa07

What is the status of this issue?

Apr 16 '19 21:04 innovate-invent

I've tested a couple directories multiple times and haven't been able to replicate the issue. Can someone please provide an example set?

Jun 20 '20 18:06 bkille

I am copy/pasting this from our internal issue tracker as I believe it may help with this issue:

The bug occurred in Parsnp when one .fna file name was a repetition of another, for example 7.fna and 77.fna, or 1.fna and 111.fna. This caused some runs of Parsnp to return a newick that was missing one or more of the genomes, leading to the visualization issues. When there were only two genomes and this issue occurred, Parsnp would fail because it could not recognise at least two files.
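The file name pattern described above can be checked for before a run. This is a hedged sketch with hypothetical helper names. It only flags names where one stem is another stem repeated. Whether that is the exact trigger inside Parsnp is not confirmed in this thread.

```python
from itertools import permutations
from pathlib import Path


def is_repetition(short, long):
    """True if `long` is `short` repeated two or more times (e.g. "7" and "77")."""
    if not short or len(long) <= len(short) or len(long) % len(short):
        return False
    return short * (len(long) // len(short)) == long


def repeated_name_pairs(paths):
    """Return (stem, repeated_stem) pairs found among the given file paths."""
    stems = sorted({Path(p).stem for p in paths})
    return [(a, b) for a, b in permutations(stems, 2) if is_repetition(a, b)]
```

For instance, `repeated_name_pairs(["7.fna", "77.fna", "12.fna"])` reports the pair `("7", "77")`, which could then be renamed before running Parsnp.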

Issue was solved by making Parsnp retry when these errors occur. This works because the issue does not occur every time, as there is a degree of randomness to Parsnp's results.

Jan 17 '21 10:01 innovate-invent
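The retry workaround described in the post above could look roughly like the sketch below. It is not the poster's actual pipeline code: `run_until_complete` and its parameters are hypothetical, and `run` stands for any callable that performs one Parsnp run and returns the set of genomes found in the resulting tree.

```python
def run_until_complete(run, expected, max_attempts=5):
    """Call run() until its result contains every expected genome.

    Returns (recruited_genomes, attempts_used); raises RuntimeError if
    genomes are still missing after max_attempts runs.
    """
    expected = set(expected)
    missing = expected
    for attempt in range(1, max_attempts + 1):
        recruited = set(run())
        missing = expected - recruited
        if not missing:
            return recruited, attempt
    raise RuntimeError(
        f"still missing after {max_attempts} attempts: {sorted(missing)}"
    )
```

A retry only hides the symptom, so capping the attempts and reporting the missing genomes keeps a persistent failure visible instead of looping forever.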

@innovate-invent thanks for forwarding this to us! I was unable to replicate this issue with the following approach:

Ran Parsnp against two files, 1.fna and 11.fna (as well as 1 vs 111, 11 vs 111 etc)
Ran Parsnp against 1.fna, 11.fna, 111.fna and 111.fna
Ran Parsnp against many files, which included 1.fna, 11.fna, 111.fna and 111.fna

Could you by any chance attach the output from one of the relevant runs with the --verbose flag? Particularly the runs that fail would be the most helpful. Are you selecting the reference at random? That would be my first guess for files non-deterministically being excluded.

Thanks,

Bryce

Jan 21 '21 03:01 bkille

Yes, random reference selection was used. I believe this test would have to be run repeatedly as the error does not always occur. We modified our pipeline to avoid this issue. I'll have to find some time to set up another test bench for it.

Jan 23 '21 19:01 innovate-invent
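If random reference selection is indeed the source of the variation, as guessed earlier in the thread, one way to test that is to pin the reference so that every run uses the same file. The sketch below only builds the command line. The helper name is hypothetical and picking the alphabetically first file is an arbitrary choice made for illustration.

```python
from pathlib import Path


def pinned_reference_command(genome_dir, file_names):
    """Build a parsnp command that always uses the same reference genome."""
    if len(file_names) < 2:
        raise ValueError("need at least two genomes")
    # Arbitrary but deterministic choice: the alphabetically first file.
    reference = sorted(file_names)[0]
    reference_path = (Path(genome_dir) / reference).as_posix()
    return ["parsnp", "-r", reference_path, "-d", str(genome_dir)]
```

If repeated runs with a pinned reference always recruit the same genomes, that would point at the reference choice rather than at the file names.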
