Parsnp -d sometimes fails to recruit random files

Open tseemann opened this issue 10 years ago • 12 comments

Todd,

If I create a folder called 'fasta' containing 20 small, identical FASTA files and run "parsnp -r '!' -d fasta", the resulting tree often has only 19 genomes in it, and at other times all 20. The 'missing' genome varies from run to run and is absent from the RECRUITED GENOMES list. Running the same command over and over gives different results.
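To pin down which genome drops out on a given run, the leaf labels of the output tree can be compared against the input file names. The following is a minimal sketch, not part of Parsnp: the helper names `newick_leaves` and `missing_genomes` are hypothetical, the parser only handles simple Newick labels, and it assumes each leaf label equals the input file name exactly (adjust the comparison if the tool decorates its labels).

```python
import re

# A leaf label directly follows '(' or ','; labels after ')' are internal
# node names or support values and are deliberately not matched.
_LEAF = re.compile(r"[(,]\s*(?:'([^']+)'|([^():,;'\s]+))")


def newick_leaves(newick: str) -> set[str]:
    """Return the set of leaf labels found in a Newick string."""
    return {quoted or bare for quoted, bare in _LEAF.findall(newick)}


def missing_genomes(expected_names, newick: str) -> list[str]:
    """Return the expected names that do not appear as leaves, sorted."""
    return sorted(set(expected_names) - newick_leaves(newick))


if __name__ == "__main__":
    # Illustrative tree only; not real Parsnp output.
    tree = "((a.fna:0.1,b.fna:0.2):0.05,c.fna:0.3);"
    print(missing_genomes(["a.fna", "b.fna", "c.fna", "d.fna"], tree))
```

Running this check after each repeat of the command would show whether the same file is dropped every time or a different one.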

This bug has us confused. I'm thinking it might be a non-deterministic race condition between parallel threads, even though I'm using the default -p 1.

Torsten

tseemann avatar Jan 06 '15 05:01 tseemann

Torsten,

Thanks for pointing this out. Any chance you could share these 20 small files so I can take a closer look? It sounds like an issue in the MUMi distribution cutoff calculation, not a race condition, but I will need to debug it. In the meantime, adding -c to the command-line parameters should serve as a workaround (forcing all 20 genomes to always be included).

treangen avatar Jan 06 '15 10:01 treangen

Hi! I have the same problem. It seems to be random, as Torsten points out, and it doesn't always exclude just one genome but sometimes more. The fasta files were all generated the same way, with the same format and with fasta headers in the format "B128_contig1". There is no error message or anything, so I don't know how to describe it in more detail... Best wishes, Kaisa

thorellk avatar Feb 03 '16 14:02 thorellk

hi Kaisa,

Thanks for pointing this out; it is a known issue & will be fixed/addressed in the new release (appearing shortly). In the meantime, please use the (-c) option as a workaround.

best,

Todd

treangen avatar Feb 05 '16 21:02 treangen

Sorry I never sent you any files. I look forward to the new version.

tseemann avatar Feb 13 '16 09:02 tseemann

Hello, I recently started using parsnp and had the exact same problem, whether using a gbk or a fasta file as the reference. Any news on the new version that will address this? Cheers, JAC

jacarrico avatar May 11 '16 10:05 jacarrico

Thanks João,

I plan to post a new release that will address this issue. In the meantime, you could use '-c' as a workaround. Will keep you posted.

treangen avatar May 13 '16 15:05 treangen

Hello,

I am getting the same exact issues even when using the -c flag. Is there a new release that we should download to work around this issue?

Thank you!

abenaa07 avatar Jan 27 '17 19:01 abenaa07

What is the status of this issue?

innovate-invent avatar Apr 16 '19 21:04 innovate-invent

I've tested a couple directories multiple times and haven't been able to replicate the issue. Can someone please provide an example set?

bkille avatar Jun 20 '20 18:06 bkille

I am copy/pasting this from our internal issue tracker as I believe it may help with this issue:

The bug occurred in Parsnp when one .fna file name was a repetition of another, for example 7.fna and 77.fna, or 1.fna and 111.fna. This caused some runs of Parsnp to return a newick that was missing one or more of the genomes, leading to the visualization issues. When there were only two genomes and this issue occurred, Parsnp would fail because it could not recognise at least two files.
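A quick way to check whether an input directory contains such name pairs is sketched below. This is an illustration only: `is_repetition` and `repeated_name_pairs` are hypothetical helpers, and the check assumes "repetition" means the shorter file stem repeated a whole number of times (as in 7 and 77, or 1 and 111).

```python
from pathlib import Path


def is_repetition(short: str, long: str) -> bool:
    """True if `long` consists of `short` repeated two or more times."""
    if not short or len(long) <= len(short) or len(long) % len(short):
        return False
    return short * (len(long) // len(short)) == long


def repeated_name_pairs(filenames):
    """Return (short_stem, long_stem) pairs where one stem repeats the other."""
    # Compare stems only, so "7.fna" and "77.fna" become "7" and "77".
    stems = sorted({Path(name).stem for name in filenames},
                   key=lambda s: (len(s), s))
    return [(a, b)
            for i, a in enumerate(stems)
            for b in stems[i + 1:]
            if is_repetition(a, b)]


if __name__ == "__main__":
    names = ["7.fna", "77.fna", "1.fna", "111.fna", "B128.fna"]
    print(repeated_name_pairs(names))
```

Renaming any flagged files before a run is one way to sidestep the problem without relying on retries.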

The issue was solved by making Parsnp retry when these errors occur. This works because the issue does not occur on every run, as there is a degree of randomness to Parsnp's results.
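The retry approach can be sketched roughly as follows. This is not the code from our pipeline: `run_until_complete` is a hypothetical wrapper, and `run_once` stands in for whatever invokes Parsnp and returns the genome names found in the resulting tree.

```python
from typing import Callable, Iterable


def run_until_complete(
    run_once: Callable[[], Iterable[str]],
    expected: Iterable[str],
    max_attempts: int = 5,
) -> tuple[int, set[str]]:
    """Call `run_once` until every expected genome is reported.

    Returns (attempts_used, still_missing). `still_missing` is empty on
    success and holds the genomes absent from the last attempt otherwise.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    expected_set = set(expected)
    missing = expected_set
    for attempt in range(1, max_attempts + 1):
        missing = expected_set - set(run_once())
        if not missing:
            return attempt, missing
    return max_attempts, missing


if __name__ == "__main__":
    # Fake runner that drops a genome on its first two calls.
    results = iter([["1.fna"], ["111.fna"], ["1.fna", "111.fna"]])
    print(run_until_complete(lambda: next(results), ["1.fna", "111.fna"]))
```

Capping the number of attempts matters here: if the genome is excluded for a deterministic reason, an unbounded retry loop would never finish.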

innovate-invent avatar Jan 17 '21 10:01 innovate-invent

@innovate-invent thanks for forwarding this to us! I was unable to replicate this issue with the following approach:

  • Ran Parsnp against two files, 1.fna and 11.fna (as well as 1 vs 111, 11 vs 111 etc)
  • Ran Parsnp against 1.fna, 11.fna, 111.fna and 111.fna
  • Ran Parsnp against many files, which included 1.fna, 11.fna, 111.fna and 111.fna

Could you by any chance attach the output from one of the relevant runs with the --verbose flag? Particularly the runs that fail would be the most helpful. Are you selecting the reference at random? That would be my first guess for files non-deterministically being excluded.

Thanks,

Bryce

bkille avatar Jan 21 '21 03:01 bkille

Yes, random reference selection was used. I believe this test would have to be run repeatedly as the error does not always occur. We modified our pipeline to avoid this issue. I'll have to find some time to set up another test bench for it.

innovate-invent avatar Jan 23 '21 19:01 innovate-invent