Flexidot computational time

Open paolo002 opened this issue 5 years ago • 8 comments

Hi, thanks a lot for developing this toolkit, it looks really amazing. However, may I ask how much computational time and how many threads are needed to obtain the graphs? I am trying it, but it seems to take a long time to perform even a first calculation. Does it need to be run on a server? Best regards, PL

paolo002 avatar Nov 29 '18 13:11 paolo002

Hi again, I have tried to run FlexiDot on a server, but the time needed is still quite long. Does the tool not support parallelisation? Also, after some time I got some output as a txt file, but I cannot find the images of the plots... Regards, PL

paolo002 avatar Dec 01 '18 11:12 paolo002

Hi @paolo002, I've used FlexiDot a few times now and everything runs in reasonable time. May I ask what kind of data you would like to compare or plot?

crimBubble avatar Dec 11 '18 09:12 crimBubble

Hi @crimBubble

Thanks for your reply. At the moment I have downloaded from UCSC a nucleotide DNA sequence of a region which encompasses several genes (the region is pretty large, it should be thousands of base pairs...). I would like to compare it to itself in order to find regions of repeats or inversions. In the past I wanted to do a pairwise comparison of 3 DNA sequences (which are shorter). At first the run was stuck and I could not get an output, then suddenly for some reason the tool started working and I got the output for the 3 sequences immediately. (I don't know why the run seemed to be stuck at the beginning and then started working; I am not sure if that depends on the memory available at the time of the run.)
Regarding the longer sequence, I still did not get an output when I ran it on my laptop. When I run it on a server, the run completes but I can't see any output. Please let me know your advice. Thanks

paolo002 avatar Dec 11 '18 09:12 paolo002

Hi @paolo002,

thank you for giving FlexiDot a try. We are a bit unsure about where the problem actually resides. Do you get some kind of error message? FlexiDot should give a warning, if parameters are incorrect or files are missing. Maybe you can post the command-line output, so that we can see if there is something wrong. After the run is finished, you should expect "Thank you for using FlexiDot!" to be printed.

We regularly use FlexiDot on SMRT reads (up to 50-100 kb) with reasonable run times. Depending on the repetitiveness, a run should take on the order of minutes. To just check the command line and the tool's performance, we recommend that you crop your sequence to something short, maybe 5000 bp, and test whether you get the expected output files. Usually, output files comprise text and image files. The dotplot image itself is the last one generated. Alternatively, you can try to analyse your sequences with a longer word size to rule out memory issues, e.g. -k 15 -S 2 (default -k 7 -S 0).
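For the cropping test suggested above, a minimal Python sketch could look like the following. It is not part of FlexiDot; the function name `crop_fasta` and the file names in the comments are hypothetical, and it assumes plain FASTA text as input.

```python
def crop_fasta(text, max_len=5000):
    """Return FASTA text with every record truncated to max_len residues."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))

    out = []
    for header, seq in records:
        seq = seq[:max_len]
        out.append(header)
        # Re-wrap the cropped sequence at 60 characters per line.
        out.extend(seq[i:i + 60] for i in range(0, len(seq), 60))
    return "\n".join(out) + "\n" if out else ""


# Hypothetical usage (file names are placeholders):
#   with open("region.fasta") as handle:
#       cropped = crop_fasta(handle.read(), max_len=5000)
#   with open("region_5kb.fasta", "w") as handle:
#       handle.write(cropped)
```

If the cropped file produces both the text and image outputs, the command line itself is probably fine and the problem is more likely related to sequence length or memory.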

For your information, we are currently preparing the next FlexiDot release which clearly shortens the runtime for long, repetitive sequences. We are testing it at the moment with the most common commands, and if we do not run into any trouble, it should be online this week. In the future, we would like to parallelize FlexiDot. However, we did not yet find a satisfying library that works cross-platform.

In general, we would not recommend using FlexiDot for pseudochromosomes or other sequences in the Mbp scale, as it would simply take too long. Especially for small word sizes this might also raise memory issues. For long sequences, we recommend the use of longer word sizes, maybe with mismatches. There are other tools (such as dgenies), which perform better on super-long sequences.
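To illustrate why the word size matters so much, here is a rough Python sketch that counts how many identical word pairs a self-comparison would produce for a given word length. This shows the general principle of word-based dotplots only; it is not FlexiDot's actual implementation, and the function name `count_word_matches` is hypothetical.

```python
from collections import Counter


def count_word_matches(seq, k):
    """Count position pairs (i, j) whose words of length k are identical.

    Every word that occurs n times contributes n * n pairs (including the
    main diagonal), so short words in repetitive sequences add up quickly.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return sum(n * n for n in counts.values())
```

For the toy sequence "ACGTACGT", `count_word_matches(seq, 2)` gives 13 matching pairs, while `count_word_matches(seq, 4)` gives 7. On real sequences of several hundred kb the gap between a short and a long word size is far larger, which is consistent with the advice to use longer words for long sequences.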

Best wishes, Kathrin and Tony

molbio-dresden avatar Dec 11 '18 11:12 molbio-dresden

Hi Kathrin and Tony

I just tried it right now and increased the word size value -k to 20 as you suggested, and it worked. I got the output within a few minutes. By the way, my sequence is 600 kb, so this kind of size should be fine, right? Good to hear that you are releasing a new version. This tool is very nice, especially the graphics of the output. Thanks. Regards, Paolo

paolo002 avatar Dec 11 '18 13:12 paolo002

Hi, sorry to disturb you again. I was trying it with a larger sequence, around 1 Mbp, and it seems to be stuck again. Maybe there is still some issue with memory and longer sequences. If it is ready this week, I will try the new version too and see how it performs compared to the old one. Thanks a lot. Best, Paolo

paolo002 avatar Dec 11 '18 14:12 paolo002

Hi @paolo002 and @crimBubble,

we just uploaded the new FlexiDot version. Maybe you want to try how it performs on your data? It won't address the memory issue, but it will be faster.

Best,

Kathrin & Tony

molbio-dresden avatar Dec 14 '18 11:12 molbio-dresden

Hi, I'm enjoying using Flexidot, but I had similar issues at first in that it took a long time to run. Then I realized I'd missed the -t option: when I specify that the sequences are protein it runs much quicker (surprisingly it runs to completion even with the wrong sequence type). I guess it might be useful if Flexidot detects whether the -t option is correct for the sequences it reads, and throws an error if not.
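A check like the one I have in mind could be a simple heuristic on the alphabet. The sketch below is only an illustration and not something FlexiDot does; the function name `guess_sequence_type` and the 0.9 threshold are my own assumptions.

```python
NUCLEOTIDE_LETTERS = set("ACGTUN")


def guess_sequence_type(seq, threshold=0.9):
    """Guess 'nucleotide' or 'protein' from the fraction of A/C/G/T/U/N letters."""
    letters = [c for c in seq.upper() if c.isalpha()]
    if not letters:
        raise ValueError("sequence contains no letters")
    fraction = sum(c in NUCLEOTIDE_LETTERS for c in letters) / len(letters)
    # Assumed cut-off: mostly A/C/G/T/U/N means nucleotide, otherwise protein.
    return "nucleotide" if fraction >= threshold else "protein"
```

The guess could then be compared with the type given via -t, and a warning printed if the two disagree. Nucleotide sequences with many ambiguity codes, or very short peptides, could fool such a heuristic.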

Thanks for the nice tool! Alastair

skeffington avatar Sep 03 '19 09:09 skeffington