Memory usage

Open iminkin opened this issue 6 years ago • 6 comments

Hi, spoa is a fantastic tool, but its memory usage is a bit high. Is there any way to adjust the parameters to reduce it?

iminkin, Aug 30 '19 02:08

Hi Ilia, unfortunately the memory complexity is quadratic, O(graph_length * sequence_length). We might add banded alignment, which should reduce memory consumption. If your input is huge, you could try splitting it into chunks.
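For a rough sense of scale, here is a back-of-the-envelope sketch in Python (not spoa code). It assumes a single score matrix with 4-byte cells and treats the graph as if it were linear, so that a band around the diagonal is well defined. `dp_matrix_bytes`, `bytes_per_cell`, and `band` are hypothetical names, and the real per-cell cost in spoa may well differ.

```python
def dp_matrix_bytes(graph_length, sequence_length, bytes_per_cell=4, band=None):
    """Estimate the bytes needed for one alignment score matrix.

    band=None models the full matrix. Otherwise each graph node (row)
    keeps at most 2 * band + 1 cells around the diagonal.
    bytes_per_cell=4 is an assumption, not a measured spoa value.
    """
    rows = graph_length + 1
    cols = sequence_length + 1
    if band is not None:
        cols = min(cols, 2 * band + 1)
    return rows * cols * bytes_per_cell


if __name__ == "__main__":
    full = dp_matrix_bytes(50_000, 50_000)
    banded = dp_matrix_bytes(50_000, 50_000, band=500)
    print(f"full matrix:   {full / 2**30:.2f} GiB")
    print(f"banded matrix: {banded / 2**20:.2f} MiB")
```

Under these assumptions the full matrix grows with the product of the two lengths, while the banded one grows with the graph length times the band width.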

Best regards, Robert

rvaser, Aug 30 '19 11:08

Thanks, banded alignment would be very cool to have.

iminkin, Aug 30 '19 16:08

It should be possible to use the dozeu x-drop aligner to do this. That would resolve the quadratic memory issue.

Alignment would have to be run in phases, because the x-drop parameter requires that the alignment starts where there is a solid match with the target graph. The cycle would be to scan for the first hit, then align until breakage, then scan again until the next hit.
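To make the cycle concrete, here is a heavily simplified sketch in Python. It is not dozeu's API: it aligns against a linear target string instead of a graph, extends without gaps, and finds seeds by exact k-mer lookup. `xdrop_extend`, `phased_align`, and all parameters are hypothetical names chosen for illustration.

```python
def xdrop_extend(query, target, qi, ti, x_drop, match=1, mismatch=-2):
    """Extend an ungapped alignment rightwards from (qi, ti).

    Stops once the running score falls more than x_drop below the best
    score seen so far, and returns the extension length at the best score.
    """
    score = best = 0
    best_len = 0
    n = 0
    while qi + n < len(query) and ti + n < len(target):
        score += match if query[qi + n] == target[ti + n] else mismatch
        n += 1
        if score > best:
            best, best_len = score, n
        elif best - score > x_drop:
            break  # breakage: the alignment has degraded too far
    return best_len


def phased_align(query, target, k=4, x_drop=4):
    """Scan for a seed hit, extend until breakage, then scan again."""
    # Index every k-mer of the target by its first occurrence (a crude seeding heuristic).
    index = {}
    for t in range(len(target) - k + 1):
        index.setdefault(target[t:t + k], t)

    segments = []  # (query_start, target_start, length)
    q = 0
    while q + k <= len(query):
        t = index.get(query[q:q + k])
        if t is None:
            q += 1  # no solid match here, keep scanning
            continue
        length = xdrop_extend(query, target, q, t, x_drop)
        segments.append((q, t, length))
        q += length  # resume scanning after the breakage point
    return segments


if __name__ == "__main__":
    print(phased_align("ACGTACNNNNGGCCAA", "ACGTACTTTTGGCCAA"))
```

Memory stays small here because each phase only tracks a running score. A gapped x-drop aligner would instead keep the part of the matrix that survives the x-drop cutoff.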

ekg, May 31 '20 09:05

@ekg, does this approach guarantee optimal alignments?

rvaser, Jun 02 '20 11:06

@rvaser no, I don't think we can guarantee optimality without evaluating the full matrix, and this will only evaluate a subset that falls within the limits of the x-drop and scoring parameters. Furthermore, deciding where to start the process is heuristic, and would have to be based on some kind of seeding.
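A toy illustration of why a partial matrix loses the guarantee, as a Python sketch. It uses a fixed band on plain edit distance between two strings rather than x-drop on a graph, so it only shows the general effect: if the optimal path leaves the evaluated cells, the reported score is worse than the true one. `edit_distance` and `band` are hypothetical names.

```python
import math


def edit_distance(a, b, band=None):
    """Unit-cost edit distance; with `band`, only cells with |i - j| <= band are filled."""
    inf = math.inf
    prev = [j if band is None or j <= band else inf for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [inf] * (len(b) + 1)
        if band is None or i <= band:
            cur[0] = i
        for j in range(1, len(b) + 1):
            if band is not None and abs(i - j) > band:
                continue  # outside the band: never evaluated
            cur[j] = min(
                prev[j - 1] + (a[i - 1] != b[j - 1]),  # match or substitution
                prev[j] + 1,                           # deletion from a
                cur[j - 1] + 1,                        # insertion into a
            )
        prev = cur
    return prev[len(b)]


if __name__ == "__main__":
    a, b = "ACGTACGTAC", "CGTACGTACA"  # b is a shifted by one base
    print("full matrix:", edit_distance(a, b))
    print("band = 0:   ", edit_distance(a, b, band=0))
    print("band = 1:   ", edit_distance(a, b, band=1))
```

With the band restricted to the main diagonal, the one-base shift cannot be expressed as an insertion plus a deletion, so every position is counted as a substitution. Widening the band by one cell recovers the optimal score for this input.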

ekg, Jun 02 '20 15:06

Thanks for the clarification, I'll think about whether the scope of changes is worth exploring.

rvaser, Jun 03 '20 17:06