abPOA
Is there any way to reduce memory consumption?
Hello, I'm experimenting with adding abPOA as an option within cactus (manuscript). Thanks for making a great tool -- it's amazingly fast.
I was wondering, however, if there's a way to reduce memory consumption, in order to increase the sequence lengths I can run on. Right now it seems roughly quadratic in the sequence length, which is as expected from reading your manuscript. I'm curious whether there are any options I can use to reduce this, and/or whether you've thought about using the banding to reduce the DP table size (as far as I can tell, it's only used to reduce computation)?
Hi Glenn, you are right. Right now, the banding in abPOA only reduces the time, not the memory, so it is still quadratic. I do plan to reduce the memory consumption in different ways, but I haven't implemented it yet. Will let you know if I have any progress.
Yan
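To make that concrete, here is a toy cell-count model (my illustration, not abPOA's actual accounting) of why a band cuts computation but, without band-aware storage, not allocation:

```c
#include <stdint.h>

/* Toy model, not abPOA's actual accounting: the band limits which DP
 * cells are computed, but if the full n x m score matrix is still
 * allocated, the memory footprint stays quadratic. */
static uint64_t full_cells(uint64_t n, uint64_t m) {
    return n * m;               /* cells allocated without band-aware storage */
}

static uint64_t banded_cells(uint64_t n, uint64_t band) {
    return n * (2 * band + 1);  /* cells actually computed inside the band */
}
```

For two 100 kb sequences with a band half-width of 100, the band touches ~20 million cells while a dense matrix still holds 10 billion, which is why banding alone speeds things up without shrinking the allocation.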
I would also like to express my interest in resolving this issue :+1: It would be really nice to be able to take full advantage of the banding.
Hi Glenn,
In the latest version of abPOA (v1.2.0), I implemented minimizer-based seeding before POA; this can reduce the memory usage for long input sequences. Most of the time, it produces nearly the same or even better alignment results.
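For readers unfamiliar with the idea, minimizer seeding keeps only the smallest k-mer in each window of w consecutive k-mers, and the POA then only needs to fill the gaps between matching anchors. A toy sketch of minimizer selection (lexicographic comparison stands in for the hashing a real implementation such as abPOA's would use):

```c
#include <string.h>

/* Toy minimizer selection: for every window of w consecutive k-mers,
 * record the position of the smallest k-mer (lexicographic order here;
 * real implementations hash k-mers first). Consecutive windows sharing
 * the same minimum are recorded once. Returns the number of positions
 * written to pos_out. */
static int pick_minimizers(const char *s, int k, int w,
                           int *pos_out, int max_out) {
    int n = (int)strlen(s);
    int n_kmers = n - k + 1, count = 0;
    for (int i = 0; i + w <= n_kmers; i++) {
        int best = i;
        for (int j = i + 1; j < i + w; j++)
            if (strncmp(s + j, s + best, (size_t)k) < 0) best = j;
        if ((count == 0 || pos_out[count - 1] != best) && count < max_out)
            pos_out[count++] = best;
    }
    return count;
}
```

Because only a sparse set of positions survives, anchoring on matched minimizers bounds each DP sub-problem to the distance between adjacent anchors instead of the full sequence length.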
Please try it out and let me know if this works for you.
Yan
Great! This is perfect timing since I was just about to review some of what I'd been doing for stitching alignments together. I'll try it out on Monday. Thanks!
@glennhickey, I just updated abPOA to v1.2.1, which removes a redundant sorting step that was very time-consuming.
Thanks for letting me know. I'm switching to 1.2.1 now. My 1.2.0 tests have been okay so far: it passes all Cactus unit tests, and let me disable our current stitching logic on a bigger run. I'll do a much bigger test this week and report the results.
Do you have a sense of the maximum sequence lengths I can pass in while using the seeding? I just got an error
[SIMDMalloc] mm_Malloc fail!
Size: 549755813888
when I allowed up to 1 Mb. Thanks.
@glennhickey what alignment parameters are you using?
@ekg I'm still using the defaults for everything (edit -- except wb/wf, which I increased from 10/0.01 to 25/0.025). I haven't yet explored the parameter space much despite meaning to for a while, especially in the context of alignments between more distant species.
Until now, I've been capping abpoa jobs at 10 kb (and using an overlapping sliding window and stitching the results together). Bumping this up to 1 Mb with the latest abpoa seemed to work on smaller tests but not on a bigger job.
I'm getting failures even on datasets that ran before (without the seeding)
...
== 05-19-2021 22:50:05 == [abpoa_anchor_poa] Performing POA between anchors ...
== 05-19-2021 22:50:07 == [abpoa_anchor_poa] Performing POA between anchors done.
== 05-19-2021 22:50:07 == [abpoa_build_guide_tree_partition] Seeding and chaining ...
== 05-19-2021 22:50:07 == [abpoa_build_guide_tree_partition] Seeding and chaining done!
== 05-19-2021 22:50:07 == [abpoa_anchor_poa] Performing POA between anchors ...
== 05-19-2021 22:50:07 == [abpoa_anchor_poa] Performing POA between anchors done.
Command terminated by signal 11
@glennhickey Can you share the dataset that causes the error/failure?
Sure. I'll need to hack cactus a bit to spit it out, but should be able to do that soon.
I was just about to send you another segfault I got without seeding:
wget http://public.gi.ucsc.edu/~hickey/debug/abpoa_fail_may26.fa
abpoa ./abpoa_fail_may26.fa -m 0 -o out.msa -r 1 -N -b 100 -f 0.025 -M 96 -X 90 -O 400,1200 -E 30,1
But I realized that when I built with a newer -march it worked. More specifically, upgrading from -march=nehalem to -march=haswell fixed it. (Cactus had previously built against nehalem to maximize portability for releases.) I think it's pretty likely the problem I mentioned above is related to this.
@glennhickey I did not get any error on my computer for this data. However, I did notice a big difference when using the scoring parameters you mentioned: not only do they produce different MSA output, they also use more memory and take longer to run.
abpoa ./abpoa_fail_may26.fa -m 0 -o out.msa -r 1 -N -b 100 -f 0.025 -M 96 -X 90 -O 400,1200 -E 30,1
[abpoa_main] Real time: 111.849 sec; CPU: 111.208 sec; Peak RSS: 17.228 GB.
abpoa ./abpoa_fail_may26.fa -m 0 -o out.msa -r 1 -N -b 100 -f 0.025
[abpoa_main] Real time: 28.047 sec; CPU: 27.946 sec; Peak RSS: 3.135 GB.
For seeding mode:
abpoa ./abpoa_fail_may26.fa -m 0 -o out.msa -r 1 -b 100 -f 0.025 -M 96 -X 90 -O 400,1200 -E 30,1
[abpoa_main] Real time: 94.547 sec; CPU: 70.398 sec; Peak RSS: 13.474 GB.
abpoa ./abpoa_fail_may26.fa -m 0 -o out.msa -r 1 -b 100 -f 0.025
[abpoa_main] Real time: 35.114 sec; CPU: 26.085 sec; Peak RSS: 4.830 GB.
Yes, that data works for me now too. I just thought it was interesting, as that command line was the first I found that did not work on architectures older than Haswell; to reproduce the crash you'd have to build with -march=nehalem instead of -march=native (or use a computer that's more than 7 years old).
While the scoring parameters make a big difference in runtime, they also seem to help accuracy considerably when aligning different species together. The best we've found for this has been to use the default HOXD70 matrix from lastz
|   | A | C | G | T |
|---|---|---|---|---|
| A | 91 | -114 | -31 | -123 |
| C | -114 | 100 | -125 | -31 |
| G | -31 | -125 | 100 | -114 |
| T | -123 | -31 | -114 | 91 |
On a simulation test, this matrix (which I override in abpt->mat) brings accuracy up by around 7% vs. the abpoa defaults. On less divergent sequences there is also an improvement, but it is much smaller.
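As a sketch, the override described above amounts to filling the aligner's m x m integer score matrix with the HOXD70 values. The 5x5 layout below, with base order A, C, G, T, N and a zero-scored N row/column, is my assumption, not abPOA's documented behavior; check abPOA's headers before copying it into abpt->mat.

```c
/* HOXD70 substitution scores laid out for a 5-letter alphabet.
 * ASSUMPTION: base order A, C, G, T, N and a zero-scored N row/column;
 * verify against abPOA's headers before writing this into abpt->mat. */
static const int hoxd70[5][5] = {
    /*         A     C     G     T   N */
    /* A */ {  91, -114,  -31, -123, 0 },
    /* C */ {-114,  100, -125,  -31, 0 },
    /* G */ { -31, -125,  100, -114, 0 },
    /* T */ {-123,  -31, -114,   91, 0 },
    /* N */ {   0,    0,    0,    0, 0 },
};
```

The matrix is symmetric, with transitions (A-G, C-T) penalized less than transversions, which is what helps on cross-species alignments.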
Thanks for the information! I am working on changing the scoring parameter options.
Yan
Hi @yangao07 , I've been experimenting with the seeding option to run on longer sequence sizes. It works pretty well, but I get segfaults from time to time. Here are some examples
wget http://public.gi.ucsc.edu/~hickey/debug/abpoa_fail_mar17.tar.gz
tar xzf abpoa_fail_mar17.tar.gz
for f in abpoa_fail_mar17/*.fa; do abpoa $f -m 0 -r 1 -S ; done
If I understand correctly, the memory with seeding is much lower... but only if abpoa can find enough seeds. If the sequences are too diverged, the memory can still explode.
If this is correct, do you think there would be a way to change the API to fail more gracefully in these cases? For example, if there are not enough seeds, and the memory will exceed a given threshold, return an error code. Or a function that checks the seeds in the input and estimates the memory requirement? Either of these would allow the user to use seeding when possible and fall back on another approach if it won't work.
Thanks as always for your wonderful tool!
> but only if abpoa can find enough seeds. If the sequences are too diverged, the memory can still explode.
You are right; for divergent sequences, especially ones with greatly different lengths, the memory can still be very large.
The memory size depends simply on the graph size and the sequence length, so it can be estimated. I can try to add a pre-calculation step for this.
Yan
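A back-of-the-envelope version of that estimate, assuming one score cell per (graph node, sequence position) pair; the dense-matrix assumption and fixed cell size are illustrative guesses, not abPOA's exact layout:

```c
#include <stdint.h>

/* Rough DP memory model: graph nodes x sequence length x bytes per cell.
 * ASSUMPTION: a dense matrix and a fixed cell size; abPOA's real
 * accounting differs, so treat this as an order-of-magnitude check. */
static uint64_t est_dp_bytes(uint64_t graph_nodes, uint64_t seq_len,
                             uint64_t bytes_per_cell) {
    return graph_nodes * seq_len * bytes_per_cell;
}

/* Caller-side guard: skip (or re-seed) an alignment that would exceed
 * a user-supplied memory budget. */
static int within_budget(uint64_t graph_nodes, uint64_t seq_len,
                         uint64_t bytes_per_cell, uint64_t budget) {
    return est_dp_bytes(graph_nodes, seq_len, bytes_per_cell) <= budget;
}
```

With, say, 2-byte cells, a 100k-node graph against a 100 kb sequence already models to ~20 GB, roughly the order of magnitude of the peak-RSS figures reported earlier in this thread.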
Thanks, that would be amazing. Or even some kind of interface where the user passes in a MAX_SIZE parameter and abpoa exits 1 instead of trying to allocate >MAX_SIZE would be very helpful.
> exits 1 instead of trying to allocate >MAX_SIZE would be very helpful.
oops, exit isn't much better than running out of memory -- would have to be a return code or exception.
Hey @glennhickey , I am working on adding some interfaces related to memory usage by abPOA.
Here is what I have done for now:
Added two variables to `abpoa_t`: `status` and `req_mem`. For `status`, 0 means success, 1 means not enough memory, and 2 means other errors. `req_mem` indicates the size of the memory abPOA tried, but failed, to allocate. This way, by checking the `status` variable, users can choose to re-run abpoa with adjusted parameters.
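From the caller's side, an interface like that might be used as follows; the struct and constants here are a mock that mirrors the description, not the released abPOA headers:

```c
#include <stddef.h>

/* Mock of the status/req_mem fields described above; the names and
 * values follow this comment thread, NOT the released abPOA API. */
enum { AB_OK = 0, AB_NO_MEM = 1, AB_OTHER_ERR = 2 };

typedef struct {
    int status;     /* 0 = success, 1 = not enough memory, 2 = other error */
    size_t req_mem; /* size of the allocation that failed, when status == 1 */
} abpoa_status_mock_t;

/* Caller policy: on AB_NO_MEM, fall back to another strategy
 * (e.g. a smaller alignment window) instead of crashing. */
static int should_fall_back(const abpoa_status_mock_t *ab) {
    return ab->status == AB_NO_MEM;
}
```

The point of returning normally with `status` set, rather than aborting, is exactly this kind of caller-side policy: the application decides whether to retry, shrink the problem, or give up.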
What do you think?
Wow, I'm happy to hear that you're thinking about this!
If I call abpoa and it fails a malloc, and instead of crashing it sets status to 1 and returns normally, that would be a big help indeed, and I'd definitely like to try it out.
I would still be a bit worried, though, because I run abpoa in many threads on cloud instances. I can imagine a case where a big malloc succeeds and abpoa takes 100% of the resources on a system; then all the concurrent threads would crash, and that would effectively bring down the job I was running anyway, even if it's not directly abpoa's fault.
Do you think there could be any way for me to give abpoa a limit, and ask it to set status 1 if it ever tries to allocate more than that amount of memory at one time?
Thanks again for all your help.
I have thought about using a size limit. The concern is that the size abpoa allocates is the virtual size, not the resident size, and the virtual size could be much larger than the physical memory of the computer. I am not sure how to properly set the size limit; do you have any idea?
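One way to see the virtual-vs-resident distinction in practice: getrusage reports the peak resident set, which is what actually competes for physical RAM, while pages of a large allocation that are never touched mostly stay virtual. A minimal POSIX probe (note that ru_maxrss is in kilobytes on Linux but bytes on macOS):

```c
#include <sys/resource.h>

/* Return the process's peak resident set size as reported by the OS,
 * or -1 on failure. Units are platform-dependent: kilobytes on Linux,
 * bytes on macOS. */
static long peak_rss(void) {
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0) return -1;
    return ru.ru_maxrss;
}
```

A library could compare a figure like this (or its own allocation tally) against a user-supplied budget, though as noted above, mapping virtual allocations to eventual resident usage is the hard part.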
ok, I think I understand better now, thanks. I was (probably naively) hoping it would be simple for you to detect how far outside the band the DP had gotten and abort before overrunning the memory. I'll check with some colleagues who know more about virtual address spaces than I do.