edlib Idea: share working memory among edlibAlign() calls

Idea: share working memory among edlibAlign() calls

Open Martinsos opened this issue 7 years ago • 3 comments

Idea comes from @rob-p:

The usage scenario I envision is the following. I have many threads running independently and each one repeatedly want's to perform the following task. I have a single read (75 --- 150bp long) and I know approximately where I want to perform the alignment. That is, I have a putative position on the reference where the alignment should begin (though, perhaps I want to include a few nucleotides of "wiggle room" to allow the read to align slightly before / after this position).

That is, I have a scenario like this. I want to align a N basepair read to an N+10 (say) length substring of the reference. I know the approximate position where the read should start. As you mention above, I have an edit distance budget as well (i.e. a k), so that if the edit distance exceeds k, I'm no longer interested in retaining the alignment.

Now, each read will repeat this process of taking a read, using our quasi-mapping algorithm to determine where the read comes from, and then attempting to actually align the read at this position. The reason I've made so much noise about working space is that new and delete are well-behaved in single-threaded land, but can become a significant resource contention since threads will be asking for resources repeatedly. The other interesting thing about this use case is that since all of the reads are approximately the same length, it's not as if there will be many problems of drastically different size. So, the memory allocated for one problem can probably be re-used (after it is cleared) for >= 90% of the alignment problems a thread will encounter. In this case, I agree that it's probably too much trouble for you to implement a caching system and all of that complexity. However, one easy solution would be to provide a version of your main function that simply accepts an extra parameter (or extra parameters) that are the working space for the alignment. That is, the onus falls on the caller to have previously allocated the memory (perhaps by calling an edlib utility function) and to free the memory when it's no longer needed. This way, a thread can allocate the working space for it's first alignment, and then simply re-use this space for the rest of the alignments it performs.

This may be useful for short sequences, but will certainly bring some complexity. I suggest we wait for the new algorithm to be implemented for shorter sequences and than see if this is still needed and how complex will it be.

Feb 25 '17 12:02 Martinsos

Some of my thoughts on this:

I am somewhat careful about introducing this feature because I feel it will bring more than little complexity. I have many functions, some of them being used in different scenarios and on different places, regarding of the method used. I would have to propagate the "working memory" through all of them, and make sure that if "working memory" is not given, everything still works. I could probably just allocate it myself if not given or something as a way around this. However, "working memory" means that in one central place I have the "knowledge" from all the functions - internal knowledge of how are they allocating the memory internally and using it. I like having things encapsulated as much as possible, while this would have exactly the opposite effect. Also, this would require careful planning of how much space is needed upfront, which is not always easy to determine. If there is not enough space at some moment, I would have to detect and manage that. All this would require changes through the whole codebase, coupling internal logic of certain functions with "working memory" and its management, while on the other hand also complicating the final API somewhat. Also, I expect all this to make further maintainance and upgrading of Edlib harder. All this does not mean that this is not worth implementing - however, it makes me think hard before deciding to do it, and I would rather be sure that there is no other, less intrusive way to do this, and that benefits brought by this would justify the added complexity.

I believe that the biggest performance improvement can be made by implementing alternative algorithm for shorter sequences. Having this in mind, and also having in mind that memory allocation has impact only on short sequences, it is obvious that it would not make much sense to optimize the current algorithm for it.

Therefore, I suggest the following: let's wait until the algorithm for shorter sequences is implemented and in place. Then, we can revisit the problem of allocating memory and idea of "working memory" and see if it is still needed and how hard it will be to implement. It may be that with the new algorithm this will not be a problem at all anymore (maybe all memory needed for short sequences can fit on stack) or maybe implementing this change will be much simpler. Does that make sense?

Feb 25 '17 13:02 Martinsos

Hi @Martinsos,

I just wanted to update you on this. We did, indeed, end up using edlib in our current implementation of the idea to which I was alluding (pre-print available here). I ended up making some of the changes we mentioned in a very hackish fashion. You can see the code in the read-aligner branch of my fork of edlib. This is a huge hack primarily because I only implemented the changes for the one alignment mode that we needed (end-to-end), and only for the edit distance computation. However, for those cases, it made a significant difference in time / memory. Since each alignment problem is almost exactly the same size, I created an AlignerEngine object that caches all of the memory used by edlib to solve an alignment problem. The operator() of this object performs the actual alignment. When a new alignment is performed, the memory kept by the object is cleared as needed and then reused for the next problem. Each thread can have its own AlignerEngine so they all have separate memory. I also changed some raw ptr uses to unique_ptr so I didn't have to deal with certain resource management manually.

Again, I threw this together under duress (actually computing the edit distances was only a small part of our algorithm / method), and its certainly not my proudest code. However, feel free to take a look and comment. I still feel strongly that having the ability to use edlib easily in this manner (i.e., for short read alignment and edit distance computation) is a very interesting use case. It certainly worked out very well in our case!

May 28 '17 14:05 rob-p

This is awesome @rob-p! I see there are a lot of changes - I will certainly take a look at it and see if I can reuse some parts and build from it (and add you as a contributor in that case). I like the idea - the only thing, as I mentioned before, is that I am not sure if added complexity is worth the gained performance, but from what you are saying it seems like it might be! Could you give me some estimate of the impact on time / memory - what was the impact in your case? It will probably take some longer time for me to get to this, as it is a pretty big change and there is so little time :), but I will let you know when there will be some progress.

Good luck with your article! Small note: it should be "Myers's", not "Myer's" -> it is a little bit convoluted but I spent some time learning what is the correct form.

May 31 '17 19:05 Martinsos

edlib edlib copied to clipboard

Idea: share working memory among edlibAlign() calls

edlib
edlib copied to clipboard