GenomeWorks [cudamapper] Move overlaps post-processing to GPU

Open • mimaric opened this issue 4 years ago • 1 comment

Currently, Overlapper::post_process_overlaps() and the new functionality to be added in PR #422 run on the CPU. They should be moved to the GPU.

There are two reasons for this:

a) One matcher + overlapper iteration on our benchmark currently takes around 115 ms (a number that will likely be cut at least in half in the future) and generates 220k overlaps. If post-processing is done on the CPU, those overlaps should ideally be processed during the next matcher + overlapper iteration, which leaves around 0.5 µs per overlap (115 ms / 220,000 ≈ 0.52 µs). Even with multiple threads, this will still not give us more than 3 to 5 µs per overlap.

b) Output generation is likely to move to the GPU, and for that we would need the overlaps to remain on the device. Also, if we decide to pass the data directly to the next application in the pipeline, we would like to avoid copying the data back to the host just to copy it to the device again.
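The per-overlap time budget in a) is simple arithmetic; the sketch below only makes it explicit. It is a minimal illustration: `per_overlap_budget_us` is a hypothetical helper, and it assumes ideal scaling across CPU threads with no synchronization, memory, or I/O overhead, so real budgets would be somewhat tighter. With 6 to 10 threads it reproduces the 3 to 5 µs range mentioned above.

```cpp
#include <cassert>

// Hypothetical helper: CPU time budget per overlap (in microseconds) when the
// post-processing of one batch must be hidden behind the next
// matcher + overlapper iteration.
// Assumption: every thread works for the whole iteration and overlaps are
// split evenly across threads (no synchronization or I/O overhead).
double per_overlap_budget_us(const double iteration_ms,
                             const double num_overlaps,
                             const int cpu_threads)
{
    assert(num_overlaps > 0.0 && cpu_threads > 0);
    const double iteration_us = iteration_ms * 1000.0; // ms -> us
    return iteration_us * cpu_threads / num_overlaps;
}

// With the figures from this issue (115 ms, 220k overlaps):
//   per_overlap_budget_us(115.0, 220000.0, 1)  is about 0.52 us
//   per_overlap_budget_us(115.0, 220000.0, 6)  is about 3.1 us
//   per_overlap_budget_us(115.0, 220000.0, 10) is about 5.2 us
```

Halving the iteration time, as anticipated above, halves each of these budgets as well.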

May 12 '20 14:05 mimaric

Notes from #422:

Refactor rescue_overlap_ends to use a helper function such as extend_overlap_by_similarity(Overlap& overlap, const std::string_view query_sequence, const std::string_view target_sequence, const std::int32_t extension). This should make the code easier to test and easier to refactor so that the process can be run repeatedly.
Add a max_extension argument to rescue_overlap_ends to prevent processing very long hanging overlaps.
Use std::string_view where possible.
Use a recursive-style overlap rescue approach: scan the given number of base pairs (extension) at a time forward from the overlap end (respectively backward from the overlap start), calculate the similarity, and extend the overlap until it reaches the end of the target and/or the end of the query, and/or it has been extended by max_extension residues.
Either fix the query and target end substrings to the same size and use the Jaccard similarity coefficient (best), or use the Jaccard containment (better).
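The notes above could be sketched roughly as follows. This is an illustration under stated assumptions, not the GenomeWorks implementation: `OverlapSketch` is a hypothetical forward-strand stand-in for `Overlap`, the `forward` flag, the `required_similarity` threshold, the k-mer size `k`, and the driver `rescue_overlap_ends_sketch` are all assumptions. Each step compares equal-size windows past an overlap end, so the Jaccard similarity coefficient applies; `jaccard_containment` is included as the alternative for windows of different sizes. On each side, extension stops when a step fails the similarity test, when the end of the query or target is reached, or when `max_extension` bases have been added.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <string_view>
#include <unordered_set>

// Hypothetical, simplified stand-in for cudamapper's Overlap (not its real
// layout): forward strand only, half-open intervals [start, end).
struct OverlapSketch
{
    std::int32_t query_start;
    std::int32_t query_end;
    std::int32_t target_start;
    std::int32_t target_end;
};

using KmerSet = std::unordered_set<std::string_view>;

// All k-mers of s, stored as views into s (nothing is copied).
KmerSet kmer_set(const std::string_view s, const std::size_t k)
{
    KmerSet kmers;
    for (std::size_t i = 0; i + k <= s.size(); ++i)
        kmers.insert(s.substr(i, k));
    return kmers;
}

std::size_t shared_kmers(const KmerSet& a, const KmerSet& b)
{
    std::size_t shared = 0;
    for (const std::string_view kmer : a)
        shared += b.count(kmer);
    return shared;
}

// Jaccard similarity coefficient |intersection| / |union|, meant for equal-size windows.
double jaccard_similarity(const std::string_view a, const std::string_view b, const std::size_t k)
{
    const KmerSet ka = kmer_set(a, k);
    const KmerSet kb = kmer_set(b, k);
    const std::size_t shared = shared_kmers(ka, kb);
    const std::size_t in_union = ka.size() + kb.size() - shared;
    return in_union == 0 ? 0.0 : static_cast<double>(shared) / in_union;
}

// Jaccard containment of a in b, |intersection| / |A|; tolerates different window sizes.
double jaccard_containment(const std::string_view a, const std::string_view b, const std::size_t k)
{
    const KmerSet ka = kmer_set(a, k);
    return ka.empty() ? 0.0 : static_cast<double>(shared_kmers(ka, kmer_set(b, k))) / ka.size();
}

// One step of at most `extension` bases past the overlap end (forward) or
// before the overlap start (backward). Returns the number of bases added,
// or 0 if a sequence boundary was reached or the windows are not similar enough.
std::int32_t extend_overlap_by_similarity(OverlapSketch& overlap, const std::string_view query,
                                          const std::string_view target, const std::int32_t extension,
                                          const bool forward, const double required_similarity,
                                          const std::size_t k)
{
    const auto q_len = static_cast<std::int32_t>(query.size());
    const auto t_len = static_cast<std::int32_t>(target.size());
    // Never step past the end (forward) or the start (backward) of either sequence.
    const std::int32_t step =
        forward ? std::min<std::int32_t>({extension, q_len - overlap.query_end, t_len - overlap.target_end})
                : std::min<std::int32_t>({extension, overlap.query_start, overlap.target_start});
    if (step <= 0)
        return 0;
    const std::int32_t q_pos = forward ? overlap.query_end : overlap.query_start - step;
    const std::int32_t t_pos = forward ? overlap.target_end : overlap.target_start - step;
    if (jaccard_similarity(query.substr(q_pos, step), target.substr(t_pos, step), k) < required_similarity)
        return 0;
    if (forward)
    {
        overlap.query_end += step;
        overlap.target_end += step;
    }
    else
    {
        overlap.query_start -= step;
        overlap.target_start -= step;
    }
    return step;
}

// Hypothetical driver: repeats single steps on each side, capping the total per side at max_extension.
void rescue_overlap_ends_sketch(OverlapSketch& overlap, const std::string_view query,
                                const std::string_view target, const std::int32_t extension,
                                const std::int32_t max_extension, const double required_similarity,
                                const std::size_t k)
{
    for (const bool forward : {true, false})
    {
        for (std::int32_t added = 0; added < max_extension;)
        {
            const std::int32_t step = extend_overlap_by_similarity(
                overlap, query, target, std::min<std::int32_t>(extension, max_extension - added),
                forward, required_similarity, k);
            if (step == 0)
                break;
            added += step;
        }
    }
}
```

Because the windows are `std::string_view`s into the original sequences, no substrings are copied. A trailing window shorter than `k` contains no k-mers and therefore ends the extension in this sketch; a real implementation might treat that case differently. Each overlap is processed independently, which is the property that could let this step map to one GPU thread (or warp) per overlap.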

May 14 '20 03:05 edawson
