GenomeWorks
[cudamapper] Move overlaps post-processing to GPU
Currently `Overlapper::post_process_overlaps()` and the new functionality to be added in PR #422 run on the CPU. They should be moved to the GPU.
There are two reasons for that:

a) One matcher + overlapper iteration on our benchmark currently takes around 115 ms (that number will likely be cut at least in half in the future) and generates 220k overlaps. If done on the CPU, those overlaps should ideally be post-processed during the next matcher + overlapper iteration, which leaves around 0.5 µs to process each overlap. Even if we use multiple threads, this will still not give us more than 3 - 5 µs per overlap.

b) Output generation is likely to move to the GPU, and for that the overlaps would need to remain on the device. Also, if we decide to pass the data directly to the next application in the pipeline, we would like to avoid copying the data back to the host just to copy it to the device again.
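The per-overlap budget in (a) is simple arithmetic, and a small helper makes it easy to redo once the iteration time changes. This is only a sketch: `per_overlap_budget_us` and the `num_threads` parameter are hypothetical names, and perfect scaling across CPU threads is assumed, which is optimistic.

```cpp
#include <cassert>

// Time available per overlap, in microseconds, if post-processing has to
// finish within one matcher + overlapper iteration.
// iteration_ms and num_overlaps are the benchmark figures quoted in (a);
// num_threads is a hypothetical number of CPU worker threads and assumes
// perfect scaling, which real code will not reach.
double per_overlap_budget_us(double iteration_ms, double num_overlaps, int num_threads)
{
    assert(iteration_ms > 0.0 && num_overlaps > 0.0 && num_threads > 0);
    const double iteration_us = iteration_ms * 1000.0;
    return iteration_us * num_threads / num_overlaps;
}
```

With the numbers above, `per_overlap_budget_us(115.0, 220000.0, 1)` comes out at roughly 0.52 µs, and the 3 - 5 µs range corresponds to somewhere between 6 and 10 perfectly scaling threads. If the iteration time is halved as expected, every one of these budgets is halved too.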
Notes from #422:
- Refactor `rescue_overlap_ends` to use a helper function such as `extend_overlap_by_similarity(Overlap& overlap, const std::string_view query_sequence, const std::string_view target_sequence, const std::int32_t extension)`. This should make it easier to test the code and to refactor it to run the process repeatedly.
- Add a `max_extension` argument to `rescue_overlap_ends` to prevent processing very long hanging overlaps.
- Use `std::string_view` if / where possible.
- Use a recursive-style overlap rescue approach: scan `extension` basepairs forward (resp. backward) from the overlap ends at a time, calculate the similarity, and extend the overlap until it reaches the end of the target and/or the end of the query and/or the overlap has been extended by `max_extension` residues.
- Either fix the sizes of the query/target end substrings to be the same and use the Jaccard similarity coefficient (best) or use the Jaccard containment (better).
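The notes above can be sketched on the CPU as follows. This is a minimal illustration under stated assumptions, not the cudamapper implementation: the `Overlap` struct is a simplified hypothetical stand-in, only the forward (tail) direction is shown, `rescue_overlap_tail` is a hypothetical name, and the `kmer_size` and `min_similarity` parameters are additions to the proposed signature. It uses equal-sized query/target windows with the Jaccard similarity of their k-mer sets, i.e. the first of the two options in the last note.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string_view>

// Hypothetical, simplified stand-in for the real Overlap struct.
// End positions are exclusive.
struct Overlap
{
    std::int32_t query_start;
    std::int32_t query_end;
    std::int32_t target_start;
    std::int32_t target_end;
};

// Jaccard similarity |A n B| / |A u B| of the k-mer sets of a and b.
// Returns 0 if neither string is long enough to contain a k-mer.
double kmer_jaccard(std::string_view a, std::string_view b, std::size_t k)
{
    auto kmers = [k](std::string_view s) {
        std::set<std::string_view> out;
        for (std::size_t i = 0; i + k <= s.size(); ++i)
            out.insert(s.substr(i, k));
        return out;
    };
    const auto ka = kmers(a);
    const auto kb = kmers(b);
    if (ka.empty() && kb.empty())
        return 0.0;
    std::size_t shared = 0;
    for (const auto& x : ka)
        shared += kb.count(x);
    return static_cast<double>(shared) / static_cast<double>(ka.size() + kb.size() - shared);
}

// Tries to extend the overlap tail by one window of at most `extension`
// bases. Query and target windows always have the same size. Returns the
// number of bases gained (0 if the window is empty or not similar enough).
// kmer_size and min_similarity are assumptions, not part of the proposal.
std::int32_t extend_overlap_by_similarity(Overlap& overlap,
                                          const std::string_view query_sequence,
                                          const std::string_view target_sequence,
                                          const std::int32_t extension,
                                          const std::size_t kmer_size,
                                          const double min_similarity)
{
    const std::int32_t query_left  = static_cast<std::int32_t>(query_sequence.size()) - overlap.query_end;
    const std::int32_t target_left = static_cast<std::int32_t>(target_sequence.size()) - overlap.target_end;
    const std::int32_t window      = std::min({extension, query_left, target_left});
    if (window <= 0)
        return 0;
    const auto q = query_sequence.substr(static_cast<std::size_t>(overlap.query_end), static_cast<std::size_t>(window));
    const auto t = target_sequence.substr(static_cast<std::size_t>(overlap.target_end), static_cast<std::size_t>(window));
    if (kmer_jaccard(q, t, kmer_size) < min_similarity)
        return 0;
    overlap.query_end += window;
    overlap.target_end += window;
    return window;
}

// Repeats the single-window step until it fails, a sequence end is reached,
// or the overlap has grown by max_extension bases.
void rescue_overlap_tail(Overlap& overlap,
                         const std::string_view query_sequence,
                         const std::string_view target_sequence,
                         const std::int32_t extension,
                         const std::int32_t max_extension,
                         const std::size_t kmer_size,
                         const double min_similarity)
{
    std::int32_t extended = 0;
    while (extended < max_extension)
    {
        const std::int32_t step   = std::min(extension, max_extension - extended);
        const std::int32_t gained = extend_overlap_by_similarity(overlap, query_sequence, target_sequence,
                                                                 step, kmer_size, min_similarity);
        if (gained == 0)
            break;
        extended += gained;
    }
}
```

Splitting the single-window step from the loop is what makes the helper testable on its own: a test can call it once and check the new end positions, while the loop only handles the stopping conditions. The backward (head) direction would mirror this by taking windows before `query_start` and `target_start`. A window shorter than `kmer_size` contains no k-mers and therefore scores 0, so very short leftovers at a sequence end are only rescued if `kmer_size` is small enough.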