cuCollections
Implement OA `retrieve(_outer)` and its `multiset` API
WIP
Closes #465 Closes #489
The outer test is still failing, and the new implementation is still about 1.5x slower than the previous one. Apart from that, the other unit tests look good. The natural next steps are to fix the bug in `retrieve_outer` (shouldn't be a big deal) and then dive into optimizations. For the latter I could use a second pair of eyes, since this kernel is notoriously complex.
Commenting out the kernel piece by piece to find the largest bottleneck is probably the most efficient approach.
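To help pin down the failing outer test, one option is to compare the kernel's output against a simple host-side reference. The sketch below is a hypothetical helper (`reference_retrieve_outer` is not part of the cuco API). It assumes the same semantics as the legacy `static_multimap::retrieve_outer`: every stored pair whose key matches a probe key is emitted, and a probe key with no match yields exactly one `(key, empty_value_sentinel)` pair. It also assumes a multimap-style key/value layout and `int32_t` types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical types for illustration; the real containers are templated.
using Key   = std::int32_t;
using Value = std::int32_t;
using Pair  = std::pair<Key, Value>;

// Host-side reference for outer retrieval (assumed semantics, see above).
std::vector<Pair> reference_retrieve_outer(std::vector<Pair> const& stored,
                                           std::vector<Key> const& probes,
                                           Value empty_value_sentinel)
{
  std::unordered_multimap<Key, Value> table(stored.begin(), stored.end());
  std::vector<Pair> out;
  for (Key k : probes) {
    auto [first, last] = table.equal_range(k);
    if (first == last) {
      // Outer semantics: an unmatched probe still produces one output row.
      out.emplace_back(k, empty_value_sentinel);
    } else {
      // Every matching stored pair is emitted (multiplicity preserved).
      for (auto it = first; it != last; ++it) { out.emplace_back(k, it->second); }
    }
  }
  // Outer retrieval yields at least one row per probe key.
  assert(out.size() >= probes.size());
  // GPU output order is not deterministic, so sort before comparing.
  std::sort(out.begin(), out.end());
  return out;
}
```

The device results would need the same sort before an element-wise comparison. A mismatch in total size alone already tells whether the bug is in the match counting or in writing the sentinel rows.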