Mark Harris
This is still relevant, but now depends on https://github.com/NVIDIA/libcudacxx/pull/105
If you proceed ahead of `cuda::memory_resource`, this will just create more work for us to refactor.
I thought we were close before Christmas, but we keep running into more design issues.
Should we push this to 22.10? It is still a draft and we are entering code freeze.
@shwina can we bump this to next release?
Still waiting on the NVTX headers release.
@seunghwak have you prototyped this with raw CUDA streams to verify that you get the benefits you expect?
Can you share some specifics (speedups) here to help motivate this?
Two other points:
1. I believe `arena_memory_resource` uses separate read and write locks, which may enable more concurrency between host threads. We can try something similar in `pool_memory_resource`.
2. Just...
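A minimal sketch of the reader/writer-lock idea from point 1, using C++17 `std::shared_mutex`. The `free_list_registry` class is hypothetical and is not the actual locking scheme of `arena_memory_resource` or `pool_memory_resource`; it only shows how read-only lookups can share a lock while mutations take it exclusively.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <shared_mutex>

// Hypothetical registry of free blocks (block size -> number of free blocks),
// guarded by a reader/writer lock.
class free_list_registry {
 public:
  // Read path: many host threads may query concurrently under a shared lock.
  std::size_t count(std::size_t size) const {
    std::shared_lock lock{mutex_};
    auto it = blocks_.find(size);
    return it == blocks_.end() ? 0 : it->second;
  }

  // Write path: adding a block mutates state, so it takes an exclusive lock.
  void add(std::size_t size) {
    std::unique_lock lock{mutex_};
    ++blocks_[size];
  }

  // Write path: removing a block also needs the exclusive lock.
  bool take(std::size_t size) {
    std::unique_lock lock{mutex_};
    auto it = blocks_.find(size);
    if (it == blocks_.end()) { return false; }
    assert(it->second > 0);  // entries whose count reaches zero are erased below
    if (--it->second == 0) { blocks_.erase(it); }
    return true;
  }

 private:
  mutable std::shared_mutex mutex_;
  std::map<std::size_t, std::size_t> blocks_;
};
```

Any gain depends on how much of the hot path is read-only: an allocation that actually removes a block from a free list still needs the exclusive lock.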
I think the intended design is that everything is async and stream-ordered. A user is free to add a synchronous wrapper if needed, but in general in RAPIDS we need...
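A minimal sketch of the "synchronous wrapper" idea, assuming a stream-ordered allocation interface. Everything here is hypothetical and is not an RMM or CUDA API: `mock_stream` stands in for a CUDA stream by deferring enqueued work until `synchronize()`, `mock_async_resource` defers frees onto that stream, and `sync_wrapper` adapts it by synchronizing after every call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a CUDA stream: enqueued work runs in order,
// but only when synchronize() is called (modeling asynchronous execution).
class mock_stream {
 public:
  void enqueue(std::function<void()> work) { pending_.push_back(std::move(work)); }
  void synchronize() {
    for (auto& work : pending_) { work(); }
    pending_.clear();
  }
  std::size_t pending() const { return pending_.size(); }

 private:
  std::vector<std::function<void()>> pending_;
};

// Hypothetical stream-ordered resource: deallocation is deferred onto the stream.
class mock_async_resource {
 public:
  void* allocate_async(std::size_t bytes, mock_stream&) {
    ++live_;
    return std::malloc(bytes);
  }
  void deallocate_async(void* p, std::size_t, mock_stream& stream) {
    stream.enqueue([this, p] {
      std::free(p);
      --live_;
    });
  }
  int live() const { return live_; }

 private:
  int live_{0};
};

// Synchronous wrapper: each call blocks until all work on the stream has run.
template <typename Resource>
class sync_wrapper {
 public:
  sync_wrapper(Resource& upstream, mock_stream& stream) : upstream_{upstream}, stream_{stream} {}

  void* allocate(std::size_t bytes) {
    void* p = upstream_.allocate_async(bytes, stream_);
    stream_.synchronize();
    assert(stream_.pending() == 0);  // nothing left in flight on return
    return p;
  }

  void deallocate(void* p, std::size_t bytes) {
    upstream_.deallocate_async(p, bytes, stream_);
    stream_.synchronize();
  }

 private:
  Resource& upstream_;
  mock_stream& stream_;
};
```

The wrapper layers on top of the async interface rather than the other way around, so callers who need blocking semantics pay for the synchronization and everyone else keeps stream ordering.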