rmm
Fallback resource adaptor
New resource adaptor that uses an alternate upstream resource when the primary throws a specified exception type.
The motivation is to avoid out-of-memory (OOM) failures by falling back to managed memory when the primary device resource runs out of memory.
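A minimal, host-only sketch of the idea, under stated assumptions: `memory_resource`, `limited_resource`, `heap_resource`, and `fallback_adaptor` are hypothetical stand-ins written for illustration, not the rmm classes or the code in this PR. The toy primary throws `std::bad_alloc` when its byte budget is exhausted, and the adaptor records which pointers came from the alternate so each one is freed by the resource that allocated it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <new>
#include <unordered_set>

// Hypothetical minimal interface standing in for a device memory resource.
struct memory_resource {
  virtual ~memory_resource() = default;
  virtual void* allocate(std::size_t bytes) = 0;
  virtual void deallocate(void* ptr, std::size_t bytes) = 0;
};

// Toy primary resource with a fixed byte budget; throws std::bad_alloc when exhausted.
class limited_resource : public memory_resource {
 public:
  explicit limited_resource(std::size_t capacity) : remaining_(capacity) {}
  void* allocate(std::size_t bytes) override {
    if (bytes > remaining_) { throw std::bad_alloc{}; }
    void* ptr = std::malloc(bytes);
    if (ptr == nullptr) { throw std::bad_alloc{}; }
    remaining_ -= bytes;
    return ptr;
  }
  void deallocate(void* ptr, std::size_t bytes) override {
    std::free(ptr);
    remaining_ += bytes;
  }

 private:
  std::size_t remaining_;
};

// Toy alternate resource (plays the role of managed memory); counts live allocations.
class heap_resource : public memory_resource {
 public:
  void* allocate(std::size_t bytes) override {
    void* ptr = std::malloc(bytes);
    if (ptr == nullptr) { throw std::bad_alloc{}; }
    ++live_;
    return ptr;
  }
  void deallocate(void* ptr, std::size_t) override {
    assert(live_ > 0);
    std::free(ptr);
    --live_;
  }
  std::size_t live() const { return live_; }

 private:
  std::size_t live_{0};
};

// Tries the primary first; if it throws ExceptionT, allocates from the alternate
// and remembers the pointer so deallocate() can route it back correctly.
template <typename ExceptionT>
class fallback_adaptor : public memory_resource {
 public:
  fallback_adaptor(memory_resource& primary, memory_resource& alternate)
    : primary_(primary), alternate_(alternate) {}

  void* allocate(std::size_t bytes) override {
    try {
      return primary_.allocate(bytes);
    } catch (ExceptionT const&) {
      void* ptr = alternate_.allocate(bytes);
      std::lock_guard<std::mutex> lock(mutex_);
      alternate_allocations_.insert(ptr);
      return ptr;
    }
  }

  void deallocate(void* ptr, std::size_t bytes) override {
    bool from_alternate = false;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      from_alternate = alternate_allocations_.erase(ptr) > 0;
    }
    if (from_alternate) {
      alternate_.deallocate(ptr, bytes);
    } else {
      primary_.deallocate(ptr, bytes);
    }
  }

 private:
  memory_resource& primary_;
  memory_resource& alternate_;
  std::mutex mutex_;
  std::unordered_set<void*> alternate_allocations_;
};
```

With a 64-byte `limited_resource` as the primary, a second 48-byte request through `fallback_adaptor<std::bad_alloc>` would be served by the alternate instead of failing, and freeing it would go back to the alternate rather than the primary.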
Checklist
- [X] I am familiar with the Contributing Guidelines.
- [X] Python tests
- [x] C++ tests
- [x] The documentation is up to date with these changes.
Devil's advocating: couldn't we implement this using failure_callback_resource_adaptor? The callback just needs to know about the alternate MR.
Not as-is. The callback function in failure_callback_resource_adaptor returns a boolean, so there is no direct way for the callback to hand the alternate allocation back to failure_callback_resource_adaptor. Another issue is how to handle deallocation: an allocation made by the alternate resource cannot be deallocated by the primary resource.
It is possible to write a new resource that can handle both failure_callback_resource_adaptor and failure_alternate_resource_adaptor, but I think this is a case where two simple resources are better than a complex one.
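To illustrate the first point, here is a simplified stand-in for a failure-callback retry loop. The names `failure_callback` and `allocate_with_retry` are hypothetical, and this is not rmm's actual implementation. The callback can only answer "retry" or "give up", so a pointer obtained from another resource inside the callback has no path back to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <new>

// Hypothetical callback type: receives the requested size and returns
// true to retry the upstream allocation, false to give up.
using failure_callback = std::function<bool(std::size_t)>;

// Calls upstream_allocate until it succeeds or the callback declines to retry.
// The only value that ever reaches the caller is the upstream's own pointer.
template <typename AllocFn>
void* allocate_with_retry(AllocFn&& upstream_allocate,
                          failure_callback const& on_failure,
                          std::size_t bytes)
{
  assert(on_failure && "a callback must be provided");
  while (true) {
    try {
      return upstream_allocate(bytes);
    } catch (std::bad_alloc const&) {
      // The boolean result carries no allocation; rethrow if told to stop.
      if (!on_failure(bytes)) { throw; }
    }
  }
}
```

A callback in this shape can free up memory and ask for a retry, but serving the request from an alternate resource would require changing both the callback signature and the deallocation path.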
Mostly doc comments. However, this also needs C++ tests.
Added C++ tests
Let's put this on hold until we get some more use cases.