rmm Fallback resource adaptor

Open madsbk opened this issue 1 year ago • 4 comments

New resource adaptor that uses an alternate upstream resource when the primary throws a specified exception type.

The motivation here is to provide NO-OOM by using managed memory when the primary device resource runs out of memory.
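To make the idea concrete, here is a minimal standalone sketch of such an adaptor. It does not use RMM itself: `memory_resource`, `limited_resource`, and `fallback_adaptor` are hypothetical stand-ins invented for illustration, and host memory plays the role of both the device and the managed resource. The one design point it tries to show is that the adaptor has to remember which pointers came from the alternate resource so that each one is returned to the resource that produced it.

```cpp
#include <cstddef>
#include <mutex>
#include <new>
#include <unordered_set>

// Hypothetical minimal interface; a stand-in for a device memory resource.
struct memory_resource {
  virtual ~memory_resource() = default;
  virtual void* allocate(std::size_t bytes) = 0;
  virtual void deallocate(void* ptr, std::size_t bytes) = 0;
};

// Toy host-backed resource with a fixed byte budget.
// Throws std::bad_alloc once the budget would be exceeded.
class limited_resource final : public memory_resource {
 public:
  explicit limited_resource(std::size_t capacity) : capacity_{capacity} {}

  void* allocate(std::size_t bytes) override {
    if (bytes > capacity_ - used_) { throw std::bad_alloc{}; }
    void* ptr = ::operator new(bytes);
    used_ += bytes;
    return ptr;
  }

  void deallocate(void* ptr, std::size_t bytes) override {
    ::operator delete(ptr);
    used_ -= bytes;
  }

  std::size_t used() const { return used_; }

 private:
  std::size_t capacity_;
  std::size_t used_{0};
};

// Tries the primary resource first; if it throws ExceptionType,
// allocates from the alternate resource instead.
template <typename ExceptionType>
class fallback_adaptor final : public memory_resource {
 public:
  fallback_adaptor(memory_resource& primary, memory_resource& alternate)
    : primary_{primary}, alternate_{alternate} {}

  void* allocate(std::size_t bytes) override {
    try {
      return primary_.allocate(bytes);
    } catch (ExceptionType const&) {
      void* ptr = alternate_.allocate(bytes);
      // Record the pointer so deallocate() can route it correctly.
      std::lock_guard<std::mutex> lock(mtx_);
      alternate_ptrs_.insert(ptr);
      return ptr;
    }
  }

  void deallocate(void* ptr, std::size_t bytes) override {
    bool from_alternate = false;
    {
      std::lock_guard<std::mutex> lock(mtx_);
      from_alternate = alternate_ptrs_.erase(ptr) > 0;
    }
    (from_alternate ? alternate_ : primary_).deallocate(ptr, bytes);
  }

 private:
  memory_resource& primary_;
  memory_resource& alternate_;
  std::mutex mtx_;
  std::unordered_set<void*> alternate_ptrs_;
};
```

With a 100-byte primary and a 1000-byte alternate, two 80-byte requests through `fallback_adaptor<std::bad_alloc>` would land one in each resource, and deallocating both would return each block to the resource it came from.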

Checklist

[x] I am familiar with the Contributing Guidelines.
[x] Python tests
[x] C++ tests
[x] The documentation is up to date with these changes.

Sep 02 '24 13:09 madsbk

Devil's advocating: couldn't we implement this using failure_callback_resource_adaptor? The callback just needs to know about the alternate MR.

Sep 03 '24 00:09 harrism

> Devil's advocating: couldn't we implement this using failure_callback_resource_adaptor? The callback just needs to know about the alternate MR.

Not as-is. The callback function in failure_callback_resource_adaptor returns a boolean, so there is no direct way for the callback to hand the alternate allocation back to failure_callback_resource_adaptor. Another issue is deallocation: memory allocated by the alternate resource cannot be deallocated by the primary resource.

It is possible to write a new resource that can handle both failure_callback_resource_adaptor and failure_alternate_resource_adaptor, but I think this is a case where two simple resources are better than a complex one.
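A rough sketch of the retry pattern may help show the limitation. The code below is standalone and is not RMM's implementation: `allocate_fn` and `allocate_with_callback` are hypothetical names, and the callback shape (a size, an opaque argument, and a boolean result) is assumed to mirror the one used by failure_callback_resource_adaptor. The boolean can only say "retry" or "give up", so the returned pointer always comes from the same upstream.

```cpp
#include <cstddef>
#include <functional>
#include <new>

// Assumed callback shape: return true to retry the allocation,
// false to let the original exception propagate.
using failure_callback_t = std::function<bool(std::size_t, void*)>;

// Hypothetical stand-in for an upstream resource's allocate().
using allocate_fn = std::function<void*(std::size_t)>;

// Retry loop: on failure, ask the callback whether to try again.
// The callback has no channel for returning a pointer of its own.
inline void* allocate_with_callback(allocate_fn const& upstream,
                                    failure_callback_t const& callback,
                                    void* callback_arg,
                                    std::size_t bytes)
{
  while (true) {
    try {
      return upstream(bytes);
    } catch (std::bad_alloc const&) {
      if (!callback(bytes, callback_arg)) { throw; }
    }
  }
}
```

A callback could free memory and ask for a retry, but an allocation it made from another resource would have nowhere to go, and nothing would record which resource must later deallocate it.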

Sep 03 '24 06:09 madsbk

Mostly doc comments. However, this also needs C++ tests.

Added C++ tests

Sep 03 '24 11:09 madsbk

Let's put this on hold until we get some more use cases.

Sep 11 '24 13:09 madsbk
