
shared usage of GPUs

Open DiTo97 opened this issue 2 years ago • 0 comments

Hi @ExpectationMax,

How difficult would it be to allow shared usage of GPUs given a memory constraint known in advance? This would be similar to how many job-scheduling systems allocate the correct number of workers.

The utility is nice as-is, but I think such a feature would be very useful for larger GPUs (over 16 GB of memory).

For instance, we could add a memory argument to each command's options and keep track of per-GPU memory usage, instead of treating each GPU as an exclusive resource (in use or free). If no memory is given, we could assume either a default memory allocation request or exclusive control of a full GPU device, regardless of its capacity. Of course, we would have to retrieve the capacity of each available GPU device and make sure that any given process does not exceed its requested memory allocation (see the sketch below).
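To make the idea concrete, here is a rough sketch of what per-GPU memory bookkeeping could look like in place of the current exclusive in-use/free flag. This is purely illustrative and not existing simple_gpu_scheduler code: the `MemoryAwarePool`, `acquire`, and `release` names are made up, and capacities are read via pynvml.

```python
# Hypothetical sketch of memory-aware GPU bookkeeping (not actual
# simple_gpu_scheduler code); class and method names are made up.
import pynvml


class MemoryAwarePool:
    def __init__(self, gpu_ids):
        pynvml.nvmlInit()
        self.capacity = {}   # total memory per GPU, in bytes
        self.reserved = {}   # memory already promised to running jobs
        for gpu_id in gpu_ids:
            handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_id)
            self.capacity[gpu_id] = pynvml.nvmlDeviceGetMemoryInfo(handle).total
            self.reserved[gpu_id] = 0

    def acquire(self, requested_bytes=None):
        """Reserve `requested_bytes` on some GPU; None means a whole device."""
        for gpu_id, total in self.capacity.items():
            request = total if requested_bytes is None else requested_bytes
            if self.reserved[gpu_id] + request <= total:
                self.reserved[gpu_id] += request
                return gpu_id, request
        return None  # nothing fits right now; the caller would retry later

    def release(self, gpu_id, reserved_bytes):
        self.reserved[gpu_id] -= reserved_bytes
```

A job line carrying a memory option would call `acquire` before launching and `release` when the process exits; a job without one would fall back to reserving the whole device, matching the current behaviour.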

For the latter, the main deep learning frameworks have a way to cap how much memory a single process may allocate (examples below), but I do not know how we could enforce it at the device level regardless of the framework.
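For reference, these are the kinds of framework-level knobs I had in mind; the exact APIs and the versions they appear in should be double-checked, and they only help if the job itself opts in:

```python
# PyTorch: limit this process to a fraction of the device's total memory
import torch
torch.cuda.set_per_process_memory_fraction(0.5, device=0)

# TensorFlow: expose GPU 0 as a logical device with a hard memory limit (in MB)
import tensorflow as tf
gpus = tf.config.list_physical_devices("GPU")
tf.config.set_logical_device_configuration(
    gpus[0], [tf.config.LogicalDeviceConfiguration(memory_limit=8192)]
)
```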

Alternatively, do you know of any other utilities that already integrate this feature and that I could use?

DiTo97 · Jul 08 '23 11:07