
Support PyTorch `set_per_process_memory_fraction`

Open · kcm opened this issue 3 years ago · 1 comment

Summary

PyTorch allows limiting the GPU memory a process may allocate. This is useful, for example, when a GPU is shared between multiple processes.

set_per_process_memory_fraction(fraction, device=None): Set the memory fraction for a process. The fraction is used to limit the caching allocator's memory allocation on a CUDA device. The allowed value equals the total visible memory multiplied by the fraction. If a process tries to allocate more than the allowed value, the allocator raises an out-of-memory error.
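A minimal sketch of the semantics described above: the allocator cap is simply the total visible device memory times the fraction. The `allowed_bytes` helper is mine for illustration; the actual cap is enforced inside PyTorch's caching allocator, applied via the call shown in the comment.

```python
def allowed_bytes(total_visible_memory: int, fraction: float) -> int:
    """Compute the allocator cap implied by a memory fraction.

    Mirrors the documented rule: allowed value = total visible
    memory * fraction. Raises on fractions outside [0, 1].
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    return int(total_visible_memory * fraction)


# On a machine with CUDA available, the cap would be applied like this:
# import torch
# torch.cuda.set_per_process_memory_fraction(0.5, device=0)

print(allowed_bytes(16 * 1024**3, 0.5))  # half of a 16 GiB device
```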

Proposal

The setting takes a fraction in [0, 1] and an optional device. Alongside ENABLE_CUDA, introduce an environment variable CUDA_MEMORY_FRACTION whose value (0.0–1.0) is passed as fraction. Additionally, if set, check and prefer per-device variables of the form CUDA_MEMORY_FRACTION_..., where the value has the same format and the ... suffix is passed as device for each variable found.
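The proposal above could be sketched roughly as follows. This is a hypothetical implementation of the proposed behavior, not existing t2v-transformers-models code; the variable names come from the proposal, and the parsing/precedence logic is my reading of it.

```python
import os


def cuda_memory_fractions(environ=os.environ):
    """Parse the proposed CUDA_MEMORY_FRACTION[_<device>] variables.

    Returns a dict mapping a device index (or None for the global
    setting) to a fraction. Per-device variables are collected in
    addition to the plain CUDA_MEMORY_FRACTION and, per the proposal,
    would be preferred for the devices they name.
    """
    fractions = {}
    base = environ.get("CUDA_MEMORY_FRACTION")
    if base is not None:
        fractions[None] = float(base)
    prefix = "CUDA_MEMORY_FRACTION_"
    for key, value in environ.items():
        if key.startswith(prefix) and key != "CUDA_MEMORY_FRACTION":
            fractions[int(key[len(prefix):])] = float(value)
    return fractions


# The caller would then apply each entry, e.g.:
# import torch
# for device, fraction in cuda_memory_fractions().items():
#     torch.cuda.set_per_process_memory_fraction(fraction, device=device)
```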

Questions

kcm avatar Dec 29 '22 15:12 kcm

One use case is for AWS vGPU support so that multiple consumers of the vGPU device(s) don't assume they have exclusive rights to the full resource usage.

kcm avatar Dec 29 '22 15:12 kcm