t2v-transformers-models
Support PyTorch `set_per_process_memory_fraction`
Summary
PyTorch allows limiting the GPU memory available to a process. This is useful, for example, when a GPU is shared.
`set_per_process_memory_fraction(fraction, device=None)`: Set the memory fraction for a process. The fraction is used to limit the caching allocator to a portion of the memory on a CUDA device. The allowed value equals the total visible memory multiplied by the fraction. If a process tries to allocate more than the allowed value, the allocator raises an out-of-memory error.
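For illustration only (a minimal sketch, not tied to this repo's code), the call looks like this:

```python
import torch

if torch.cuda.is_available():
    # Cap this process at 50% of the visible memory on the current CUDA device.
    torch.cuda.set_per_process_memory_fraction(0.5)
```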
Proposal
This setting takes a fraction in [0, 1] and an optional device. Add an environment variable alongside ENABLE_CUDA of the format CUDA_MEMORY_FRACTION, where the value is 0.0-1.0 and is passed as fraction. Additionally, if set, check and prefer CUDA_MEMORY_FRACTION_... variable(s), where the value has the same format and the ... suffix is passed as device for each variable found (see the sketch below).
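A minimal sketch of the proposed behaviour, assuming ENABLE_CUDA is considered truthy when set to "1" or "true" and that the CUDA_MEMORY_FRACTION_... suffix is an integer device index (e.g. CUDA_MEMORY_FRACTION_0); the exact names and validation are open questions below:

```python
import os
import torch

def apply_cuda_memory_fraction() -> None:
    """Sketch: map the proposed env vars onto set_per_process_memory_fraction."""
    # Assumption: ENABLE_CUDA is truthy when set to "1" or "true".
    if os.getenv("ENABLE_CUDA", "").lower() not in ("1", "true"):
        return

    prefix = "CUDA_MEMORY_FRACTION"
    per_device = {
        name[len(prefix) + 1:]: value
        for name, value in os.environ.items()
        if name.startswith(prefix + "_")
    }

    if per_device:
        # Prefer per-device variables, e.g. CUDA_MEMORY_FRACTION_0=0.5.
        # Assumption: the suffix is an integer CUDA device index.
        for device_suffix, value in per_device.items():
            torch.cuda.set_per_process_memory_fraction(
                float(value), device=int(device_suffix)
            )
    elif prefix in os.environ:
        # Fall back to the single CUDA_MEMORY_FRACTION variable for the
        # current/default device.
        torch.cuda.set_per_process_memory_fraction(float(os.environ[prefix]))
```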
Questions
- [ ] Is there a better name than CUDA_MEMORY_FRACTION/CUDA_MEMORY_FRACTION_...?
- [ ] Do we need multiple device support initially? We don't seem to currently support device selection.
- [x] Is this supported on our current PyTorch (1.13)?
One use case is AWS vGPU support, so that multiple consumers of the vGPU device(s) don't assume exclusive rights to the full resource.