nni
Local GPU Allocation
Describe the issue:
Currently, users can only constrain GPU resources in local mode through useActiveGpu, maxTrialNumberPerGpu, trialGpuNumber, and gpuIndices. However, modern GPUs have very large memory, and many workstations have multiple GPUs. It would be beneficial to allow fractional GPU resource allocation, e.g. 10 tasks where the first 5 use only the first GPU and the other 5 use only the second GPU. This feature would be similar to Ray Tune's fractional resources.
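To make the request concrete, here is a minimal sketch of the placement being asked for. It is not NNI code: the function name assign_trials_to_gpus and its gpu_fraction parameter are hypothetical, chosen only to illustrate how trials that each claim a fraction of one GPU could be packed onto the available devices.

```python
from fractions import Fraction


def assign_trials_to_gpus(num_trials, gpu_indices, gpu_fraction):
    """Map trial ids to GPU indices, packing trials GPU by GPU.

    Hypothetical helper (not part of NNI): each trial is assumed to
    consume `gpu_fraction` of a single GPU.
    """
    # Go through str() so that 0.2 becomes exactly 1/5, not a float approximation.
    fraction = Fraction(str(gpu_fraction))
    if not 0 < fraction <= 1:
        raise ValueError("gpu_fraction must be in (0, 1]")

    # Number of trials that fit on one GPU (rounded down).
    trials_per_gpu = int(1 / fraction)
    capacity = trials_per_gpu * len(gpu_indices)
    if num_trials > capacity:
        raise ValueError(f"{num_trials} trials exceed capacity of {capacity}")

    # Fill the first GPU completely before moving on to the next one.
    return {trial: gpu_indices[trial // trials_per_gpu]
            for trial in range(num_trials)}


# The example from this issue: 10 trials, 2 GPUs, 1/5 of a GPU per trial.
placement = assign_trials_to_gpus(10, [0, 1], 0.2)
```

With these inputs, trials 0 to 4 land on GPU 0 and trials 5 to 9 on GPU 1. A real implementation would also have to expose the chosen device to each trial process, for example through CUDA_VISIBLE_DEVICES.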
Environment:
- NNI version: 2.10.1
- Training service : local
- Client OS: Ubuntu 20.04
- Server OS (for remote mode only):
- Python version: 3.11
- PyTorch version: 2.0.1
- Lightning Fabric: 2.0.5
- Is conda/virtualenv/venv used?: venv
- Is running in Docker?: no
Configuration:
trial_concurrency: 20
max_trial_number: 30
training_service:
  platform: local
  useActiveGpu: True
  gpuIndices: 0
Log message: Doesn't matter here.
- nnimanager.log:
- dispatcher.log:
- nnictl stdout and stderr:
How to reproduce it?: