nni Local GPU Allocation

Open busFred opened this issue 1 year ago • 0 comments

Describe the issue: Currently, in local mode users can constrain GPU resources only through useActiveGpu, maxTrialNumberPerGpu, trialGpuNumber, and gpuIndices. However, modern GPUs have very large memory, and many workstations have multiple GPUs. It would be beneficial to allow fractional GPU resource allocation, e.g. 10 tasks with the first 5 using only the first GPU and the other 5 using only the second GPU. This feature would be similar to Ray Tune's fractional resources.
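To make the requested behavior concrete, here is a minimal sketch of the allocation semantics, assuming each trial declares the fraction of one GPU it needs. The function `assign_trials` and its `gpu_fraction` parameter are hypothetical names used only for illustration; they are not part of NNI's API.

```python
from typing import List, Optional


def assign_trials(
    num_trials: int, gpu_fraction: float, gpu_indices: List[int]
) -> List[Optional[int]]:
    """Pack trials onto GPUs in order, filling one GPU before moving to the next.

    Hypothetical helper, not an NNI function. Returns the GPU index assigned
    to each trial, or None if no GPU has capacity left (the trial would wait).
    """
    # Number of trials that fit on one GPU; the epsilon guards against
    # floating-point rounding when inverting the fraction.
    slots_per_gpu = int(1.0 / gpu_fraction + 1e-9)
    assignment: List[Optional[int]] = []
    for trial in range(num_trials):
        gpu_pos = trial // slots_per_gpu
        if gpu_pos < len(gpu_indices):
            assignment.append(gpu_indices[gpu_pos])
        else:
            assignment.append(None)
    return assignment


# 10 trials, each requesting 0.2 of a GPU, on GPUs 0 and 1.
print(assign_trials(10, 0.2, [0, 1]))
# → [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

With a fraction of 0.2, five trials share each GPU, which matches the "5 on the first GPU, 5 on the second" example above.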

Environment:

NNI version: 2.10.1
Training service: local
Client OS: Ubuntu 20.04
Server OS (for remote mode only):
Python version: 3.11
PyTorch version: 2.0.1
Lightning Fabric: 2.0.5
Is conda/virtualenv/venv used?: venv
Is running in Docker?: no

Configuration:

trial_concurrency: 20
max_trial_number: 30

training_service:
  platform: local
  useActiveGpu: True
  gpuIndices: 0

Log message: Doesn't matter here.

nnimanager.log:
dispatcher.log:
nnictl stdout and stderr:

How to reproduce it?:

Jul 19 '23 05:07 busFred
