
Local GPU Allocation

Open busFred opened this issue 1 year ago • 0 comments

Describe the issue: Currently, users can only constrain GPU resources in local mode through useActiveGpu, maxTrialNumberPerGpu, trialGpuNumber, and gpuIndices. However, modern GPUs have very large memory, and many workstations have multiple GPUs. It would be beneficial to allow fractional GPU resource allocation, e.g. 10 tasks, with the first 5 using only the first GPU and the other 5 using only the second GPU. This feature would be similar to Ray Tune's fractional resources.
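To make the request concrete, here is a minimal Python sketch of the placement I have in mind. It is not NNI code: `assign_trials` and its `gpu_per_trial` parameter are hypothetical names used only for illustration, and the first-fit packing is just one possible policy.

```python
from fractions import Fraction


def assign_trials(num_trials, gpu_indices, gpu_per_trial):
    """Hypothetical helper (not part of NNI): first-fit packing of trials onto GPUs.

    Each trial claims `gpu_per_trial` of a single GPU, e.g. 0.2 means
    five trials may share one device.
    """
    # Exact fractions avoid floating-point drift when subtracting shares.
    share = Fraction(gpu_per_trial).limit_denominator(1000)
    free = {gpu: Fraction(1) for gpu in gpu_indices}
    placement = {}
    for trial in range(num_trials):
        # Pick the first GPU that still has enough free capacity.
        gpu = next((g for g in gpu_indices if free[g] >= share), None)
        if gpu is None:
            break  # no capacity left; remaining trials would have to wait
        free[gpu] -= share
        placement[trial] = gpu
    return placement


# 10 trials, 2 GPUs, 0.2 GPU per trial:
# trials 0-4 land on GPU 0 and trials 5-9 on GPU 1.
placement = assign_trials(10, [0, 1], 0.2)
print(placement)
```

With a per-trial fraction like this, the scheduler could derive how many trials fit on each GPU instead of the user tuning several separate options.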

Environment:

  • NNI version: 2.10.1
  • Training service : local
  • Client OS: Ubuntu 20.04
  • Server OS (for remote mode only):
  • Python version: 3.11
  • PyTorch version: 2.0.1
  • Lightning Fabric: 2.0.5
  • Is conda/virtualenv/venv used?: venv
  • Is running in Docker?: no

Configuration:

trial_concurrency: 20
max_trial_number: 30

training_service:
  platform: local
  useActiveGpu: True
  gpuIndices: 0

Log message: Doesn't matter here.

  • nnimanager.log:
  • dispatcher.log:
  • nnictl stdout and stderr:

How to reproduce it?:

busFred · Jul 19 '23 05:07