NNI silently stops starting new trials after CPU utilization reaches 100%
Describe the issue: Once CPU utilization has reached 100%, even a single time, NNI finishes the trials that are already running but never starts the remaining ones. No error message is shown.
Environment:
- NNI version: 3.0
- Training service (local|remote|pai|aml|etc): local
- Client OS: Ubuntu 22.04 and 20.04
- Server OS (for remote mode only):
- Python version: 3.8
- PyTorch/TensorFlow version: 2.1.0
- Is conda/virtualenv/venv used?: conda
- Is running in Docker?: no
How to reproduce it?: Run a CPU-intensive task in several trials at the same time so that CPU utilization hits 100%. Once the running trials finish, the remaining trials are never started.
I think this is the same issue as #965.
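For anyone trying to reproduce this, here is a rough sketch of a CPU-bound trial script. It is not the script from the original run: `burn_cpu`, the `BURN_SECONDS` environment variable, and the 5-second default are all made up for illustration. The assumption is that running it with a `trialConcurrency` at least as large as the number of CPU cores should be enough to push utilization to 100%.

```python
import os
import time


def burn_cpu(seconds: float) -> int:
    """Busy-loop for roughly `seconds` seconds and return the iteration count."""
    deadline = time.perf_counter() + seconds
    iterations = 0
    while time.perf_counter() < deadline:
        iterations += 1  # pure-Python work keeps one core busy
    return iterations


def main() -> None:
    # BURN_SECONDS is a hypothetical knob; raise it so that trials overlap.
    seconds = float(os.environ.get("BURN_SECONDS", "5"))
    iterations = burn_cpu(seconds)
    try:
        import nni  # only importable where NNI is installed
        nni.report_final_result(iterations)
    except ImportError:
        # Fallback so the script can also be run by hand outside NNI.
        print(f"iterations: {iterations}")


if __name__ == "__main__":
    main()
```

Each trial only saturates one core, so the load comes from running many trials concurrently, which matches the "multiple trials simultaneously" condition described above.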