Node server crash
In addition, when I use NNI to connect to another machine (it can connect to itself with the "remote" platform), the same problem occurs:
"FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory"
Environment:
- NNI version: master
- Training service (local|remote|pai|aml|etc): local, and reusemode=False
- Client OS: windows 10.0.19042.1466
- Server OS (for remote mode only): windows 10.0.19042.1466
- Python version: 3.8
- PyTorch/TensorFlow version:
- Is conda/virtualenv/venv used?:
- Is running in Docker?:
Configuration:
- Experiment config (remember to remove secrets!):
- Search space:
Log message:
- nnimanager.log:
- dispatcher.log:
- nnictl stdout and stderr:
How to reproduce it?:
I'm also getting this error consistently once the number of trials in an experiment reaches about 40,000. It has happened on more than 5 different experiments.
- NNI version: master
- Training service (local|remote|pai|aml|etc): local, and reusemode=False
- Client OS: ubuntu 22.04.2 LTS
- Server OS (for remote mode only): n/a
- Python version: 3.7.4
- PyTorch/TensorFlow version: n/a
- Is conda/virtualenv/venv used?: conda
- Is running in Docker?: no
You can set
export NODE_OPTIONS="--max_old_space_size=8192"
for a quick fix.
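A minimal sketch of that quick fix, assuming a bash-like shell on Linux and that nnictl is started from the same shell, so the Node.js process behind the NNI manager inherits the variable. The `node -e` line is an added sanity check (not part of the original suggestion): it prints the heap limit, in MiB, that a fresh Node.js process actually receives. With an 8192 MiB old-space limit it should print a value at or slightly above 8192, because V8's reported limit also counts the young generation.

```shell
#!/usr/bin/env bash
# Raise the V8 old-generation heap ceiling to 8 GiB for every Node.js
# process started from this shell afterwards (assumption: this includes
# the NNI manager process that nnictl launches).
export NODE_OPTIONS="--max-old-space-size=8192"

# Sanity check: print the heap limit (in MiB) a new Node.js process gets.
node -e 'const v8 = require("v8"); console.log(Math.round(v8.getHeapStatistics().heap_size_limit / 1048576))'
```

Node.js treats `--max-old-space-size` and `--max_old_space_size` as the same option. On Windows (as in the original report), the equivalent is `set NODE_OPTIONS=--max-old-space-size=8192` in cmd or `$env:NODE_OPTIONS="--max-old-space-size=8192"` in PowerShell. This only raises the ceiling: if the manager's memory keeps growing with the number of trials, the crash is postponed rather than avoided.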
Ah, thanks for the suggestion! The run starts up when I resume the experiment, but then it immediately fails silently.
any progress on this?
checking in again
Unfortunately, I'm still getting the heap out-of-memory error after 29k trials. This is in NNI 3.0.