nni Node server crash

Also, when I use NNI to connect to another machine (it can connect to itself with the "remote" platform), the same problem occurs: "FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory"

Environment:

NNI version: master
Training service (local|remote|pai|aml|etc): local, and reusemode=False
Client OS: windows 10.0.19042.1466
Server OS (for remote mode only): windows 10.0.19042.1466
Python version: 3.8
PyTorch/TensorFlow version:
Is conda/virtualenv/venv used?:
Is running in Docker?:

Configuration:

Experiment config (remember to remove secrets!):
Search space:

Log message:

nnimanager.log:
dispatcher.log:
nnictl stdout and stderr:

How to reproduce it?:

Jun 29 '23 07:06 XiaoXiao-Woo

I'm also getting this error consistently when the number of trials in an experiment gets to about 40,000. It has happened on more than 5 different experiments.

NNI version: master
Training service (local|remote|pai|aml|etc): local, and reusemode=False
Client OS: ubuntu 22.04.2 LTS
Server OS (for remote mode only): n/a
Python version: 3.7.4
PyTorch/TensorFlow version: n/a
Is conda/virtualenv/venv used?: conda
Is running in Docker?: no

Aug 06 '23 23:08 studywolf

You can set export NODE_OPTIONS="--max_old_space_size=8192" for a quick fix.
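If exporting the variable in the shell is inconvenient (for example on Windows, where `export` is not available in cmd), the same setting can be applied from Python before launching the command. This is only a rough sketch: `node_env` and `run_with_bigger_heap` are hypothetical helper names, not part of NNI, and it assumes the Node process started by `nnictl` inherits the environment of the process that launched it.

```python
import os
import subprocess


def node_env(heap_mb=8192, base=None):
    """Return a copy of the environment with Node's old-space heap limit set.

    Hypothetical helper, not part of NNI.
    """
    env = dict(os.environ if base is None else base)
    flag = f"--max_old_space_size={heap_mb}"
    existing = env.get("NODE_OPTIONS", "")
    # Keep other Node options, but drop any earlier heap-limit flag
    # (Node accepts both the underscore and the dash spelling).
    kept = [
        opt for opt in existing.split()
        if not opt.replace("_", "-").startswith("--max-old-space-size")
    ]
    env["NODE_OPTIONS"] = " ".join(kept + [flag])
    return env


def run_with_bigger_heap(cmd, heap_mb=8192):
    """Run a command with the raised heap limit and return its exit code."""
    return subprocess.run(cmd, env=node_env(heap_mb), check=False).returncode


# Example (replace the placeholder with your own experiment id):
# run_with_bigger_heap(["nnictl", "resume", "<experiment_id>"])
```

The value is in megabytes, so 8192 asks for roughly 8 GB of old-space heap; it only raises the ceiling and does not change how much memory the server accumulates per trial.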

Aug 08 '23 07:08 liuzhe-lz

ah, thanks for the suggestion! It spools up the run when I resume but immediately fails on me silently

Aug 11 '23 00:08 studywolf

any progress on this?

Sep 06 '23 00:09 studywolf

checking in again

Oct 06 '23 17:10 studywolf

> You can set export NODE_OPTIONS="--max_old_space_size=8192" for a quick fix.

Unfortunately, I'm still getting the heap out of memory error after 29k trials. This is in NNI 3.0.

Oct 07 '23 17:10 studywolf
