Dispatcher stream error, tuner may have crashed (Error on first time)
Environment: Ubuntu 20.04
- NNI version: 2.1
- NNI mode (local|remote|pai):
- Client OS: Debian
- Server OS (for remote mode only): N/A
- Python version: 3.7
- PyTorch/TensorFlow version: 1.8.0 & 2.4.1
- Is conda/virtualenv/venv used?: N/A
- Is running in Docker?: N/A
Log message:
- nnimanager.log:
[2021-03-25 12:07:53] INFO [ 'Datastore initialization done' ]
[2021-03-25 12:07:53] INFO [ 'RestServer start' ]
[2021-03-25 12:07:53] INFO [ 'Construct local machine training service.' ]
[2021-03-25 12:07:53] INFO [ 'RestServer base port is 8080' ]
[2021-03-25 12:07:53] INFO [ 'Rest server listening on: http://0.0.0.0:8080' ]
[2021-03-25 12:07:53] INFO [ 'NNIManager setClusterMetadata, key: trial_config, value: {"command":"python3 mnist-keras.py","codeDir":"/home/pi/Downloads/nni_sample/nni/examples/trials/mnist-keras/.","gpuNum":0}' ]
[2021-03-25 12:07:53] INFO [ 'required GPU number is 0' ]
[2021-03-25 12:07:53] INFO [ 'Starting experiment: 236L9qwk' ]
[2021-03-25 12:07:53] INFO [ 'Change NNIManager status from: INITIALIZED to: RUNNING' ]
[2021-03-25 12:07:53] INFO [ 'Add event listeners' ]
[2021-03-25 12:07:53] ERROR [ 'Dispatcher error: This socket has been ended by the other party' ]
[2021-03-25 12:07:53] ERROR [ 'Error: Dispatcher stream error, tuner may have crashed.\n at EventEmitter.dispatcher.onError (/usr/local/lib/python3.7/dist-packages/nni_node/core/nnimanager.js:550:32)\n at EventEmitter.emit (events.js:198:13)\n at Socket.IpcInterface.outgoingStream.on (/usr/local/lib/python3.7/dist-packages/nni_node/core/ipcInterface.js:42:72)\n at Socket.emit (events.js:198:13)\n at Socket.writeAfterFIN [as write] (net.js:399:8)\n at IpcInterface.sendCommand (/usr/local/lib/python3.7/dist-packages/nni_node/core/ipcInterface.js:49:38)\n at NNIManager.sendInitTunerCommands (/usr/local/lib/python3.7/dist-packages/nni_node/core/nnimanager.js:558:25)\n at NNIManager.run (/usr/local/lib/python3.7/dist-packages/nni_node/core/nnimanager.js:523:14)\n at NNIManager.startExperiment (/usr/local/lib/python3.7/dist-packages/nni_node/core/nnimanager.js:135:14)' ]
[2021-03-25 12:07:53] INFO [ 'Change NNIManager status from: RUNNING to: ERROR' ]
[2021-03-25 12:07:53] WARNING [ 'Commands jammed in buffer!' ]
[2021-03-25 12:07:53] INFO [ 'Run local machine training service.' ]
- dispatcher.log:
-
Neural Network Intelligence - nnictl stdout and stderr:
cd /home/User/Downloads/nni_sample/nni/examples/trials/mnist-keras
nnictl create --config config.yml
Problem: How can I fix this issue? I'm still learning the basics of NNI. Thank you.
Do you mean dispatcher.log contains <!doctype html><title>... stuff?
Could you upload dispatcher.log as attachment?
Hello @koklimabc, could you upgrade NNI to the latest version and try again? If the issue persists, please upload dispatcher.log as an attachment. Thank you!
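For anyone unsure where to find dispatcher.log before attaching it, here is a minimal sketch. It assumes the default experiment directory layout (`~/nni-experiments/<experiment_id>/log/dispatcher.log`); if the experiment was created with a custom working directory, the path will differ. The helper names `find_dispatcher_log` and `tail` are hypothetical and not part of NNI.

```python
from pathlib import Path
from typing import List, Optional


def find_dispatcher_log(experiment_id: str, base_dir: Optional[Path] = None) -> Optional[Path]:
    """Return the dispatcher.log path for an experiment, or None if it is missing."""
    # Assumption: default layout is <home>/nni-experiments/<id>/log/dispatcher.log.
    root = base_dir if base_dir is not None else Path.home() / "nni-experiments"
    candidate = root / experiment_id / "log" / "dispatcher.log"
    return candidate if candidate.is_file() else None


def tail(path: Path, n: int = 20) -> List[str]:
    """Return the last n lines of a text file."""
    lines = path.read_text(errors="replace").splitlines()
    return lines[-n:]


if __name__ == "__main__":
    # Experiment ID taken from the "Starting experiment" line in nnimanager.log.
    log = find_dispatcher_log("236L9qwk")
    if log is None:
        print("dispatcher.log not found; check the experiment directory")
    else:
        print("\n".join(tail(log)))
```

If the file exists but is empty, that is worth mentioning in the report, since it may mean the tuner process exited before it could log anything.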
I'm using the latest version of NNI, but I encountered a similar error.
[2021-11-29 09:43:54] INFO (NNIDataStore) Datastore initialization done
[2021-11-29 09:43:54] INFO (RestServer) RestServer start
[2021-11-29 09:43:54] WARNING (NNITensorboardManager) Tensorboard may not installed, if you want to use tensorboard, please check if tensorboard installed.
[2021-11-29 09:43:54] INFO (RestServer) RestServer base port is 8080
[2021-11-29 09:43:54] INFO (main) Rest server listening on: http://0.0.0.0:8080
[2021-11-29 09:43:55] INFO (NNIManager) Starting experiment: 3mZT9tIn
[2021-11-29 09:43:55] INFO (NNIManager) Setup training service...
[2021-11-29 09:43:55] INFO (LocalTrainingService) Construct local machine training service.
[2021-11-29 09:43:55] INFO (NNIManager) Setup tuner...
[2021-11-29 09:43:55] INFO (NNIManager) Change NNIManager status from: INITIALIZED to: RUNNING
[2021-11-29 09:43:55] INFO (NNIManager) Add event listeners
[2021-11-29 09:43:55] INFO (LocalTrainingService) Run local machine training service.
[2021-11-29 09:43:55] ERROR (NNIManager) Dispatcher error: read ECONNRESET
[2021-11-29 09:43:55] ERROR (NNIManager) Error: Dispatcher stream error, tuner may have crashed.
    at EventEmitter.<anonymous> (/home/biopharm/llf/anaconda3/envs/pytorch/lib/python3.8/site-packages/nni_node/core/nnimanager.js:650:32)
    at EventEmitter.emit (node:events:394:28)
    at Socket.<anonymous> (/home/biopharm/llf/anaconda3/envs/pytorch/lib/python3.8/site-packages/nni_node/core/ipcInterface.js:70:72)
    at Socket.emit (node:events:394:28)
    at emitErrorNT (node:internal/streams/destroy:193:8)
    at emitErrorCloseNT (node:internal/streams/destroy:158:3)
    at processTicksAndRejections (node:internal/process/task_queues:83:21)
[2021-11-29 09:43:55] INFO (NNIManager) Change NNIManager status from: RUNNING to: ERROR
Here is the dispatcher.log file. dispatcher.log
@kvartet @liuzhe-lz
@Chenwf1025 @koklimabc - are you still facing the issue with the latest release of nni?