dora
The daemon is dead and cannot be restarted automatically
Describe the bug
Initially, dora-daemon and dora-coordinator run normally. When a dataflow is started and dora-coordinator responds slowly, the daemon exits on its own and is not restarted.
/home/jarvis/coding/dora_home/dora/target/debug/dora-daemon
Error: lost connection to coordinator
Location:
/home/jarvis/coding/dora_home/dora/binaries/daemon/src/lib.rs:247:29
Process finished with exit code 1
To Reproduce
Steps to reproduce the behavior:
- Start the dora daemon:
dora up
- Start a new dataflow:
RUST_LOG=info cargo run start /home/jarvis/coding/rust_home/github.com/meua/dora-drives/graphs/tutorials/webcam_midas_frame.yaml --attach --hot-reload
- Let dora-coordinator sleep for a while.
Expected behavior
After dora up is executed, dora-coordinator and dora-daemon should keep running as system daemon processes and should not exit abnormally.
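To make the expectation concrete, here is a minimal sketch of a retry loop with exponential backoff that a daemon could run when the coordinator is slow or unreachable, instead of exiting on the first failure. It is not dora's actual code: `Backoff`, `connect_with_retry`, and the `connect` closure are hypothetical names used only for illustration.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical retry policy; not part of dora's actual API.
struct Backoff {
    initial: Duration,
    max: Duration,
    max_attempts: u32,
}

/// Calls `connect` until it succeeds or `max_attempts` is reached,
/// doubling the delay after each failure (capped at `max`).
fn connect_with_retry<T, E>(
    policy: &Backoff,
    mut connect: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut delay = policy.initial;
    let mut attempt = 1;
    loop {
        match connect() {
            Ok(conn) => return Ok(conn),
            // Give up and surface the last error once the budget is spent.
            Err(err) if attempt >= policy.max_attempts => return Err(err),
            Err(_) => {
                sleep(delay);
                delay = (delay * 2).min(policy.max);
                attempt += 1;
            }
        }
    }
}
```

With a policy such as 100 ms initial delay, 5 s cap, and 10 attempts, a temporarily slow coordinator would cause a delay rather than a daemon exit. Whether the real daemon should retry forever or give up after a bounded number of attempts is a design choice this sketch leaves open.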
Environments (please complete the following information):
- System info: ubuntu 22.04
- Dora version: 0.2.3-rc6
Additional context
(dora3.7) jarvis@jia:~/coding/dora_home/dora/binaries/cli/src$ RUST_LOG=info cargo run start /home/jarvis/coding/rust_home/github.com/meua/dora-drives/graphs/tutorials/webcam_midas_frame.yaml --attach --hot-reload
Compiling rustix v0.36.13
Compiling dora-operator-api-c v0.2.3-rc6 (/home/jarvis/coding/dora_home/dora/apis/c/operator)
Compiling is-terminal v0.4.4
Compiling clap v4.1.11
Compiling bat v0.23.0
Compiling dora-cli v0.2.3-rc6 (/home/jarvis/coding/dora_home/dora/binaries/cli)
Finished dev [unoptimized + debuginfo] target(s) in 7.92s
Running `/home/jarvis/coding/dora_home/dora/target/debug/dora-cli start /home/jarvis/coding/rust_home/github.com/meua/dora-drives/graphs/tutorials/webcam_midas_frame.yaml --attach --hot-reload`
failed to spawn dataflow on machine ``
(dora3.7) jarvis@jia:~/coding/dora_home/dora/binaries/cli/src$
What do you mean by "dora-coordinator sleep for a while"?
No, the dora-daemon dies because of an exception in an operator or custom node, and it then loses its connection to the coordinator. However, the dora-daemon should remain stable at all times to be worthy of its name.
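As an illustration of the stability being asked for, the following sketch shows one way a supervising process can turn a failing or panicking worker into an ordinary error value instead of going down with it. It is an analogy only: it uses a thread per worker, and `run_node_isolated` is a hypothetical helper, not how dora actually runs operators or custom nodes.

```rust
use std::thread;

/// Runs a hypothetical node body on its own thread and reports failure
/// as a value, so a failing or panicking node does not take the caller down.
fn run_node_isolated<F>(name: &str, node: F) -> Result<(), String>
where
    F: FnOnce() -> Result<(), String> + Send + 'static,
{
    match thread::spawn(node).join() {
        // The node ran to completion without error.
        Ok(Ok(())) => Ok(()),
        // The node returned an error; wrap it with the node name.
        Ok(Err(err)) => Err(format!("node `{name}` failed: {err}")),
        // The node panicked; `join` reports this instead of propagating it.
        Err(_) => Err(format!("node `{name}` panicked")),
    }
}
```

The caller can log the returned error and keep serving other dataflows, which matches the expectation that a node failure should not end the daemon.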
Which exception of the operator and custom node causes the daemon to die?
@meua ?
There is a situation where, if you leave it alone for a long time, you eventually get the following:
(dora3.7) jarvis@jia:~/coding/pyhome/github.com/dora-rs/dora-drives$ Error: lost connection to coordinator
Location:
/home/runner/work/dora/dora/binaries/daemon/src/lib.rs:247:29
Open a new terminal:
(dora3.7) jarvis@jia:~/coding/pyhome/github.com/dora-rs/dora-drives$ dora check
Dora Coordinator: ok
Dora Daemon: not running
Environment check failed.
(dora3.7) jarvis@jia:~/coding/pyhome/github.com/dora-rs/dora-drives$ dora list
Running dataflows:
- [webcam] e8125719-4732-4ad2-a732-99bf6e57fc4c
(dora3.7) jarvis@jia:~/coding/pyhome/github.com/dora-rs/dora-drives$ dora stop e8125719-4732-4ad2-a732-99bf6e57fc4c
no daemon connection
(dora3.7) jarvis@jia:~/coding/pyhome/github.com/dora-rs/dora-drives$
Closing, as I think this should have been fixed.