The daemon is dead and cannot be restarted automatically

Open meua opened this issue 1 year ago • 5 comments

Describe the bug

Initially, dora-daemon and dora-coordinator run normally. When a dataflow is started and dora-coordinator responds slowly, the daemon exits on its own and is not restarted automatically.

/home/jarvis/coding/dora_home/dora/target/debug/dora-daemon
Error: lost connection to coordinator

Location:
    /home/jarvis/coding/dora_home/dora/binaries/daemon/src/lib.rs:247:29

Process finished with exit code 1

To Reproduce

Steps to reproduce the behavior:

  1. Start the daemon: dora up
  2. Start a new dataflow: RUST_LOG=info cargo run start /home/jarvis/coding/rust_home/github.com/meua/dora-drives/graphs/tutorials/webcam_midas_frame.yaml --attach --hot-reload
  3. Let dora-coordinator sleep for a while.

Expected behavior

After dora up is executed, dora-coordinator and dora-daemon should keep running as system daemon processes and should not exit unexpectedly.
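
To illustrate the kind of behavior expected here, the following is a minimal sketch of a reconnect loop with exponential backoff, so that a slow or temporarily unreachable coordinator does not end the daemon process. This is not dora's actual implementation: `backoff_delay`, `connect_with_retry`, and the `connect` closure are hypothetical names, and the real daemon would plug in its own connection logic.

```rust
use std::thread;
use std::time::Duration;

/// Delay before reconnect attempt `attempt` (0-based).
/// The delay doubles with every attempt and is capped at `max`.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    // `checked_mul` returns None on overflow; fall back to the cap in that case.
    base.checked_mul(factor).map_or(max, |d| d.min(max))
}

/// Calls the (hypothetical) `connect` closure until it succeeds or
/// `max_attempts` tries have failed, sleeping between failed tries.
/// At least one attempt is always made.
fn connect_with_retry<T, E>(
    mut connect: impl FnMut() -> Result<T, E>,
    max_attempts: u32,
    base: Duration,
    max: Duration,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match connect() {
            Ok(conn) => return Ok(conn),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt + 1 >= max_attempts => return Err(err),
            Err(_) => {
                thread::sleep(backoff_delay(attempt, base, max));
                attempt += 1;
            }
        }
    }
}
```

With something like this, "lost connection to coordinator" would only become fatal after the retry budget is used up, instead of on the first failure.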

Environments (please complete the following information):

  • System info: ubuntu 22.04
  • Dora version: 0.2.3-rc6

Additional context

(dora3.7) jarvis@jia:~/coding/dora_home/dora/binaries/cli/src$ RUST_LOG=info cargo run start /home/jarvis/coding/rust_home/github.com/meua/dora-drives/graphs/tutorials/webcam_midas_frame.yaml --attach --hot-reload
   Compiling rustix v0.36.13
   Compiling dora-operator-api-c v0.2.3-rc6 (/home/jarvis/coding/dora_home/dora/apis/c/operator)
   Compiling is-terminal v0.4.4
   Compiling clap v4.1.11
   Compiling bat v0.23.0
   Compiling dora-cli v0.2.3-rc6 (/home/jarvis/coding/dora_home/dora/binaries/cli)
    Finished dev [unoptimized + debuginfo] target(s) in 7.92s
     Running `/home/jarvis/coding/dora_home/dora/target/debug/dora-cli start /home/jarvis/coding/rust_home/github.com/meua/dora-drives/graphs/tutorials/webcam_midas_frame.yaml --attach --hot-reload`
failed to spawn dataflow on machine ``
(dora3.7) jarvis@jia:~/coding/dora_home/dora/binaries/cli/src$ 

meua avatar May 15 '23 07:05 meua

What do you mean by "dora-coordinator sleep for a while"?

haixuanTao avatar May 24 '23 10:05 haixuanTao

What do you mean by "dora-coordinator sleep for a while"?

No, the dora-daemon dies because of an exception in an operator or custom node, and then loses contact with the coordinator. However, the dora-daemon should stay in a stable state at all times to be worthy of its name.

meua avatar May 24 '23 15:05 meua

Which exception in the operator or custom node causes the daemon to die?

haixuanTao avatar May 24 '23 15:05 haixuanTao

@meua ?

haixuanTao avatar May 31 '23 15:05 haixuanTao

@meua ? There is a case where, after leaving it untouched for a long time, you end up with the following:

(dora3.7) jarvis@jia:~/coding/pyhome/github.com/dora-rs/dora-drives$ Error: lost connection to coordinator

Location:
    /home/runner/work/dora/dora/binaries/daemon/src/lib.rs:247:29

Then, in a new terminal:

(dora3.7) jarvis@jia:~/coding/pyhome/github.com/dora-rs/dora-drives$ dora check
Dora Coordinator: ok
Dora Daemon: not running

Environment check failed.
(dora3.7) jarvis@jia:~/coding/pyhome/github.com/dora-rs/dora-drives$ dora list
Running dataflows:
- [webcam] e8125719-4732-4ad2-a732-99bf6e57fc4c
(dora3.7) jarvis@jia:~/coding/pyhome/github.com/dora-rs/dora-drives$ dora stop e8125719-4732-4ad2-a732-99bf6e57fc4c
no daemon connection
(dora3.7) jarvis@jia:~/coding/pyhome/github.com/dora-rs/dora-drives$ 
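
In the output above, the coordinator still lists the dataflow as running even though the daemon is gone. As a rough sketch of how a coordinator could notice this, the snippet below tracks the last heartbeat per daemon and reports the ones that have timed out, so their dataflows could be marked as failed. Everything here is an assumption for illustration: `DaemonRegistry`, `heartbeat`, and `drain_stale` are hypothetical names, and I have not checked how the real coordinator tracks daemons.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical registry of the last heartbeat seen per daemon,
/// keyed by machine id.
struct DaemonRegistry {
    timeout: Duration,
    last_seen: HashMap<String, Instant>,
}

impl DaemonRegistry {
    fn new(timeout: Duration) -> Self {
        Self { timeout, last_seen: HashMap::new() }
    }

    /// Record a heartbeat from `machine_id` observed at time `now`.
    fn heartbeat(&mut self, machine_id: &str, now: Instant) {
        self.last_seen.insert(machine_id.to_owned(), now);
    }

    /// Remove and return (sorted) the ids of all daemons whose last
    /// heartbeat is older than the timeout at time `now`.
    fn drain_stale(&mut self, now: Instant) -> Vec<String> {
        let timeout = self.timeout;
        let mut stale: Vec<String> = self
            .last_seen
            .iter()
            .filter(|(_, seen)| now.saturating_duration_since(**seen) > timeout)
            .map(|(id, _)| id.clone())
            .collect();
        stale.sort();
        for id in &stale {
            self.last_seen.remove(id);
        }
        stale
    }
}
```

The ids returned by `drain_stale` are the daemons whose dataflows can no longer be assumed to be running, so `dora list` would not have to report them as such.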

meua avatar Jun 06 '23 01:06 meua

Closing, as I think this should have been fixed.

haixuanTao avatar Aug 31 '24 04:08 haixuanTao