dora
failed to stop dataflow
Describe the bug
dora-daemon goes down due to a heartbeat timeout while dora-coordinator keeps running normally. After I restart dora-daemon, the dataflow cannot be stopped with dora stop <uuid>.
(dora3.7) jarvis@jia:~/coding/dora_home/dora$ conda activate py310
(py310) jarvis@jia:~/coding/dora_home/dora$ dora-cli up
started dora coordinator
started dora daemon
(py310) jarvis@jia:~/coding/dora_home/dora$ dora-cli -V
dora-cli 0.2.3-rc6
(py310) jarvis@jia:~/coding/dora_home/dora$ dora-cli check
Dora Coordinator: ok
Dora Daemon: ok
(py310) jarvis@jia:~/coding/dora_home/dora$ dora-cli start examples/python-operator-dataflow/dataflow.yml --attach --hot-reload
10af7c98-604d-4808-b48a-7e028cb3d733
2023-05-19T03:53:57.743423Z WARN dora_coordinator: daemon at `` did not react as expected to watchdog message
Caused by:
0: failed to send watchdog message to daemon
1: Broken pipe (os error 32)
Location:
/home/jarvis/coding/dora_home/dora/binaries/coordinator/src/lib.rs:550:10
at binaries/coordinator/src/lib.rs:468
Open a new terminal and kill dora-daemon to simulate the daemon process going down abnormally:
(py310) jarvis@jia:~/coding/dora_home/dora$ dora-cli list
Running dataflows:
- [nappy-back] 10af7c98-604d-4808-b48a-7e028cb3d733
(py310) jarvis@jia:~/coding/dora_home/dora$ ps -ef | grep dora
jarvis 22117 1 0 11:41 pts/12 00:00:00 dora-coordinator
jarvis 22131 1 0 11:41 pts/12 00:00:01 dora-daemon
jarvis 24461 18206 0 11:53 pts/12 00:00:00 dora-cli start dataflow.yml --attach --hot-reload
jarvis 24464 22131 7 11:53 pts/12 00:00:01 python3 -c import dora; dora.start_runtime() # webcam
jarvis 24467 22131 8 11:53 pts/12 00:00:01 python3 -c import dora; dora.start_runtime() # plot
jarvis 24598 22333 0 11:53 pts/3 00:00:00 grep --color=auto dora
(py310) jarvis@jia:~/coding/dora_home/dora$ kill -15 22131
(py310) jarvis@jia:~/coding/dora_home/dora$ dora-cli list
Running dataflows:
- [nappy-back] 10af7c98-604d-4808-b48a-7e028cb3d733
(py310) jarvis@jia:~/coding/dora_home/dora$ dora-cli stop
> Choose dataflow to stop: [nappy-back] 10af7c98-604d-4808-b48a-7e028cb3d733
no daemon connection
(py310) jarvis@jia:~/coding/dora_home/dora$ dora-cli up
started dora daemon
(py310) jarvis@jia:~/coding/dora_home/dora$ dora-cli check
Dora Coordinator: ok
Dora Daemon: ok
(py310) jarvis@jia:~/coding/dora_home/dora$ dora-cli list
Running dataflows:
- [nappy-back] 10af7c98-604d-4808-b48a-7e028cb3d733
(py310) jarvis@jia:~/coding/dora_home/dora$ dora-cli stop
> Choose dataflow to stop: [nappy-back] 10af7c98-604d-4808-b48a-7e028cb3d733
failed to stop dataflow
(py310) jarvis@jia:~/coding/dora_home/dora$ dora-cli list
Running dataflows:
- [nappy-back] 10af7c98-604d-4808-b48a-7e028cb3d733
(py310) jarvis@jia:~/coding/dora_home/dora$ dora-cli -V
dora-cli 0.2.3-rc6
(py310) jarvis@jia:~/coding/dora_home/dora$
To Reproduce
Steps to reproduce the behavior:
- Start the dora coordinator and daemon:
dora-cli up
- Start a new dataflow:
dora-cli start examples/python-operator-dataflow/dataflow.yml --attach --hot-reload
- Kill dora-daemon:
kill -15 pid_dora_daemon
- Restart the daemon:
dora-cli up
- Stop the dataflow:
dora-cli stop uuid_your_dataflow
Expected behavior
I expect dora-coordinator and dora-daemon to live and die together, and to restart automatically when the heartbeat times out. Alternatively, if dora-daemon goes down, its dataflows should be destroyed as well.
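The second option in the expected behavior (dataflows are destroyed when their daemon goes down) amounts to a heartbeat watchdog on the coordinator side. The following is a minimal, hypothetical Python sketch, not dora's coordinator code: the Coordinator class, its methods, and the 5-second timeout are assumptions made for illustration. It records when each daemon last sent a heartbeat and, once a daemon has been silent longer than the timeout, drops it and reports its dataflows as failed instead of leaving them in the "Running dataflows" list.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# Hypothetical sketch of a coordinator-side heartbeat watchdog.
# None of these names come from dora; the timeout value is an assumption.
HEARTBEAT_TIMEOUT_S = 5.0


@dataclass
class DaemonState:
    last_heartbeat: float
    dataflows: Set[str] = field(default_factory=set)


class Coordinator:
    def __init__(self, timeout: float = HEARTBEAT_TIMEOUT_S,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.clock = clock  # injectable so the logic can be exercised without sleeping
        self.daemons: Dict[str, DaemonState] = {}
        self.running: Dict[str, str] = {}  # dataflow uuid -> daemon id

    def register_daemon(self, daemon_id: str) -> None:
        self.daemons[daemon_id] = DaemonState(last_heartbeat=self.clock())

    def start_dataflow(self, uuid: str, daemon_id: str) -> None:
        self.daemons[daemon_id].dataflows.add(uuid)
        self.running[uuid] = daemon_id

    def on_heartbeat(self, daemon_id: str) -> None:
        self.daemons[daemon_id].last_heartbeat = self.clock()

    def check_daemons(self) -> List[str]:
        """Drop daemons that missed the heartbeat deadline; return their dataflows."""
        now = self.clock()
        expired = [d for d, s in self.daemons.items()
                   if now - s.last_heartbeat > self.timeout]
        failed: List[str] = []
        for daemon_id in expired:
            state = self.daemons.pop(daemon_id)
            for uuid in state.dataflows:
                # Stop listing the dataflow as running instead of leaving it orphaned.
                self.running.pop(uuid, None)
                failed.append(uuid)
        return sorted(failed)


if __name__ == "__main__":
    now = [0.0]
    c = Coordinator(timeout=5.0, clock=lambda: now[0])
    c.register_daemon("daemon-1")
    c.start_dataflow("10af7c98-604d-4808-b48a-7e028cb3d733", "daemon-1")
    now[0] = 3.0
    c.on_heartbeat("daemon-1")
    now[0] = 7.0
    print(c.check_daemons())  # [] because the last heartbeat was only 4 s ago
    now[0] = 9.0
    print(c.check_daemons())  # the dataflow's uuid, after 6 s of silence
    print(c.running)  # {}
```

Injecting the clock keeps the timeout logic deterministic; a real coordinator would call check_daemons periodically and also treat a failed watchdog send (such as the "Broken pipe" error in the log above) as a disconnect.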
Environments (please complete the following information):
- System info: ubuntu 22.04
- Dora version: v0.2.3-rc6
Can I ask why you are killing the daemon?
We do not support auto-restarting daemon at the moment.
Because some custom nodes and operators can cause dora-daemon to go down unexpectedly, I kill the dora-daemon process to simulate this situation.
Do you have any ideas or context you can share about why dora-daemon would hang unexpectedly?
I am not running in source debug mode. After dora up, I run RUST_LOG=true dora start graphs/tutorials/webcam.yaml --attach --hot-reload --name webcam, and the dataflow cannot be stopped:
(dora3.7) jarvis@jia:~/coding/pyhome/github.com/dora-rs/dora-drives$ dora list
Running dataflows:
- [webcam] 2eeba0b6-4cfa-438a-bc7f-0747664e06f3
(dora3.7) jarvis@jia:~/coding/pyhome/github.com/dora-rs/dora-drives$ dora stop
> Choose dataflow to stop: [webcam] 2eeba0b6-4cfa-438a-bc7f-0747664e06f3
(dora3.7) jarvis@jia:~/coding/pyhome/github.com/dora-rs/dora-drives$ dora list
Running dataflows:
- [webcam] 2eeba0b6-4cfa-438a-bc7f-0747664e06f3
(dora3.7) jarvis@jia:~/coding/pyhome/github.com/dora-rs/dora-drives$ dora -V
dora-cli 0.2.3
(dora3.7) jarvis@jia:~/coding/pyhome/github.com/dora-rs/dora-drives$ dora logs 2eeba0b6-4cfa-438a-bc7f-0747664e06f3 webcam
> │ Logs from webcam.
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ could not get webcam.
2 │ could not get webcam.
3 │ could not get webcam.
4 │ could not get webcam.
5 │ could not get webcam.
6 │ could not get webcam.
7 │ could not get webcam.
8 │ could not get webcam.
9 │ could not get webcam.
10 │ could not get webcam.
11 │ could not get webcam.
12 │ could not get webcam.
13 │ could not get webcam.
14 │ could not get webcam.
This should have been fixed by the grace duration.
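A grace duration presumably follows a common shutdown pattern: ask a node to stop, wait a bounded time, then force-kill it, so that a node that never reacts to the stop request (for instance one stuck looping and printing "could not get webcam.") cannot keep the dataflow alive indefinitely. The sketch below is a generic, hypothetical Python illustration, not dora's implementation: it uses OS signals through subprocess (SIGTERM, then SIGKILL on POSIX) as a stand-in for dora's own stop event, and the helper names stop_with_grace and spawn, the dummy child processes, and the grace values are assumptions.

```python
import subprocess
import sys


def stop_with_grace(proc: subprocess.Popen, grace_s: float) -> str:
    """Hypothetical helper: request a stop, then force-kill after `grace_s` seconds."""
    if proc.poll() is not None:
        return "already exited"
    proc.terminate()  # polite request (SIGTERM on POSIX)
    try:
        proc.wait(timeout=grace_s)
        return "stopped gracefully"
    except subprocess.TimeoutExpired:
        proc.kill()  # grace period over: SIGKILL on POSIX
        proc.wait()
        return "killed after grace period"


def spawn(ignore_sigterm: bool) -> subprocess.Popen:
    """Start a dummy 'node' that sleeps; optionally it ignores SIGTERM like a stuck node."""
    code = (
        "import signal, time\n"
        + ("signal.signal(signal.SIGTERM, signal.SIG_IGN)\n" if ignore_sigterm else "")
        + "print('ready', flush=True)\n"
        + "time.sleep(60)\n"
    )
    proc = subprocess.Popen([sys.executable, "-c", code],
                            stdout=subprocess.PIPE, text=True)
    proc.stdout.readline()  # wait until the child has set up its signal handling
    return proc


if __name__ == "__main__":
    # On Linux: the first node exits on SIGTERM, the second must be killed.
    print(stop_with_grace(spawn(ignore_sigterm=False), grace_s=5.0))
    print(stop_with_grace(spawn(ignore_sigterm=True), grace_s=0.5))
```

The readline handshake avoids a race where the stop request arrives before the stubborn child has installed its SIGTERM handler; the upper bound on shutdown time is the grace duration plus the time a forced kill takes.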