Scheduler loses registered executors when restarted in `push` mode
Describe the bug: When I restart the scheduler, it loses all information about the registered executors.
To Reproduce: Start a scheduler with the following config:
scheduler_policy="PushStaged"
Start an executor with the following config:
scheduler_port=50050
scheduler_host="localhost"
# PushStaged or PullStaged
task_scheduling_policy="PushStaged"
Then kill the scheduler and restart it with the same config.
The restarted scheduler has lost all of its in-memory information about registered executors.
Expected behavior: The scheduler should recover this in-memory data after it restarts.
Proposed solution: have each executor send its registration information along with its heartbeat, so a restarted scheduler can rebuild its executor registry.
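The proposed heartbeat-based recovery can be sketched as follows. This is a minimal, hypothetical model: `ExecutorMetadata`, `Heartbeat`, and `SchedulerState` are illustrative names, not Ballista's actual gRPC messages or scheduler types. The idea is that each heartbeat carries the executor's full registration data, so the scheduler can upsert an executor it has no record of, for example after a restart.

```rust
use std::collections::HashMap;

// Hypothetical executor registration data; the fields are illustrative only.
#[derive(Clone, Debug, PartialEq)]
pub struct ExecutorMetadata {
    pub id: String,
    pub host: String,
    pub port: u16,
    pub task_slots: u32,
}

// Hypothetical heartbeat that carries the full registration data,
// not just the executor id.
#[derive(Clone, Debug)]
pub struct Heartbeat {
    pub metadata: ExecutorMetadata,
    pub timestamp_secs: u64,
}

// In-memory executor registry; it starts empty after a scheduler restart.
#[derive(Default)]
pub struct SchedulerState {
    executors: HashMap<String, ExecutorMetadata>,
    last_seen: HashMap<String, u64>,
}

impl SchedulerState {
    /// Records a heartbeat. Returns `true` if the executor was previously
    /// unknown, i.e. this heartbeat (re-)registered it.
    pub fn on_heartbeat(&mut self, hb: Heartbeat) -> bool {
        let id = hb.metadata.id.clone();
        let was_unknown = !self.executors.contains_key(&id);
        // Upsert the registration data so a restarted scheduler can rebuild its registry.
        self.executors.insert(id.clone(), hb.metadata);
        self.last_seen.insert(id, hb.timestamp_secs);
        was_unknown
    }

    pub fn executor_count(&self) -> usize {
        self.executors.len()
    }

    pub fn get(&self, id: &str) -> Option<&ExecutorMetadata> {
        self.executors.get(id)
    }

    pub fn last_seen(&self, id: &str) -> Option<u64> {
        self.last_seen.get(id).copied()
    }
}
```

A restarted scheduler begins with an empty `SchedulerState`; the first heartbeat from each live executor returns `true` and restores that executor's entry, while later heartbeats only refresh its metadata and timestamp. A real implementation would also need to expire executors whose heartbeats stop arriving.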
@liukun4515 Are you running in standalone mode? If you are using etcd as the state backend, the scheduler should initialize any registered executors from the backend on startup. In standalone mode, however, the persistent state is stored in a sled DB on disk in a temporary file. To make standalone mode persist state across restarts, we would need to make the sled DB location a configurable path.
Is this still an issue?
@liukun4515 @thinkharderdev @mingmwang This can be fixed by passing the `--sled-dir` parameter when starting the scheduler service.
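Why `--sled-dir` matters can be illustrated with a small sketch of how a state directory might be chosen. The function `resolve_state_dir` is hypothetical and does not reflect Ballista's actual code; its per-process temp path only stands in for the temporary sled file mentioned above. The point is that a fixed, user-supplied path is what lets the state survive a restart.

```rust
use std::path::PathBuf;

/// Hypothetical helper: picks where the scheduler's state store lives.
/// `configured` stands in for the value passed via `--sled-dir`.
/// Returns the directory and whether it survives a restart.
pub fn resolve_state_dir(configured: Option<&str>) -> (PathBuf, bool) {
    match configured {
        // A fixed, user-chosen path is reused by the restarted process.
        Some(dir) => (PathBuf::from(dir), true),
        // Illustrative fallback: a per-process directory under the OS temp dir.
        // A restarted scheduler gets a new process id, hence an empty store.
        None => {
            let dir = std::env::temp_dir()
                .join(format!("scheduler-state-{}", std::process::id()));
            (dir, false)
        }
    }
}
```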