GraphScope
Runtime support for failover
We noticed that the querying service becomes abnormal after a pod fails and restarts.
- [x] Support failover for executor
- [ ] Add CI tests for pegasus failover
- [ ] Add CI tests for frontend failover
- [ ] Add CI tests for ingestor failover
- [ ] Add CI tests for coordinator failover
After some store nodes restart (a full restart is OK), new queries cannot be completed.
Update: resolved by #1950
Kill some workers and let them restart. Some workers will not find their new peers.
2022-07-20 15:26:41.560437789 ERROR (/home/graphscope/gs/research/engine/pegasus/network/src/send/mod.rs:254) [net-sender-2] fail to send data to 2, caused by No route to host (os error 113);
Resolved by adding DNS refreshing to the gRPC connections.
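For readers unfamiliar with the failure mode, here is a minimal sketch of the general idea, not the actual change made in GraphScope. It assumes plain TCP from the Rust standard library rather than gRPC, and the names `resolve_fresh` and `connect_with_refresh` are hypothetical. The point it illustrates is that the address is resolved again before every connection attempt, so a peer that came back with a new IP after a pod restart can be reached instead of failing on the stale one.

```rust
use std::io;
use std::net::{SocketAddr, TcpStream, ToSocketAddrs};
use std::thread;
use std::time::Duration;

/// Hypothetical helper: resolve `host:port` through the system resolver
/// on every call instead of caching the first result.
fn resolve_fresh(host: &str, port: u16) -> io::Result<Vec<SocketAddr>> {
    Ok((host, port).to_socket_addrs()?.collect())
}

/// Hypothetical helper: try to connect up to `max_attempts` times and call
/// `resolve` again before each attempt, so a changed address is picked up.
fn connect_with_refresh<R>(
    mut resolve: R,
    max_attempts: u32,
    backoff: Duration,
) -> io::Result<TcpStream>
where
    R: FnMut() -> io::Result<Vec<SocketAddr>>,
{
    let mut last_err = io::Error::new(io::ErrorKind::Other, "no connection attempt was made");
    for attempt in 0..max_attempts {
        match resolve() {
            Ok(addrs) => {
                // Try every address returned by this round of resolution.
                for addr in addrs {
                    match TcpStream::connect_timeout(&addr, Duration::from_secs(2)) {
                        Ok(stream) => return Ok(stream),
                        Err(e) => last_err = e,
                    }
                }
            }
            Err(e) => last_err = e,
        }
        // Wait before resolving again, except after the final attempt.
        if attempt + 1 < max_attempts {
            thread::sleep(backoff);
        }
    }
    Err(last_err)
}
```

A caller would pass a closure such as `|| resolve_fresh(host, port)`, where `host` is the peer's DNS name. Taking the resolver as a closure keeps the retry loop independent of how names are looked up, which also makes it easy to exercise without a real DNS server.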