GraphScope runtime support failover

Open tianliplus opened this issue 2 years ago • 2 comments

We noticed that the querying service behaves abnormally after a pod fails and restarts.

[x] Support failover for executor
[ ] Add CI tests for pegasus failover
[ ] Add CI tests for frontend failover
[ ] Add CI tests for ingestor failover
[ ] Add CI tests for coordinator failover

Jun 13 '22 07:06 tianliplus

After some store nodes restart, new queries cannot be completed (a full restart is OK).

Jun 13 '22 07:06 tianliplus

Update: resolved by #1950

Kill some workers and let them restart. Some workers will not find their new peers.

2022-07-20 15:26:41.560437789 ERROR (/home/graphscope/gs/research/engine/pegasus/network/src/send/mod.rs:254) [net-sender-2] fail to send data to 2, caused by No route to host (os error 113);

Jul 20 '22 07:07 siyuan0322

Resolved by adding DNS refreshing in gRPC connections.

Apr 13 '23 02:04 siyuan0322
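
For readers hitting the same "No route to host" symptom, the idea behind DNS refreshing can be sketched as follows. This is not the actual GraphScope patch, and it uses plain TCP from the Rust standard library rather than gRPC. The names `connect_with_refresh` and `resolve_dns` are hypothetical, as are the retry count, the backoff, and the one-second connect timeout. The point it illustrates is that the peer's hostname is resolved again before every attempt, so a restarted pod that comes back with a new IP can be found instead of retrying a cached, stale address.

```rust
use std::io;
use std::net::{SocketAddr, TcpStream, ToSocketAddrs};
use std::thread;
use std::time::Duration;

/// Hypothetical helper: resolve `host:port` through the system resolver.
/// Calling it again later may return a different IP if the peer was rescheduled.
fn resolve_dns(host: &str, port: u16) -> io::Result<Vec<SocketAddr>> {
    Ok((host, port).to_socket_addrs()?.collect())
}

/// Hypothetical helper: try to connect up to `attempts` times, calling
/// `resolve` before every attempt instead of caching the first answer.
/// The resolver is passed in as a closure so it can be swapped out in tests.
fn connect_with_refresh<R>(
    mut resolve: R,
    attempts: u32,
    backoff: Duration,
) -> io::Result<TcpStream>
where
    R: FnMut() -> io::Result<Vec<SocketAddr>>,
{
    let mut last_err = io::Error::new(io::ErrorKind::Other, "no connection attempts were made");
    for attempt in 0..attempts {
        // Fresh lookup on every attempt: this is the "DNS refreshing" step.
        match resolve() {
            Ok(addrs) => {
                for addr in addrs {
                    match TcpStream::connect_timeout(&addr, Duration::from_secs(1)) {
                        Ok(stream) => return Ok(stream),
                        Err(e) => last_err = e, // remember the error, try the next address
                    }
                }
            }
            Err(e) => last_err = e,
        }
        // Wait before the next attempt, but not after the final one.
        if attempt + 1 < attempts {
            thread::sleep(backoff);
        }
    }
    Err(last_err)
}
```

A caller would use it along the lines of `connect_with_refresh(|| resolve_dns("peer-host", 1234), 5, Duration::from_secs(1))`, where the host name and port are placeholders.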
