runtime support failover

Open tianliplus opened this issue 2 years ago • 2 comments

We noticed that the query service behaves abnormally after a pod fails and restarts.

  • [x] Support failover for executor
  • [ ] Add CI tests for pegasus failover
  • [ ] Add CI tests for frontend failover
  • [ ] Add CI tests for ingestor failover
  • [ ] Add CI tests for coordinator failover

tianliplus avatar Jun 13 '22 07:06 tianliplus

After some store nodes restart (a full restart is OK), new queries cannot complete.

tianliplus avatar Jun 13 '22 07:06 tianliplus

Update: resolved by #1950


Kill a worker and let it restart. Some workers will not find their new peer.

2022-07-20 15:26:41.560437789 ERROR (/home/graphscope/gs/research/engine/pegasus/network/src/send/mod.rs:254) [net-sender-2] fail to send data to 2, caused by No route to host (os error 113);

siyuan0322 avatar Jul 20 '22 07:07 siyuan0322

Resolved by adding DNS refreshing to the gRPC connections.
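
For illustration only, here is a minimal sketch of the general idea, using plain Python sockets rather than gRPC. It is not the GraphScope change. The assumption is that a restarted pod comes back under the same hostname with a new IP address, so a client that cached the old address keeps hitting errors like "No route to host". The function names `resolve` and `connect_with_refresh` and all parameters are hypothetical.

```python
import socket
import time


def resolve(host, port):
    """Look up host:port again and return fresh (family, sockaddr) pairs."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return [(family, sockaddr) for family, _type, _proto, _canon, sockaddr in infos]


def connect_with_refresh(host, port, retries=5, delay=1.0, timeout=3.0):
    """Connect to host:port, re-resolving the hostname before every attempt.

    The address is never cached between attempts, so a peer that restarted
    with a new IP is picked up as soon as DNS reflects the change.
    """
    last_error = None
    for attempt in range(retries):
        try:
            candidates = resolve(host, port)  # fresh DNS lookup on each attempt
        except socket.gaierror as err:
            last_error = err
            candidates = []
        for family, sockaddr in candidates:
            sock = None
            try:
                sock = socket.socket(family, socket.SOCK_STREAM)
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return sock
            except OSError as err:  # e.g. "No route to host" for a stale or dead address
                last_error = err
                if sock is not None:
                    sock.close()
        if attempt < retries - 1:
            time.sleep(delay)
    raise ConnectionError(f"could not connect to {host}:{port}") from last_error
```

The point of the sketch is only that the lookup happens inside the retry loop. A client that resolves the peer once at startup and reuses that address would never see the new pod IP.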

siyuan0322 avatar Apr 13 '23 02:04 siyuan0322