How to recover from a write timeout

Open lv-stupidboy opened this issue 1 year ago • 3 comments

Please check the FAQ documentation before raising an issue

Describe the bug (required)

Your Environments (required)

  • OS: uname -a
  • Compiler: g++ --version or clang++ --version
  • CPU: lscpu
  • Commit id (e.g. a3ffc7d8)

How To Reproduce(required)

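Steps to reproduce the behavior: Version 3.2.1, and the space has 3 replicas. Each machine has 104 cores and 300+ GB of memory, with no CPU or memory bottleneck observed, plus four 800 GB SSDs. The data volume is about 20 GB.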

  1. Write data through flink-connector with batch size = 100. After writing for a while, the graph log shows RPC timeouts:
     StorageClientBase-inl.h.ext: Request to ip:9779 time out : TTransportException: Timed out
     There some RPC errors: RPC failure in storageClient with without :: TTransportException: time out
     InsertVerticesExecutor failed, error E_PRC_FAILURE, part 1
     InsertVerticesExecutor failed, error E_PRC_FAILURE, part 2
     InsertVerticesExecutor failed, error E_PRC_FAILURE, part 3
  2. Check the corresponding storage log:
     RaftPart.cpp:1033 Replicating log timed out : replicateLogLatencyUs 10001168
     RaftPart.cpp:1033 Replicating log timed out : replicateLogLatencyUs 10000230
     RaftPart.cpp:1033 Replicating log timed out : replicateLogLatencyUs 10001245
     RaftPart.cpp:1033 Replicating log timed out : replicateLogLatencyUs 10001037
     RaftPart.cpp:1033 Replicating log timed out : replicateLogLatencyUs 10001223
     .........
  3. The storage log above kept printing for 7 hours without returning to normal, and the node stayed offline the whole time.
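
Every latency in the storage log above is just over 10,000,000 µs, i.e. about 10 seconds, which suggests each replication attempt is hitting a fixed timeout rather than merely running slowly. A minimal sketch for pulling these values out of a storage log, assuming the lines have exactly the shape quoted in step 2 (the function name `replicate_latencies_seconds` is made up for illustration):

```python
import re

# Matches the latency field in lines shaped like the ones quoted in step 2, e.g.
# "RaftPart.cpp:1033 Replicating log timed out : replicateLogLatencyUs 10001168"
LATENCY_RE = re.compile(
    r"Replicating log timed out\s*:\s*replicateLogLatencyUs\s+(\d+)"
)


def replicate_latencies_seconds(log_text):
    """Return each reported replication latency, converted from µs to seconds."""
    return [int(us) / 1_000_000 for us in LATENCY_RE.findall(log_text)]


# Two of the lines from the issue, used as sample input.
sample = """\
RaftPart.cpp:1033 Replicating log timed out : replicateLogLatencyUs 10001168
RaftPart.cpp:1033 Replicating log timed out : replicateLogLatencyUs 10000230
"""

latencies = replicate_latencies_seconds(sample)
print(len(latencies), min(latencies), max(latencies))
```

If the values cluster tightly around one number, as they do here, the part is most likely waiting for followers until a timeout expires, so the root cause is better looked for on the slow replica or on a stalled disk write.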

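Expected behavior

  1. What are the possible causes of the situation above?
  2. How should the node be recovered?
  3. With only a single node offline and the other 2 replicas healthy, why do newly submitted jobs still fail to write?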

Additional context

lv-stupidboy avatar Sep 04 '23 12:09 lv-stupidboy

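With a large data volume, RocksDB ran a compaction, and the leader hit a write stall while writing to RocksDB. You can search the RocksDB LOG file for the keyword "Stalling". The stall blocked the executor threads, which caused the RPC timeouts and also triggered leader elections. Once the compaction finishes, things should be fine.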
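
The keyword search suggested above can be sketched as a small script. This is only an illustration: the helper name `find_stall_lines`, the log path in the usage comment, and the sample lines are all made up, and real RocksDB LOG lines will look different.

```python
def find_stall_lines(lines, keyword="Stalling"):
    """Yield (line_number, text) for every line that contains the keyword."""
    for number, line in enumerate(lines, start=1):
        if keyword in line:
            yield number, line.rstrip("\n")


# Made-up sample lines for illustration only, not real RocksDB output.
sample_log = [
    "2023/09/04-12:00:00 compaction started\n",
    "2023/09/04-12:00:01 Stalling writes (illustrative message)\n",
    "2023/09/04-12:00:05 compaction finished\n",
]

for number, text in find_stall_lines(sample_log):
    print(number, text)

# Usage against a real file (the path is a placeholder):
# with open("/path/to/rocksdb/LOG") as log_file:
#     for number, text in find_stall_lines(log_file):
#         print(number, text)
```

Comparing the timestamps of any matches against the "Replicating log timed out" entries in the storage log would show whether the stalls and the RPC timeouts actually coincide.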

tangyuanzhang avatar Sep 06 '23 02:09 tangyuanzhang

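Alternatively, you could raise the number of executor threads so that it is greater than the number of leaders on your hosts; that should also solve it.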

tangyuanzhang avatar Sep 06 '23 02:09 tangyuanzhang

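"Alternatively, you could raise the number of executor threads so that it is greater than the number of leaders on your hosts; that should also solve it."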

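Thanks for the reply. How should I understand "greater than the number of leaders on your hosts"? Also, are there any best practices or recommendations for the RocksDB parameter settings? We have run into quite a few stability problems using this in production.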

lv-stupidboy avatar Sep 06 '23 08:09 lv-stupidboy