
PRIMARY timed out minutes after the rebooted secondary node rejoined the cluster - RDMA

Open Lathanderjk opened this issue 9 months ago • 0 comments

We are updating to 9.2.9. One of the nodes was updated a few days ago. Today, after updating the second one and rebooting it, the PRIMARY (which was still on 9.2.8) timed out for more than 60 s (the FS monitor timeout). This happened after the rebooted secondary had joined the cluster...

During the timeout, the system load on the PRIMARY decreased, and I/O busy time and throughput on the underlying drives dropped to zero.

I marked the time when the timeout started in the log. Attachments: log.txt, filesystem.res.txt

Lathanderjk, May 07 '24 08:05