Improved Raft replicate backpressure handling

Open mmaslankaprv opened this issue 1 year ago • 12 comments

Buffered protocol is a wrapper around raft::consensus_client_protocol whose main purpose is to control the flow of requests from all the raft groups instantiated on one shard to the same remote node. Instead of tracking requests separately for each raft group, the buffered protocol tracks and buffers all requests targeting the same node. The size of the buffer and the number of in-flight append entries requests are tunable via cluster configuration properties.

Using a single per-shard buffer for a given follower makes it possible to better amortize spikes in the follower's disk latency without propagating them to the end user.
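To illustrate the idea, here is a minimal sketch in Python of a shared buffer for one target node. It is not the code from this PR: the names `PerNodeBuffer`, `AppendRequest`, `max_buffered_bytes` and `max_inflight` are hypothetical and stand in for the actual implementation and configuration properties. It only shows the two limits described above, a bounded buffer shared by all raft groups and a cap on in-flight requests.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class AppendRequest:
    group_id: int    # raft group that produced the request
    size_bytes: int  # payload size counted against the buffer limit


class PerNodeBuffer:
    """Hypothetical buffer shared by all raft groups on a shard for one target node."""

    def __init__(self, max_buffered_bytes: int, max_inflight: int):
        # Both limits are assumed names, not real Redpanda properties.
        self.max_buffered_bytes = max_buffered_bytes
        self.max_inflight = max_inflight
        self.buffered_bytes = 0
        self.inflight = 0
        self.queue = deque()

    def try_enqueue(self, req: AppendRequest) -> bool:
        # Reject when the shared buffer is full: this is the backpressure signal.
        if self.buffered_bytes + req.size_bytes > self.max_buffered_bytes:
            return False
        self.queue.append(req)
        self.buffered_bytes += req.size_bytes
        return True

    def dispatch(self) -> list:
        # Move requests from the buffer to the wire, in FIFO order,
        # until the in-flight cap is reached.
        sent = []
        while self.queue and self.inflight < self.max_inflight:
            req = self.queue.popleft()
            self.buffered_bytes -= req.size_bytes
            self.inflight += 1
            sent.append(req)
        return sent

    def complete(self) -> None:
        # Called when the follower replies; frees one in-flight slot.
        if self.inflight > 0:
            self.inflight -= 1
```

Because the limits apply to the node rather than to each raft group, a short latency spike on the follower is absorbed by the shared buffer, and callers only see backpressure once the whole buffer fills up.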

Backports Required

  • [ ] none - not a bug fix
  • [ ] none - this is a backport
  • [ ] none - issue does not exist in previous branches
  • [ ] none - papercut/not impactful enough to backport
  • [ ] v24.2.x
  • [ ] v24.1.x
  • [ ] v23.3.x

Release Notes

  • none

mmaslankaprv avatar Jul 29 '24 17:07 mmaslankaprv

/dt

mmaslankaprv avatar Jul 29 '24 17:07 mmaslankaprv

/dt

mmaslankaprv avatar Jul 30 '24 07:07 mmaslankaprv

/dt

mmaslankaprv avatar Jul 31 '24 07:07 mmaslankaprv

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/52305#0191081d-fe9d-4db8-ba49-81ec77ade1a1

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/52305#0191081d-fe9f-4e6e-ace7-c2b2e89d30aa

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/52305#0191081f-3806-4b88-833e-e32110a1ec06

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/53125#01916a19-c8cb-4af8-8ba0-f0abfbbdb610

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/53684#019199b8-19ac-4ad9-bc02-12d9c48c5f55

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/54512#0191fa28-c86b-4dc6-a068-c3c89df41115

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/56128#0192723b-85a3-4eeb-9c1c-78e802b0aa0e

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/56331#01927af5-06e9-4590-a4e5-d77f5af109e1

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/56331#01927af7-8834-4c59-aa08-b262a15ef614

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/58181#01933f1c-a7fc-439d-a72c-239323cd4c02

vbotbuildovich avatar Jul 31 '24 10:07 vbotbuildovich

Appears this may have a bisection failure

FAILED aa75503435b4bf0575898bbec3f822df96b5db99 from https://github.com/redpanda-data/redpanda/pull/22632

dotnwat avatar Aug 03 '24 16:08 dotnwat

/cdt

mmaslankaprv avatar Aug 29 '24 07:08 mmaslankaprv

/cdt

mmaslankaprv avatar Sep 18 '24 16:09 mmaslankaprv

/cdt

mmaslankaprv avatar Oct 10 '24 07:10 mmaslankaprv

non flaky failures in https://buildkite.com/redpanda/redpanda/builds/56331#01927af7-883a-4033-a6ad-edb7a5205f06:

"rptest.tests.e2e_iam_role_test.STSRoleFetchTests.test_write"

vbotbuildovich avatar Oct 11 '24 11:10 vbotbuildovich

Retry command for Build#56331

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/tests/cloud_storage_timing_stress_test.py::CloudStorageTimingStressTest.test_cloud_storage@{"cleanup_policy":"delete"}

vbotbuildovich avatar Oct 11 '24 11:10 vbotbuildovich

/cdt

mmaslankaprv avatar Oct 11 '24 11:10 mmaslankaprv

/cdt

mmaslankaprv avatar Oct 15 '24 12:10 mmaslankaprv

the below tests from https://buildkite.com/redpanda/redpanda/builds/57410#0192e196-2c78-47ae-a816-d2dca875378f have failed and will be retried

catalog_schema_manager_rpunit

the below tests from https://buildkite.com/redpanda/redpanda/builds/59008#019376bc-1ccd-4fdb-8bc5-891398a494ea have failed and will be retried

gtest_raft_rpunit

vbotbuildovich avatar Oct 31 '24 10:10 vbotbuildovich