
kafka: oversized alloc in list_offsets_topic

Open IoannisRP opened this issue 1 year ago • 10 comments

Fixes: CORE-7778

  • Replaced std::vector instances with chunked_vector.
  • seastar::when_all_succeed is hardcoded to use a std::vector internally to store the input range and another std::vector to return the result. Because this caused further "big allocations", this PR introduces an ssx::when_all_succeed utility that accepts any vector-like input range and can write its results to any vector-like container.
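
To illustrate the container-generic idea described above, here is a minimal, synchronous sketch. It is not the ssx::when_all_succeed from this PR and it does not use Seastar: toy_future, when_all_succeed_into and example_collect are hypothetical names introduced only for illustration, and std::deque stands in for chunked_vector on the assumption that a block-allocating container is a fair analogue. The sketch only shows how both the input range and the result container can be left to the caller instead of being fixed to std::vector.

```cpp
#include <cassert>
#include <deque>
#include <exception>
#include <list>
#include <optional>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for an already-resolved future: it holds either a
// value or an error. This is NOT seastar::future.
template <typename T>
struct toy_future {
    std::optional<T> value;
    std::exception_ptr error;

    // Returns the value, or rethrows the stored error.
    T get() {
        if (error) {
            std::rethrow_exception(error);
        }
        assert(value.has_value());
        return std::move(*value);
    }
};

// Hypothetical helper: resolves every element of `futures` and collects the
// values into a caller-chosen container. All elements are visited even if
// one fails; the first error seen is rethrown at the end.
template <typename ResultContainer, typename FutureRange>
ResultContainer when_all_succeed_into(FutureRange& futures) {
    ResultContainer out;
    std::exception_ptr first_error;
    for (auto& f : futures) {
        try {
            out.push_back(f.get());
        } catch (...) {
            if (!first_error) {
                first_error = std::current_exception();
            }
        }
    }
    if (first_error) {
        std::rethrow_exception(first_error);
    }
    return out;
}

// Usage: the input is a std::list and the output is a std::deque, so no
// std::vector (and no single large contiguous buffer) is involved.
inline std::deque<int> example_collect() {
    std::list<toy_future<int>> inputs;
    inputs.push_back(toy_future<int>{1});
    inputs.push_back(toy_future<int>{2});
    inputs.push_back(toy_future<int>{3});
    return when_all_succeed_into<std::deque<int>>(inputs);
}
```

Taking the result container as an explicit template parameter keeps the choice at the call site, which matches the stated goal of making the container configurable rather than hardcoding one type.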

No tests were added for the oversized-allocation memory warnings. The results were only confirmed locally.

Backports Required

  • [ ] none - not a bug fix
  • [ ] none - this is a backport
  • [ ] none - issue does not exist in previous branches
  • [ ] none - papercut/not impactful enough to backport
  • [x] v24.2.x
  • [x] v24.1.x
  • [x] v23.3.x

Release Notes

  • none

IoannisRP avatar Oct 15 '24 15:10 IoannisRP

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/56539#01929128-f2d3-453e-adf2-6e4f47a32ae3
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/56649#019296d8-d29d-47a4-8d83-4453aece31f0
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/56803#01929f60-c15c-4ab6-b7a8-d371ef18565f
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/56803#01929f60-c15a-4488-bb32-d283fdceadc9

vbotbuildovich avatar Oct 15 '24 19:10 vbotbuildovich

Not sure if it makes sense to fix upstream seastar first or at all? Fragmentation is real for seastar users in general so they are more or less open to changes that break up large allocations.

The main downside is that in that project there are fewer chunked containers in the first place, though perhaps chunked_fifo is a fine drop-in here; not sure.

travisdowns avatar Oct 16 '24 13:10 travisdowns

FYI regarding commits, see: https://github.com/redpanda-data/redpanda/blob/dev/CONTRIBUTING.md

rockwotj avatar Oct 16 '24 18:10 rockwotj

FYI regarding commits, see: https://github.com/redpanda-data/redpanda/blob/dev/CONTRIBUTING.md

Are you referring to the follow up changes? I am going to rebase and squash these in the end.

Or is it something else?

IoannisRP avatar Oct 16 '24 18:10 IoannisRP

Just about the squashing and merging 👍

rockwotj avatar Oct 16 '24 18:10 rockwotj

@travisdowns

Not sure if it makes sense to fix upstream seastar first or at all? Fragmentation is real for seastar users in general so they are more or less open to changes that break up large allocations.

The idea was to solve it locally first and decide at a later stage whether it makes sense to upstream. I mainly wanted to make this configurable, rather than hardcoding a chunked_vector, precisely so that we might be able to upstream this or something similar.

IoannisRP avatar Oct 16 '24 18:10 IoannisRP

Added lifetime-management precondition comment for FutureRange.

IoannisRP avatar Oct 18 '24 09:10 IoannisRP

non flaky failures in https://buildkite.com/redpanda/redpanda/builds/56803#01929f60-c156-4208-b244-835b3a9fbe2b:

"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=TmtpdiParams.cancellation=CancellationStage.dir=.in.stage=.executed.use_alias=True"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=TmtpdiParams.cancellation=CancellationStage.dir=.in.stage=.prepared.use_alias=True"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=TmtpdiParams.cancellation=CancellationStage.dir=.out.stage=.executing.use_alias=False"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=TmtpdiParams.cancellation=None.use_alias=True"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=TmtpdiParams.cancellation=CancellationStage.dir=.out.stage=.preparing.use_alias=False"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=TmtpdiParams.cancellation=CancellationStage.dir=.in.stage=.executing.use_alias=True"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=TmtpdiParams.cancellation=CancellationStage.dir=.in.stage=.preparing.use_alias=True"

non flaky failures in https://buildkite.com/redpanda/redpanda/builds/56803#01929f60-c15c-4ab6-b7a8-d371ef18565f:

"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=TmtpdiParams.cancellation=CancellationStage.dir=.in.stage=.executed.use_alias=False"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=TmtpdiParams.cancellation=CancellationStage.dir=.in.stage=.prepared.use_alias=False"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=TmtpdiParams.cancellation=CancellationStage.dir=.out.stage=.executed.use_alias=False"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=TmtpdiParams.cancellation=None.use_alias=False"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=TmtpdiParams.cancellation=CancellationStage.dir=.in.stage=.executing.use_alias=False"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=TmtpdiParams.cancellation=CancellationStage.dir=.in.stage=.preparing.use_alias=False"

non flaky failures in https://buildkite.com/redpanda/redpanda/builds/56803#01929f60-c158-49f9-8178-ac73b78b405e:

"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=TmtpdiParams.cancellation=CancellationStage.dir=.in.stage=.executing.use_alias=False"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=TmtpdiParams.cancellation=CancellationStage.dir=.in.stage=.preparing.use_alias=False"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=TmtpdiParams.cancellation=CancellationStage.dir=.out.stage=.prepared.use_alias=False"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=TmtpdiParams.cancellation=CancellationStage.dir=.in.stage=.executed.use_alias=False"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=TmtpdiParams.cancellation=CancellationStage.dir=.in.stage=.prepared.use_alias=False"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=TmtpdiParams.cancellation=CancellationStage.dir=.out.stage=.executed.use_alias=False"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=TmtpdiParams.cancellation=None.use_alias=False"

non flaky failures in https://buildkite.com/redpanda/redpanda/builds/56803#01929f60-c15a-4488-bb32-d283fdceadc9:

"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=TmtpdiParams.cancellation=CancellationStage.dir=.out.stage=.preparing.use_alias=False"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=TmtpdiParams.cancellation=CancellationStage.dir=.in.stage=.executing.use_alias=True"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=TmtpdiParams.cancellation=CancellationStage.dir=.in.stage=.preparing.use_alias=True"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=TmtpdiParams.cancellation=CancellationStage.dir=.in.stage=.executed.use_alias=True"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=TmtpdiParams.cancellation=CancellationStage.dir=.in.stage=.prepared.use_alias=True"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=TmtpdiParams.cancellation=None.use_alias=True"

non flaky failures in https://buildkite.com/redpanda/redpanda/builds/56803#01929f64-44dc-40b6-9af9-08df621958c9:

"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=TmtpdiParams.cancellation=CancellationStage.dir=.in.stage=.executed.use_alias=False"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=TmtpdiParams.cancellation=CancellationStage.dir=.in.stage=.prepared.use_alias=False"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=TmtpdiParams.cancellation=CancellationStage.dir=.out.stage=.executed.use_alias=False"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=TmtpdiParams.cancellation=None.use_alias=False"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=TmtpdiParams.cancellation=CancellationStage.dir=.in.stage=.executing.use_alias=False"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=TmtpdiParams.cancellation=CancellationStage.dir=.in.stage=.preparing.use_alias=False"

non flaky failures in https://buildkite.com/redpanda/redpanda/builds/56803#01929f64-44de-44bc-987d-89ddf95eafc4:

"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=TmtpdiParams.cancellation=CancellationStage.dir=.in.stage=.executing.use_alias=False"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=TmtpdiParams.cancellation=CancellationStage.dir=.in.stage=.preparing.use_alias=False"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=TmtpdiParams.cancellation=CancellationStage.dir=.out.stage=.prepared.use_alias=False"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=TmtpdiParams.cancellation=CancellationStage.dir=.in.stage=.executed.use_alias=False"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=TmtpdiParams.cancellation=CancellationStage.dir=.in.stage=.prepared.use_alias=False"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=TmtpdiParams.cancellation=CancellationStage.dir=.out.stage=.executed.use_alias=False"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=TmtpdiParams.cancellation=None.use_alias=False"

non flaky failures in https://buildkite.com/redpanda/redpanda/builds/56803#01929f64-44dd-42b3-83b3-14710ee0ffe0:

"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=TmtpdiParams.cancellation=CancellationStage.dir=.in.stage=.prepared.use_alias=True"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=TmtpdiParams.cancellation=CancellationStage.dir=.in.stage=.executed.use_alias=True"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=TmtpdiParams.cancellation=CancellationStage.dir=.out.stage=.executing.use_alias=False"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=TmtpdiParams.cancellation=None.use_alias=True"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=TmtpdiParams.cancellation=CancellationStage.dir=.in.stage=.executing.use_alias=True"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=TmtpdiParams.cancellation=CancellationStage.dir=.in.stage=.preparing.use_alias=True"

non flaky failures in https://buildkite.com/redpanda/redpanda/builds/56803#01929f64-44da-49fd-8619-8601f00bb61c:

"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=TmtpdiParams.cancellation=CancellationStage.dir=.in.stage=.executing.use_alias=True"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=TmtpdiParams.cancellation=CancellationStage.dir=.in.stage=.preparing.use_alias=True"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=TmtpdiParams.cancellation=CancellationStage.dir=.out.stage=.preparing.use_alias=False"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=TmtpdiParams.cancellation=CancellationStage.dir=.in.stage=.executed.use_alias=True"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=TmtpdiParams.cancellation=CancellationStage.dir=.in.stage=.prepared.use_alias=True"
"rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=TmtpdiParams.cancellation=None.use_alias=True"

vbotbuildovich avatar Oct 18 '24 12:10 vbotbuildovich

Retry command for Build#56803

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_migrated_topic_data_integrity@{"params":[["in","executed"],true],"transfer_leadership":false}
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_migrated_topic_data_integrity@{"params":[["in","prepared"],true],"transfer_leadership":false}
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_migrated_topic_data_integrity@{"params":[["out","executing"],false],"transfer_leadership":false}
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_migrated_topic_data_integrity@{"params":[null,true],"transfer_leadership":false}
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_migrated_topic_data_integrity@{"params":[["out","preparing"],false],"transfer_leadership":true}
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_migrated_topic_data_integrity@{"params":[["in","executing"],true],"transfer_leadership":true}
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_migrated_topic_data_integrity@{"params":[["in","preparing"],true],"transfer_leadership":true}
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_migrated_topic_data_integrity@{"params":[["in","executing"],false],"transfer_leadership":false}
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_migrated_topic_data_integrity@{"params":[["in","preparing"],false],"transfer_leadership":false}
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_migrated_topic_data_integrity@{"params":[["out","prepared"],false],"transfer_leadership":false}
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_migrated_topic_data_integrity@{"params":[["in","executed"],false],"transfer_leadership":true}
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_migrated_topic_data_integrity@{"params":[["in","prepared"],false],"transfer_leadership":true}
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_migrated_topic_data_integrity@{"params":[["out","executed"],false],"transfer_leadership":true}
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_migrated_topic_data_integrity@{"params":[null,false],"transfer_leadership":true}
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_migrated_topic_data_integrity@{"params":[["in","executed"],false],"transfer_leadership":false}
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_migrated_topic_data_integrity@{"params":[["in","prepared"],false],"transfer_leadership":false}
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_migrated_topic_data_integrity@{"params":[["out","executed"],false],"transfer_leadership":false}
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_migrated_topic_data_integrity@{"params":[null,false],"transfer_leadership":false}
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_migrated_topic_data_integrity@{"params":[["in","executing"],false],"transfer_leadership":true}
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_migrated_topic_data_integrity@{"params":[["in","preparing"],false],"transfer_leadership":true}
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_migrated_topic_data_integrity@{"params":[["out","prepared"],false],"transfer_leadership":true}
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_migrated_topic_data_integrity@{"params":[["in","executing"],true],"transfer_leadership":false}
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_migrated_topic_data_integrity@{"params":[["in","preparing"],true],"transfer_leadership":false}
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_migrated_topic_data_integrity@{"params":[["out","preparing"],false],"transfer_leadership":false}
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_migrated_topic_data_integrity@{"params":[["in","executed"],true],"transfer_leadership":true}
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_migrated_topic_data_integrity@{"params":[["in","prepared"],true],"transfer_leadership":true}
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_migrated_topic_data_integrity@{"params":[null,true],"transfer_leadership":true}

vbotbuildovich avatar Oct 18 '24 12:10 vbotbuildovich

CI failures:

IoannisRP avatar Oct 18 '24 13:10 IoannisRP

/backport v24.2.x

vbotbuildovich avatar Oct 21 '24 13:10 vbotbuildovich

/backport v24.1.x

vbotbuildovich avatar Oct 21 '24 13:10 vbotbuildovich

/backport v23.3.x

vbotbuildovich avatar Oct 21 '24 13:10 vbotbuildovich

Failed to create a backport PR to v24.1.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-23792-v24.1.x-994 remotes/upstream/v24.1.x
git cherry-pick -x 32aa4fbf00 efde79aca3

Workflow run logs.

vbotbuildovich avatar Oct 21 '24 13:10 vbotbuildovich

Failed to create a backport PR to v23.3.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-23792-v23.3.x-23 remotes/upstream/v23.3.x
git cherry-pick -x 32aa4fbf00 efde79aca3

Workflow run logs.

vbotbuildovich avatar Oct 21 '24 13:10 vbotbuildovich

Failed to create a backport PR to v24.2.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-23792-v24.2.x-148 remotes/upstream/v24.2.x
git cherry-pick -x 32aa4fbf00 efde79aca3

Workflow run logs.

vbotbuildovich avatar Oct 21 '24 13:10 vbotbuildovich