
Improve tls truststore search

Open michael-redpanda opened this issue 1 month ago • 12 comments

This change first checks whether the default trust store file that OpenSSL uses exists before attempting to load a trust file via the set_x509_trust_file API. If it does, it calls set_system_trust instead of loading the system trust file through set_x509_trust_file.

This should avoid the oversized allocations that occur when a sufficiently large trust store file is read and loaded.

Todo:

  • [x] Run in CDT

Backports Required

  • [ ] none - not a bug fix
  • [ ] none - this is a backport
  • [ ] none - issue does not exist in previous branches
  • [ ] none - papercut/not impactful enough to backport
  • [X] v25.3.x
  • [X] v25.2.x
  • [X] v25.1.x
  • [ ] v24.3.x

Release Notes

Improvements

  • Reduce the likelihood of oversized allocations when loading the system trust store file

michael-redpanda, Nov 21 '25 17:11

CI test results

Test results on build#76810:

  • TieredStorageTest.test_tiered_storage (integration): FLAKY, 19/21 passed
    Arguments: {"cloud_storage_type_and_url_style": [1, "virtual_host"], "test_case": {"name": "(TS_Read == True, TS_Timequery == True, SpilloverManifestUploaded == True)"}}
    Reason: upstream reliability is '99.33184855233853'. current run reliability is '90.47619047619048'. drift is 8.85566 and the allowed drift is set to 50. The test should PASS
    Job: https://buildkite.com/redpanda/redpanda/builds/76810#019aa7cd-688d-4671-bd00-d8448981f02c
    History: https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=TieredStorageTest&test_method=test_tiered_storage
  • src/v/storage/tests/segment_appender_rpbench_test (unit): FAIL, 0/1 passed
    Job: https://buildkite.com/redpanda/redpanda/builds/76810#019aa782-8252-4966-92fb-8979f2b127d9

Test results on build#77530:

  • ScalingUpTest.test_fast_node_addition (integration): FLAKY, 10/11 passed
    Reason: Test PASSES after retries. No significant increase in flaky rate (baseline=0.0276, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.0805, p1=0.4320, trust_threshold=0.5000)
    Job: https://buildkite.com/redpanda/redpanda/builds/77530#019aff3e-403d-425d-af1f-acc26f2bf935
    History: https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ScalingUpTest&test_method=test_fast_node_addition

Test results on build#77577:

  • PartitionReassignmentsTest.test_reassignments_cancel (integration): FLAKY, 22/31 passed
    Reason: Test PASSES after retries. No significant increase in flaky rate (baseline=0.1146, p0=0.0171, reject_threshold=0.0100. adj_baseline=0.3059, p1=0.4038, trust_threshold=0.5000)
    Job: https://buildkite.com/redpanda/redpanda/builds/77577#019b03ca-85f0-4158-affa-070eb05bd58c
    History: https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=PartitionReassignmentsTest&test_method=test_reassignments_cancel

vbotbuildovich, Nov 21 '25 20:11

/cdt

michael-redpanda, Nov 21 '25 20:11

This CDT run did not appear to generate any oversized allocations.

michael-redpanda, Nov 24 '25 11:11

Force push:

  • Adjusted the logic to be more in line with the previous behavior

michael-redpanda, Nov 25 '25 17:11

Hoping we can get this in soon; we are seeing these failures in CI.

dotnwat, Dec 06 '25 21:12

@michael-redpanda merge conflict with openssl

dotnwat, Dec 08 '25 16:12

Force push:

  • Rebased

michael-redpanda, Dec 08 '25 16:12

Force push:

  • Fixed include path

michael-redpanda, Dec 08 '25 17:12

Force push:

  • Fixed wrong bazel dependency

michael-redpanda, Dec 08 '25 17:12

/cdt

michael-redpanda, Dec 08 '25 20:12

Saw the oversized alloc in the last CDT run, so it is not 100% addressed (which I didn't think this PR would do; it only attempts to lessen its frequency).

michael-redpanda, Dec 09 '25 15:12

Force push:

  • Expanded the logic to check whether the CA file that was found is within OpenSSL's search directory
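Checking whether a discovered CA file sits inside OpenSSL's search directory is easy to get subtly wrong with a plain string-prefix test (/etc/ssl would appear to contain /etc/sslx/cert.pem). A minimal sketch that compares path elements instead is shown here; the helper names are hypothetical, search_dir is assumed to be the directory OpenSSL reports via X509_get_default_cert_dir(), and the PR's actual check may differ.

```cpp
#include <algorithm>
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical helper: lexically normalize a path and drop a trailing
// separator, so "/etc/ssl/" and "/etc/ssl" yield the same elements.
fs::path normalized(const fs::path& p) {
    fs::path n = p.lexically_normal();
    if (!n.empty() && !n.has_filename()) {
        n = n.parent_path();
    }
    return n;
}

// Hypothetical helper: true if `ca_file` lies strictly below `search_dir`.
// The comparison is purely lexical: ".." segments are collapsed, but
// symlinks are not resolved.
bool is_within_directory(const fs::path& ca_file, const fs::path& search_dir) {
    const fs::path file = normalized(ca_file);
    const fs::path dir = normalized(search_dir);
    // Walk both paths element by element until they first differ.
    auto [dir_it, file_it] =
        std::mismatch(dir.begin(), dir.end(), file.begin(), file.end());
    // Every element of `dir` matched and `file` has at least one more.
    return dir_it == dir.end() && file_it != file.end();
}
```

Because the check is lexical, a setup where OpenSSL's directory is reached through a symlink would need both paths resolved first (for example with std::filesystem::weakly_canonical) before comparing.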

michael-redpanda, Dec 09 '25 15:12