[VPP-2077] IP fragmentation: running_fragment_id is not thread safe
Description
I am working on a test suite that will do IP fragmentation and reassembly, using a tunneling protocol (currently gtpu) over an interface with a reduced MTU. I have seen "duplicate/overlapping fragments" (causing very poor NDR) when multiple workers are fragmenting traffic for a single tunnel.
Since I was able to restore good performance using [0], I believe the issue is at line 45: running_fragment_id is shared among all workers. A better solution would be to divide the u32 ID space into intervals, so that each worker keeps a thread-local variable that iterates over its own interval.
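A minimal sketch of the interval idea, for illustration only: it is not VPP code and not what [0] does. The names frag_id_allocator_t, frag_id_allocator_init() and frag_id_next() are hypothetical. It assumes each worker knows its own index and the total number of workers, and that each worker only ever touches its own allocator, so no atomics or locks are needed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-worker fragment ID allocator (not part of VPP).
 * The 32-bit ID space is split into n_workers contiguous intervals;
 * each worker owns exactly one interval and only touches its own
 * state, so two workers can never return the same ID. */
typedef struct
{
  uint32_t start;  /* first ID of this worker's interval */
  uint64_t size;   /* IDs in the interval (2^32 when n_workers == 1) */
  uint64_t offset; /* offset of the next ID; wraps within the interval */
} frag_id_allocator_t;

static void
frag_id_allocator_init (frag_id_allocator_t *a, uint32_t worker_index,
			uint32_t n_workers)
{
  assert (n_workers > 0 && worker_index < n_workers);
  /* Computed in 64 bits because 2^32 does not fit in a uint32_t. If
   * 2^32 is not divisible by n_workers, the few leftover IDs at the
   * top of the space are simply never handed out. */
  a->size = (UINT64_C (1) << 32) / n_workers;
  a->start = (uint32_t) (a->size * worker_index);
  a->offset = 0;
}

static uint32_t
frag_id_next (frag_id_allocator_t *a)
{
  /* start + offset < start + size <= 2^32, so the cast is lossless. */
  uint32_t id = (uint32_t) (a->start + a->offset);
  a->offset = (a->offset + 1) % a->size;
  return id;
}
```

Each worker would initialize its allocator once (for example in per-thread data indexed by its thread index) and call frag_id_next() per fragmented packet. Note that the IPv4 Identification field is only 16 bits: if the ID is truncated to u16 for IPv4, the intervals should be carved out of the 16-bit space instead, because truncated 32-bit intervals collide (with two workers, both intervals start at a value whose low 16 bits are 0).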
From the CSIT point of view, we are not blocked: we have a few testbeds with hyperthreading off, so we can see good results in 1c tests there.
[0] https://gerrit.fd.io/r/c/vpp/+/38797/1/src/vnet/ip/ip_frag.c
Assignee
Unassigned
Reporter
Vratko Polak
Comments
- vrpolak (Fri, 15 Nov 2024 09:45:11 +0000): The "duplicate/overlapping fragments" symptom is still present [4] in occasionally failing tests in rls2410.
- vrpolak (Thu, 25 Jul 2024 13:57:13 +0000): The test still sometimes fails [3] in rls2406.
- vrpolak (Mon, 17 Jun 2024 12:12:11 +0000): The suite is still failing: [2].
- vrpolak (Fri, 19 May 2023 12:46:28 +0000): A short-term fix has been proposed: [1].
[1] https://gerrit.fd.io/r/c/vpp/+/38797
Original issue: https://jira.fd.io/browse/VPP-2077