
[VPP-2077] IP fragmentation: running_fragment_id is not thread safe

Open vvalderrv opened this issue 10 months ago • 8 comments

Description

I am working on a test suite that exercises IP fragmentation and reassembly, using a tunneling protocol (currently gtpu) over an interface with reduced MTU. I have seen "duplicate/overlapping fragments" errors (causing very poor NDR) when multiple workers are fragmenting for a single tunnel.

Since I was able to restore good performance using [0], I believe the issue is at line 45, where running_fragment_id is shared among all workers. A better solution would be to divide the u32 range into intervals, so that each worker keeps a thread-local variable and iterates over its own interval.
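The interval idea could look roughly like the sketch below. This is only an illustration, not the code from [0] and not existing VPP API: the type frag_id_alloc_t and both functions are hypothetical names, and the sketch assumes the number of workers is known when each worker initializes its state.

```c
#include <stdint.h>

/* Hypothetical per-worker fragment ID allocator (illustrative names only).
 * Each worker owns one instance, e.g. as a thread-local or per-thread
 * array element, so no atomics or locks are needed. */
typedef struct
{
  uint32_t base;   /* first ID of this worker's interval */
  uint64_t size;   /* number of IDs in the interval (up to 2^32) */
  uint32_t offset; /* next offset within the interval */
} frag_id_alloc_t;

/* Split the u32 space into n_workers equal intervals.
 * Assumes n_workers >= 1 and worker_index < n_workers.
 * IDs left over by the integer division are simply never used. */
static inline void
frag_id_alloc_init (frag_id_alloc_t *a, uint32_t worker_index,
                    uint32_t n_workers)
{
  a->size = (UINT64_C (1) << 32) / n_workers;
  a->base = (uint32_t) (worker_index * a->size);
  a->offset = 0;
}

/* Return the next ID and advance, wrapping inside the worker's interval. */
static inline uint32_t
frag_id_alloc_next (frag_id_alloc_t *a)
{
  uint32_t id = a->base + a->offset;
  a->offset = ((uint64_t) a->offset + 1 == a->size) ? 0 : a->offset + 1;
  return id;
}
```

With four workers, worker 0 would hand out 0, 1, 2, ... and worker 1 would start at 0x40000000, so two workers never produce the same ID. One caveat: the IPv4 Identification field is only 16 bits wide, so if the counter is truncated for IPv4 the intervals would have to be carved out of the 16-bit space instead.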

From the CSIT point of view we are not blocked: we have a few testbeds with hyperthreading off, so we can see good results in 1c tests there.

[0] https://gerrit.fd.io/r/c/vpp/+/38797/1/src/vnet/ip/ip_frag.c

Assignee

Unassigned

Reporter

Vratko Polak

Comments

  • vrpolak (Fri, 15 Nov 2024 09:45:11 +0000): The "duplicate/overlapping fragments" symptom is still present [4] in occasional failed tests in rls2410.

[4] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-iterative-2410-3n-alt/21/log.html.gz#s1-s1-s1-s3-s2-t2-k3-k7-k1-k1-k1-k8-k14-k1-k1-k1-k1

  • vrpolak (Thu, 25 Jul 2024 13:57:13 +0000): The test still sometimes fails [3] in rls2406.

[3] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-iterative-2406-3n-alt/50/log.html.gz#s1-s1-s1-s3-s2-t3-k2-k10-k14

  • vrpolak (Mon, 17 Jun 2024 12:12:11 +0000): The suite is still failing: [2].

[2] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-report-iterative-2406-3n-icx/66/log.html.gz#s1-s1-s1-s3-s7-t1-k2-k10-k14

  • vrpolak (Fri, 19 May 2023 12:46:28 +0000): A short-term fix has been proposed: [1].

[1] https://gerrit.fd.io/r/c/vpp/+/38797

Original issue: https://jira.fd.io/browse/VPP-2077

vvalderrv avatar Feb 02 '25 15:02 vvalderrv