libxlio icon indicating copy to clipboard operation
libxlio copied to clipboard

issue: 4050516 Fix TCP segment leak

Open tomerdbz opened this issue 9 months ago • 8 comments

Description

Resolve race condition where TCP segments leak during SYN-RCVD timeout handling, causing "still N tcp segs in use" warnings.

Root cause: When tcp_slowtmr() times out a connection stuck in SYN-RCVD state, it calls TCP_EVENT_ERR() which triggers handle_incoming_handshake_failure(). This function calls close() on the child connection, which attempts to send FIN and allocates new TCP segments. These segments were never cleaned up.

Fix: Call abort_connection() before close() in handle_incoming_handshake_failure(). The tcp_abort() -> tcp_abandon() path explicitly does NOT send RST for SYN-RCVD connections (matching Linux kernel behavior), and properly purges all segments before setting state to CLOSED. This prevents segment allocation during subsequent close() call.

Also enhanced destructor logging to show PCB state and queue pointers for better debugging of any remaining segment leaks.

What

Fix TCP segment leak in SYN flood.

Why

Solves 4050516.

How
  1. Identified that TCP_EVENT_ERR() callback triggered close() which allocated segments that were never freed
  2. Added abort_connection() call before close() to preemptively clean up the PCB and set state to CLOSED
  3. The tcp_abort() -> tcp_abandon() code path explicitly skips sending RST for SYN-RCVD connections (matching Linux kernel behavior) and properly purges all queued segments
  4. When close() runs afterward, PCB is already in CLOSED state, preventing any attempt to send FIN/RST or allocate segments

Change type

What kind of change does this PR introduce?

  • [X] Bugfix
  • [ ] Feature
  • [ ] Code style update
  • [ ] Refactoring (no functional changes, no api changes)
  • [ ] Build related changes
  • [ ] CI related changes
  • [ ] Documentation content changes
  • [ ] Tests
  • [ ] Other

Check list

  • [ ] Code follows the style de facto guidelines of this project
  • [ ] Comments have been inserted in hard to understand places
  • [ ] Documentation has been updated (if necessary)
  • [ ] Test has been added (if possible)

tomerdbz avatar Apr 09 '25 06:04 tomerdbz

/review

tomerdbz avatar Apr 10 '25 10:04 tomerdbz

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

RFC Compliance

Ensure the implementation correctly adheres to the RFC specification for handling connections in SYN_RCVD state, particularly the transition to CLOSED without sending FIN or RST packets.

if (get_tcp_state(pcb) == SYN_RCVD) {
    // according to the RFC, in case we get a SYN and no more data
    // we should just close w/o FIN or RST
    tcp_pcb_purge(pcb);
    set_tcp_state(pcb, CLOSED);
    return ERR_OK;
Resource Cleanup

Validate that tcp_pcb_purge(pcb) effectively cleans up resources associated with the PCB to prevent memory leaks or dangling references.

tcp_pcb_purge(pcb);

pr-review-bot-app[bot] avatar Apr 10 '25 10:04 pr-review-bot-app[bot]

to trigger the bug on any environment:

  1. compile the program below
  2. syn-flood it from a client (I choose scapy but there are many tools available)
  3. bug will trigger on vNext. after the patch you could see it's fixed :)
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

int main() {
  int server_fd;
  struct sockaddr_in address;
  int opt = 1;
  int addrlen = sizeof(address);

  printf("TCP server cycling program starting...\n");

  // Create socket file descriptor
  if ((server_fd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
    perror("socket failed");
    exit(EXIT_FAILURE);
  }

  // Set socket options
  if (setsockopt(server_fd, SOL_SOCKET, SO_REUSEADDR | SO_REUSEPORT, &opt,
                 sizeof(opt))) {
    perror("setsockopt failed");
    close(server_fd);
    exit(EXIT_FAILURE);
  }

  // Setup server address structure
  memset(&address, 0, sizeof(address));
  address.sin_family = AF_INET;
  address.sin_addr.s_addr = INADDR_ANY;
  address.sin_port = htons(55400);

  // Bind socket to the port
  if (bind(server_fd, (struct sockaddr *)&address, sizeof(address)) < 0) {
    perror("bind failed");
    close(server_fd);
    exit(EXIT_FAILURE);
  }

  // Listen for connections
  if (listen(server_fd, 3) < 0) {
    perror("listen failed");
    close(server_fd);
    exit(EXIT_FAILURE);
  }

  while (1) {
    // Accept a connection
    int new_socket;
    struct sockaddr_in client_addr;
    socklen_t client_addrlen = sizeof(client_addr);
    if ((new_socket = accept(server_fd, (struct sockaddr *)&client_addr,
                             &client_addrlen)) < 0) {
      perror("accept failed");
      close(server_fd);
      exit(EXIT_FAILURE);
    }
    // Close the accepted connection
    close(new_socket);
  }

  // This point will never be reached in the current code
  close(server_fd);
  return 0;
}

tomerdbz avatar Apr 11 '25 09:04 tomerdbz

@pasis, please review

galnoam avatar May 05 '25 11:05 galnoam

@pasis , please review. If approved, is it relevant to VMA as well?

galnoam avatar Jun 29 '25 08:06 galnoam

@BasharRadya can you review?

galnoam avatar Nov 10 '25 11:11 galnoam

Greptile Summary

  • Fixes TCP segment leak during SYN flood by calling abort_connection() before close() in handle_incoming_handshake_failure()
  • The tcp_abandon() path skips RST for SYN_RCVD connections and purges all segments before setting state to CLOSED, preventing segment allocation during subsequent close()
  • Enhanced destructor logging adds PCB state and queue pointers for better debugging of segment leaks

Confidence Score: 5/5

  • This PR is safe to merge - it correctly fixes a resource leak without introducing new issues
  • The fix properly addresses the segment leak by calling abort_connection() before close(). The tcp_abandon() code path explicitly avoids sending RST for SYN_RCVD state (line 324 in tcp.c) and calls tcp_pcb_purge() to free all segments before setting state to CLOSED. Setting m_parent=nullptr before abort prevents recursive calls. Enhanced logging aids future debugging.
  • No files require special attention

Important Files Changed

Filename Overview
src/core/sock/sockinfo_tcp.cpp Fixes TCP segment leak in SYN-RCVD timeout by calling abort_connection() before close() to prevent FIN segment allocation

Sequence Diagram

sequenceDiagram
    participant Timer as "tcp_slowtmr()"
    participant PCB as "TCP PCB (SYN_RCVD)"
    participant ErrCB as "TCP_EVENT_ERR"
    participant Parent as "Parent Listen Socket"
    participant Child as "Child Connection"
    participant lwIP as "lwIP tcp_abandon()"
    
    Timer->>PCB: "Timeout detected"
    Timer->>ErrCB: "TCP_EVENT_ERR(ERR_TIMEOUT)"
    ErrCB->>Parent: "err_lwip_cb() -> handle_incoming_handshake_failure()"
    Parent->>Parent: "Lock parent, remove from m_syn_received"
    Parent->>Child: "Set m_parent = nullptr"
    Parent->>Child: "abort_connection()"
    Child->>lwIP: "tcp_abort() -> tcp_abandon()"
    lwIP->>lwIP: "Skip RST for SYN_RCVD state"
    lwIP->>lwIP: "tcp_pcb_remove() -> tcp_pcb_purge()"
    lwIP->>lwIP: "Free unsent/unacked segments"
    lwIP->>lwIP: "Set state = CLOSED"
    lwIP->>ErrCB: "TCP_EVENT_ERR(ERR_ABRT)"
    Note over ErrCB: "m_parent is nullptr, no recursive call"
    Parent->>Child: "close()"
    Note over Child: "State is CLOSED, no FIN sent"
    Child->>Child: "No segment allocation"

greptile-apps[bot] avatar Nov 18 '25 18:11 greptile-apps[bot]

bot:retest

tomerdbz avatar Nov 26 '25 08:11 tomerdbz