Fix span lifecycle with smart pointers to prevent use-after-free in a…

Open lh2debug opened this issue 1 month ago • 18 comments

What problem does this PR solve?

Issue Number: resolve #3068

Problem Summary:

A span lifecycle management defect in a distributed storage system:

Problem Scenario:

  • The server maintains a parent span for each client Write request
  • Two child spans are created for the append_entries RPCs sent to the two followers
  • When the first follower responds, the parent span is prematurely destroyed along with all of its child spans
  • When the second follower responds, its callback accesses the already-freed child span, causing a use-after-free

Root Cause:

  a. Premature deallocation: Parent span destroyed while child spans still in use
  b. Dangling pointer: Response callback accesses freed span objects

Reference counting relationship

  • The parent span holds strong references to its child spans, while each child span holds only a weak reference to the parent span.
  • RPC Done holds a strong reference to the parent span (i.e., the server span), so that reference is released only when RPC Done executes, after trace recording has completed.
  • SpanContainer holds a strong reference to the parent span, so that reference is released only after the background collector thread has finished dumping the trace to the database.
  • The Controller holds weak references to the parent and child spans. These do not affect the spans' lifetimes, yet they prevent access through dangling pointers.
graph LR
    subgraph "Parent Span (WriteChunk)"
        PS["Parent Span<br/>ref_count = 2<br/>trace_id: 12345<br/>span_id: 67890"]
    end
    
    subgraph "Strong reference holders (shared_ptr)"
        RD["RPC Done Callback<br/>SendRpcResponse<br/>shared_ptr<Span>"]
        SC["SpanContainer<br/>background collector<br/>shared_ptr<Span>"]
    end
    
    subgraph "Weak reference holders (weak_ptr)"
        PC["Controller<br/>_span<br/>weak_ptr<Span>"]
        CS1["Child Span 1<br/>_local_parent<br/>weak_ptr<Span>"]
        CS2["Child Span 2<br/>_local_parent<br/>weak_ptr<Span>"]
    end
    
    subgraph "Child Spans"
        CS1_DETAIL["Child Span 1<br/>ref_count = 1<br/>append_entries to Follower1<br/>trace_id: 12345<br/>span_id: 67891"]
        CS2_DETAIL["Child Span 2<br/>ref_count = 1<br/>append_entries to Follower2<br/>trace_id: 12345<br/>span_id: 67892"]
    end
    
    %% Strong references to the Parent Span
    RD -->|"+1 ref_count"| PS
    SC -->|"+1 ref_count"| PS
    
    %% Weak references to the Parent Span
    PC -.->|"does not increase ref_count"| PS
    CS1 -.->|"does not increase ref_count"| PS
    CS2 -.->|"does not increase ref_count"| PS
    
    %% The Parent Span holds the Child Spans
    PS -->|"_client_list<br/>shared_ptr<br/>+1 ref_count"| CS1_DETAIL
    PS -->|"_client_list<br/>shared_ptr<br/>+1 ref_count"| CS2_DETAIL
    
    %% Weak references from the Child Spans
    CS1_DETAIL -.->|"_local_parent<br/>weak_ptr"| PS
    CS2_DETAIL -.->|"_local_parent<br/>weak_ptr"| PS
    
    %% Controllers' weak references to the Child Spans
    C1["Controller 1<br/>weak_ptr"] -.-> CS1_DETAIL
    C2["Controller 2<br/>weak_ptr"] -.-> CS2_DETAIL
    
    %% Styles
    classDef parentSpan fill:#e1f5fe,stroke:#01579b,stroke-width:4px
    classDef childSpan fill:#f3e5f5,stroke:#4a148c,stroke-width:3px
    classDef strongRef fill:#e8f5e8,stroke:#2e7d32,stroke-width:3px
    classDef weakRef fill:#fff3e0,stroke:#e65100,stroke-width:2px
    
    class PS parentSpan
    class CS1_DETAIL,CS2_DETAIL childSpan
    class RD,SC strongRef
    class PC,C1,C2 weakRef

What is changed and the side effects?

Changed:

Side effects: NO

  • Performance effects: NO

  • Breaking backward compatibility: NO


Check List:

lh2debug, Nov 06 '25 03:11