dd-trace-php icon indicating copy to clipboard operation
dd-trace-php copied to clipboard

fix(prof): fewer borrows, less panics on borrows

Open realFlowControl opened this issue 9 months ago • 1 comments

Description

We found a panic_already_borrowed in our RelEnv with I/O profiling. The following is happening:

  • the profiler takes a sample
  • during this, a signal arrives (PHP timed out)
  • the timeout handler want’s to write it’s usual “Fatal error: Maximum execution time ..” error
  • this calls into the I/O profiling write hook which decides to take a sample :boom:

So far we've assumed that we do not sample while we sample, turns out the signal handler for timeout can bypass our assumptions and we may actually sample while we sample 😬

There is no reason to hold onto the mutable borrow for those RefCell's while we collect the stack trace, so this PR aims to only hold onto the mutable borrow for those RefCell's while we change the stats data and make the sampling decision. After that we release the borrow before calling into stack trace collection.

There is still the case where we might hold a mutable borrow on the CACHED_STRING:

https://github.com/DataDog/dd-trace-php/blob/54d1518e01642fee78f9f52e20de6681a8f128ee/profiling/src/profiling/stack_walking.rs#L242

This is not solved by this PR and we can do this in another one. The main question do be decided on: Do we try and collect the frame without the cache or do we just skip a frame if we can't acquire a mutable borrow?

Reviewer checklist

  • [ ] Test coverage seems ok.
  • [ ] Appropriate labels assigned.

realFlowControl avatar Jun 12 '25 19:06 realFlowControl

Benchmarks [ profiler ]

Benchmark execution time: 2025-07-15 11:06:37

Comparing candidate commit cbd8aa1d7687e898411242d96a8adcf868412469 in PR branch florian/a-little-less-borrow with baseline commit 843c1f636ddffe62bb508debc73d1b69e9d257b6 in branch master.

Found 4 performance improvements and 0 performance regressions! Performance is the same for 25 metrics, 7 unstable metrics.

scenario:php-profiler-exceptions-with-profiler

  • 🟩 execution_time [-5.037ms; -2.997ms] or [-5.651%; -3.362%]

scenario:php-profiler-exceptions-with-profiler-and-timeline

  • 🟩 cpu_user_time [-9.286ms; -4.641ms] or [-11.773%; -5.884%]
  • 🟩 execution_time [-3.929ms; -2.739ms] or [-4.405%; -3.071%]

scenario:walk_stack/1

  • 🟩 wall_time [-279.045ns; -273.146ns] or [-2.188%; -2.142%]

pr-commenter[bot] avatar Jun 12 '25 19:06 pr-commenter[bot]