dd-trace-py

ci: display backtrace from core dumps

Open P403n1x87 opened this issue 11 months ago • 2 comments

This change makes it easier to debug segmentation faults and other crashes in CI that generate a core dump. We rely on gdb to display the complete backtraces from any core dumps generated during test runs in Circle CI.
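As a rough illustration of the idea (not the code from this PR), the flow can be sketched in Python: raise the core size limit, look for core files after the test run, and build a non-interactive gdb invocation that prints every thread's backtrace. The helper names and the `core*` file name pattern are assumptions; the actual core file name and location depend on the runner's `core_pattern` setting.

```python
import glob
import os
import resource


def enable_core_dumps():
    """Raise the soft core-size limit to the hard limit.

    Similar in spirit to `ulimit -c unlimited`, but capped by the hard limit.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def find_core_files(directory, pattern="core*"):
    """Return regular files in `directory` matching the assumed core file pattern."""
    candidates = glob.glob(os.path.join(directory, pattern))
    return sorted(p for p in candidates if os.path.isfile(p))


def gdb_backtrace_command(executable, core_file):
    """Build a batch-mode gdb command that prints backtraces for all threads."""
    return ["gdb", "--batch", "-ex", "thread apply all bt", executable, core_file]
```

A CI step could call `enable_core_dumps()` before the tests start, then pass each path from `find_core_files(".")` to `subprocess.run(gdb_backtrace_command(sys.executable, path))` once they finish.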

Checklist

  • [ ] Change(s) are motivated and described in the PR description
  • [ ] Testing strategy is described if automated tests are not included in the PR
  • [ ] Risks are described (performance impact, potential for breakage, maintainability)
  • [ ] Change is maintainable (easy to change, telemetry, documentation)
  • [ ] Library release note guidelines are followed or label changelog/no-changelog is set
  • [ ] Documentation is included (in-code, generated user docs, public corp docs)
  • [ ] Backport labels are set (if applicable)
  • [ ] If this PR changes the public interface, I've notified @DataDog/apm-tees.
  • [ ] If change touches code that signs or publishes builds or packages, or handles credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.

Reviewer Checklist

  • [ ] Title is accurate
  • [ ] All changes are related to the pull request's stated goal
  • [ ] Description motivates each change
  • [ ] Avoids breaking API changes
  • [ ] Testing strategy adequately addresses listed risks
  • [ ] Change is maintainable (easy to change, telemetry, documentation)
  • [ ] Release note makes sense to a user of the library
  • [ ] Author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • [ ] Backport labels are set in a manner that is consistent with the release branch maintenance policy

P403n1x87, Mar 12 '24 20:03

Quick question: does ulimit -c work across our CI runners? It can be tricky to guarantee the generation of corefiles (indeed, that they even land in a consistent place). If you feel pretty good about this, please ignore the next part--I'm trying to be helpful, but I am ignorant.

I have an alternative workflow I've been using lately with some success. It has several disadvantages compared to corefile analysis--it doesn't use a debugger, so you can't interactively inspect anything. I also think its backtrace capabilities are a little less powerful than gdb's. It does have a few nice features though:

  • Prints the backtrace, mappings, and registers to stderr
  • Can be instrumented to trigger not just on SIGSEGV, but also on SIGABRT (nice for catching those nasty libc errors like double-free)
  • Is straightforward to translate to any glibc-based system (for customer incidents)
  • Prints immediately, without having to wait for other forks or children to terminate (usually not a problem in CI)
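The glibc utility itself is not shown here. As a loosely analogous sketch, swapping in Python's standard `faulthandler` module rather than the tool described above, the following child process prints a traceback to stderr as soon as it receives SIGABRT. Note that `faulthandler` only dumps the Python-level traceback, not native frames, mappings, or registers. The `CHILD_CODE` and `run_crashing_child` names are hypothetical.

```python
import subprocess
import sys

# Code run in a child process so the crash does not take down the caller.
CHILD_CODE = """
import faulthandler, os, resource
resource.setrlimit(resource.RLIMIT_CORE, (0, 0))  # demo only: leave no core file behind
faulthandler.enable()  # handles SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL
os.abort()             # raises SIGABRT, as glibc does when it detects a double free
"""


def run_crashing_child():
    """Run the crashing snippet and capture what faulthandler writes to stderr."""
    return subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        capture_output=True,
        text=True,
    )
```

Printing `run_crashing_child().stderr` shows the dump that was written at the moment of the crash, without waiting for any core file to be collected.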

I describe this workflow in a recent issue. It's annoying because the glibc maintainers have stopped providing this utility, but I have a suggestion for how to get it from old .deb archives. No pressure to use it or anything--I love GDB and I think the proposal will work great barring anything unlucky with how the CI runners work. Just throwing out an idea for consideration.

sanchda, Mar 13 '24 01:03

> Quick question: does ulimit -c work across our CI runners? It can be tricky to guarantee the generation of corefiles (indeed, that they even land in a consistent place). If you feel pretty good about this, please ignore the next part--I'm trying to be helpful, but I am ignorant.

Yep, I have already tested this in #7659 (in fact, this is where the code comes from). There is more on this in the Circle CI docs.

P403n1x87, Mar 13 '24 12:03