dd-trace-py
dd-trace-py copied to clipboard
chore(gevent): run after-in-child hooks after reinit
Description
Threads created too early in an application that uses gevent end up not running when the gevent hub reinit is executed after fork in the child process. This change ensures that we trigger the after-in-child fork hooks after the call to gevent.hub.reinit to ensure that threads created at any time will run as expected after fork.
More details
In its implementation, gevent wraps around os.fork and calls gevent.hub.reinit to re-initialise the state of the child process after fork. The forksafe hooks that are registered by the library end up running after the call to the OS fork syscall, but before gevent.hub.reinit, which causes the threads to not work as expected in the child process. Other frameworks, like gunicorn make their own call to os.fork and then call gevent.hub.reinit in the child process to obtain the same effect. This poses the same problem as in plain gevent. With this change, we register an import hook on gevent.hub to detect the possibility that gevent.hub.reinit might be called, and we patch this function to re-run the after-in-child fork hook to ensure that they run after the gevent hub has been reinitialised fully in the child process after fork. In particular, this ensures that threads are recreated at the right time and can then work as expected in child processes.
Checklist
- [x] Title must conform to conventional commit.
- [x] Add additional sections for
featandfixpull requests. - [x] Ensure tests are passing for affected code.
- [x] Library documentation and/or Datadog's documentation site is updated. Link to doc PR in description.
Reviewer Checklist
- [ ] Title is accurate.
- [ ] Description motivates each change.
- [ ] No unnecessary changes were introduced in this PR.
- [ ] PR cannot be broken up into smaller PRs.
- [ ] Avoid breaking API changes unless absolutely necessary.
- [ ] Tests provided or description of manual testing performed is included in the code or PR.
- [ ] Release note has been added for fixes and features, or else
changelog/no-changeloglabel added. - [ ] All relevant GitHub issues are correctly linked.
- [ ] Backports are identified and tagged with Mergifyio.
- [ ] Add to milestone.
are we able to reproduce this behavior in a test?
I think yes, but not easily. I'll give it a try.
Based on some of the CI failures, this is likely to require #4049.
It seems that ModuleWatchdog conflicts with the testdir fixture on Python 2.
@brettlangdon I've added a test case.
#4083 is a pre-requisite for this PR.
@P403n1x87 do you have an idea of the side effect this has so far in the real world? I'm curious as we're detecting this issue a bit "late".
@P403n1x87 this pull request is now in conflict 😩
@P403n1x87 do you have an idea of the side effect this has so far in the real world? I'm curious as we're detecting this issue a bit "late".
The side-effect should be that we now properly restart threads in e.g. gunicorn worker processes. I have done some manual testing with a Flask application and a modified tracer that starts the writer thread immediately. I was able to see the writer thread running correctly in each process with this fix, whereas without it the thread would be running only in the master process.
@P403n1x87 this pull request is now in conflict 😩
Codecov Report
Merging #4070 (8721108) into 1.x (a940ef1) will decrease coverage by
0.02%. The diff coverage is27.27%.
@@ Coverage Diff @@
## 1.x #4070 +/- ##
==========================================
- Coverage 78.86% 78.83% -0.03%
==========================================
Files 720 720
Lines 57386 57440 +54
==========================================
+ Hits 45255 45285 +30
- Misses 12131 12155 +24
| Impacted Files | Coverage Δ | |
|---|---|---|
| tests/internal/test_module.py | 0.00% <0.00%> (ø) |
|
| tests/profiling/collector/test_threading.py | 0.00% <0.00%> (ø) |
|
| tests/profiling/collector/test_traceback.py | 0.00% <0.00%> (ø) |
|
| tests/tracer/test_forksafe.py | 69.76% <9.09%> (-11.01%) |
:arrow_down: |
| ddtrace/internal/forksafe.py | 83.58% <66.66%> (-2.63%) |
:arrow_down: |
| ddtrace/__init__.py | 100.00% <100.00%> (ø) |
|
| ddtrace/internal/nogevent.py | 65.21% <100.00%> (+0.77%) |
:arrow_up: |
| ddtrace/sampler.py | 96.15% <100.00%> (ø) |
|
| tests/contrib/httplib/test_httplib.py | 97.86% <100.00%> (+<0.01%) |
:arrow_up: |
| ddtrace/internal/module.py | 86.71% <0.00%> (+5.94%) |
:arrow_up: |
:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more