dd-trace-py icon indicating copy to clipboard operation
dd-trace-py copied to clipboard

chore(gevent): run after-in-child hooks after reinit

Open P403n1x87 opened this issue 3 years ago • 5 comments

Description

Threads created too early in an application that uses gevent end up not running when the gevent hub reinit is executed after fork in the child process. This change ensures that we trigger the after-in-child fork hooks after the call to gevent.hub.reinit to ensure that threads created at any time will run as expected after fork.

More details

In its implementation, gevent wraps around os.fork and calls gevent.hub.reinit to re-initialise the state of the child process after fork. The forksafe hooks that are registered by the library end up running after the call to the OS fork syscall, but before gevent.hub.reinit, which causes the threads to not work as expected in the child process. Other frameworks, like gunicorn make their own call to os.fork and then call gevent.hub.reinit in the child process to obtain the same effect. This poses the same problem as in plain gevent. With this change, we register an import hook on gevent.hub to detect the possibility that gevent.hub.reinit might be called, and we patch this function to re-run the after-in-child fork hook to ensure that they run after the gevent hub has been reinitialised fully in the child process after fork. In particular, this ensures that threads are recreated at the right time and can then work as expected in child processes.

Checklist

Reviewer Checklist

  • [ ] Title is accurate.
  • [ ] Description motivates each change.
  • [ ] No unnecessary changes were introduced in this PR.
  • [ ] PR cannot be broken up into smaller PRs.
  • [ ] Avoid breaking API changes unless absolutely necessary.
  • [ ] Tests provided or description of manual testing performed is included in the code or PR.
  • [ ] Release note has been added for fixes and features, or else changelog/no-changelog label added.
  • [ ] All relevant GitHub issues are correctly linked.
  • [ ] Backports are identified and tagged with Mergifyio.
  • [ ] Add to milestone.

P403n1x87 avatar Aug 10 '22 10:08 P403n1x87

are we able to reproduce this behavior in a test?

I think yes, but not easily. I'll give it a try.

P403n1x87 avatar Aug 10 '22 12:08 P403n1x87

Based on some of the CI failures, this is likely to require #4049.

P403n1x87 avatar Aug 10 '22 13:08 P403n1x87

It seems that ModuleWatchdog conflicts with the testdir fixture on Python 2.

P403n1x87 avatar Aug 11 '22 11:08 P403n1x87

@brettlangdon I've added a test case.

P403n1x87 avatar Aug 12 '22 13:08 P403n1x87

#4083 is a pre-requisite for this PR.

P403n1x87 avatar Aug 13 '22 09:08 P403n1x87

@P403n1x87 do you have an idea of the side effect this has so far in the real world? I'm curious as we're detecting this issue a bit "late".

jd avatar Aug 16 '22 15:08 jd

@P403n1x87 this pull request is now in conflict 😩

mergify[bot] avatar Aug 16 '22 16:08 mergify[bot]

@P403n1x87 do you have an idea of the side effect this has so far in the real world? I'm curious as we're detecting this issue a bit "late".

The side-effect should be that we now properly restart threads in e.g. gunicorn worker processes. I have done some manual testing with a Flask application and a modified tracer that starts the writer thread immediately. I was able to see the writer thread running correctly in each process with this fix, whereas without it the thread would be running only in the master process.

P403n1x87 avatar Aug 17 '22 09:08 P403n1x87

@P403n1x87 this pull request is now in conflict 😩

mergify[bot] avatar Aug 18 '22 14:08 mergify[bot]

Codecov Report

Merging #4070 (8721108) into 1.x (a940ef1) will decrease coverage by 0.02%. The diff coverage is 27.27%.

@@            Coverage Diff             @@
##              1.x    #4070      +/-   ##
==========================================
- Coverage   78.86%   78.83%   -0.03%     
==========================================
  Files         720      720              
  Lines       57386    57440      +54     
==========================================
+ Hits        45255    45285      +30     
- Misses      12131    12155      +24     
Impacted Files Coverage Δ
tests/internal/test_module.py 0.00% <0.00%> (ø)
tests/profiling/collector/test_threading.py 0.00% <0.00%> (ø)
tests/profiling/collector/test_traceback.py 0.00% <0.00%> (ø)
tests/tracer/test_forksafe.py 69.76% <9.09%> (-11.01%) :arrow_down:
ddtrace/internal/forksafe.py 83.58% <66.66%> (-2.63%) :arrow_down:
ddtrace/__init__.py 100.00% <100.00%> (ø)
ddtrace/internal/nogevent.py 65.21% <100.00%> (+0.77%) :arrow_up:
ddtrace/sampler.py 96.15% <100.00%> (ø)
tests/contrib/httplib/test_httplib.py 97.86% <100.00%> (+<0.01%) :arrow_up:
ddtrace/internal/module.py 86.71% <0.00%> (+5.94%) :arrow_up:

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

codecov-commenter avatar Aug 18 '22 16:08 codecov-commenter