scout-apm-php
scout-apm-php copied to clipboard
Edge case where process doesn't end until core-agent is killed
I've uncovered a weird edge case using tooling that results in the PHP process not ending until core-agent is killed.
Currently, reliable reproduction can be done with:
- xdebug extension enabled
composer require --dev infection/infection- The following
infection.json.dist:
{
"timeout": 10,
"source": {
"directories": [
"src"
]
},
"logs": {
"text": "infection.log"
},
"mutators": {
"@default": true
}
}
Running Infection works all fine (e.g. vendor/bin/infection --threads=8 --log-verbosity=none), except after printing the output it just hangs. Running ps ax | grep core-agent shows multiple core agents running, and trace-level debug doesn't show anything untoward:
[2020-01-07T15:05:17][core_agent][INFO] Initializing logger with log level: Trace
[2020-01-07T15:05:17][core_agent][DEBUG] Launching build: 1.2.7, d3f445a, built on 2019-12-12T20:00:43Z
[2020-01-07T15:05:17][probe][DEBUG] Detecting running version via CoreAgentProbe
[2020-01-07T15:05:17][probe][DEBUG] CoreAgentProbe found version: "1.2.7"
[2020-01-07T15:05:17][core_agent][INFO] Initializing logger with log level: Trace
[2020-01-07T15:05:17][core_agent][DEBUG] Launching build: 1.2.7, d3f445a, built on 2019-12-12T20:00:43Z
[2020-01-07T15:05:17][probe][DEBUG] Detecting running version via CoreAgentProbe
[2020-01-07T15:05:17][probe][DEBUG] CoreAgentProbe found version: "1.2.7"
[2020-01-07T15:05:17][core_agent][INFO] Initializing logger with log level: Trace
[2020-01-07T15:05:17][core_agent][DEBUG] Launching build: 1.2.7, d3f445a, built on 2019-12-12T20:00:43Z
[2020-01-07T15:05:17][probe][DEBUG] Detecting running version via CoreAgentProbe
[2020-01-07T15:05:17][probe][DEBUG] CoreAgentProbe found version: "1.2.7"
[2020-01-07T15:05:17][core_agent][INFO] Initializing logger with log level: Trace
[2020-01-07T15:05:17][core_agent][DEBUG] Launching build: 1.2.7, d3f445a, built on 2019-12-12T20:00:43Z
[2020-01-07T15:05:17][probe][DEBUG] Detecting running version via CoreAgentProbe
[2020-01-07T15:05:17][probe][DEBUG] CoreAgentProbe found version: "1.2.7"
[2020-01-07T15:05:17][core_agent][INFO] Initializing logger with log level: Trace
[2020-01-07T15:05:17][core_agent][DEBUG] Launching build: 1.2.7, d3f445a, built on 2019-12-12T20:00:43Z
[2020-01-07T15:05:17][probe][DEBUG] Detecting running version via CoreAgentProbe
[2020-01-07T15:05:17][probe][DEBUG] CoreAgentProbe found version: "1.2.7"
[2020-01-07T15:05:17][core_agent][INFO] Initializing logger with log level: Trace
[2020-01-07T15:05:17][core_agent][DEBUG] Launching build: 1.2.7, d3f445a, built on 2019-12-12T20:00:43Z
[2020-01-07T15:05:17][probe][DEBUG] Detecting running version via CoreAgentProbe
[2020-01-07T15:05:17][probe][DEBUG] CoreAgentProbe found version: "1.2.7"
[2020-01-07T15:05:17][core_agent][INFO] Initializing logger with log level: Trace
[2020-01-07T15:05:17][core_agent][DEBUG] Launching build: 1.2.7, d3f445a, built on 2019-12-12T20:00:43Z
[2020-01-07T15:05:17][probe][DEBUG] Detecting running version via CoreAgentProbe
[2020-01-07T15:05:17][probe][DEBUG] CoreAgentProbe found version: "1.2.7"
If I disable xdebug (sudo phpdismod xdebug) and re-run with vendor/bin/infection --threads=8 --log-verbosity=none --initial-tests-php-options="-d zend_extension=xdebug.so" it actually works.
I've seen some weird interaction between xdebug and the core-agent before, but dismissed it as I figured it was just some race condition, but this is a reliable reproduction case.
It's unclear exactly what interaction is causing this behaviour, and it's also unclear as to whether this would cause problems in the real world. If I discover any more cases again, I'll document them here and hopefully some light can be shed on this.
Further information: my first thought was this note on on the exec() function documentation:
If a program is started with this function, in order for it to continue running in the background, the output of the program must be redirected to a file or another output stream. Failing to do so will cause PHP to hang until the execution of the program ends.
I've tried appending >> /dev/null 2>&1 & to the command to be run, which didn't appear to affect anything. Maybe I didn't do something right, or maybe I misunderstood the meaning of the docs there.
We launch CA with the daemonize flag right? That should do all the work of closing STDIN/OUT/ERR on the core agent side.
There was an issue with file handles being passed through to the core agent accidentally (they persist on fork unless the caller of the c-function fork says otherwise), which caused similar issues on old Python versions, but that was fixed a few versions ago in core agent (closes any file handles id 3-1024 on start)
The next things we could look at would be strace to see exactly how fork() is called, lsof to see file handles, or digging into child processes (maybe it didn't daemonize right?)