[Bug]: DDTrace 1.10.0 Crashes with PHP 8.4.8 - Maybe still #3197

Open gman-wa opened this issue 6 months ago • 6 comments

Bug report

We have been seeing a lot of these since testing our application with 8.4.8 (coming from 8.3.22)

[Thu Jun 19 19:43:02.816680 2025] [proxy_fcgi:error] [pid 114:tid 215] [client 18.208.8.210:0] AH01067: Failed to read FastCGI header
[Thu Jun 19 19:43:02.816701 2025] [proxy_fcgi:error] [pid 114:tid 215] (104)Connection reset by peer: [client 18.208.8.210:0] AH01075: Error dispatching request to
[19-Jun-2025 19:43:02] WARNING: [pool www] child 65 exited on signal 11 (SIGSEGV) after 257.745689 seconds from start
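For anyone who wants to track how often this happens, here is a minimal sketch (not part of the original report) that counts php-fpm workers killed by SIGSEGV in a log read from stdin. The function name `count_segv` is hypothetical; the matched text is taken from the php-fpm warning above.

```shell
# Count php-fpm child exits caused by SIGSEGV in a log read from stdin.
# The function name is illustrative; the pattern is copied from the
# "exited on signal 11 (SIGSEGV)" warning shown in this report.
count_segv() {
  # -F treats the pattern as a fixed string, -c prints the number of matching lines.
  # grep exits non-zero when nothing matches, so "|| true" keeps the helper's exit status clean.
  grep -cF 'exited on signal 11 (SIGSEGV)' || true
}
```

It would be used as `count_segv < /path/to/php-fpm.log`, where the log path depends on your setup.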

Wondering if it's related to #3197?

~~DDTrace 1.9.0 does not throw errors.~~

Yes, we are using OpCache JIT

PHP version

8.4.8

Tracer or profiler version

1.10.0

Installed extensions

[PHP Modules] Core ctype curl date dom fileinfo filter gd hash iconv igbinary intl json libxml mbstring memcached msgpack mysqli mysqlnd openssl pcre PDO pdo_mysql pdo_sqlite Phar posix random readline Reflection session SimpleXML sodium SPL sqlite3 standard tokenizer xml xmlreader xmlwriter Zend OPcache zlib

[Zend Modules] Zend OPcache

Output of phpinfo()

{ "date": "2025-06-19T20:10:13Z", "os_name": "Linux ip-XXXXXX.ec2.internal 5.10.235-227.919.amzn2.aarch64 #1 SMP Sat Apr 5 16:59:44 UTC 2025 aarch64", "os_version": "5.10.235-227.919.amzn2.aarch64", "version": "1.10.0", "lang": "php", "lang_version": "8.4.1", "env": "XXXXXX", "enabled": true, "service": "XXXXXX", "enabled_cli": true, "agent_url": "http://localhost:8126", "debug": false, "analytics_enabled": false, "sample_rate": 1, "sampling_rules": [], "tags": { }, "service_mapping": [], "distributed_tracing_enabled": true, "dd_version": "XXXXXX", "architecture": "aarch64", "instrumentation_telemetry_enabled": true, "sapi": "fpm-fcgi", "datadog.trace.sources_path": "/opt/datadog/dd-library/1.10.0/dd-trace-sources/src", "open_basedir_configured": false, "uri_fragment_regex": null, "ori_mapping_incoming": null, "uri_mapping_outgoing": null, "auto_flush_enabled": false, "generate_root_span": true, "http_client_split_by_domain": true, "measure_compile_time": true, "report_hostname_on_root_span": false, "traced_internal_functions": null, "enabled_from_env": true, "opcache.file_cache": null, "sidecar_trace_sender": true }

Diagnostic checks passed

Upgrading from

8.3.22

gman-wa avatar Jun 19 '25 22:06 gman-wa

Update - we are getting seg faults in 1.9.0 too - trying 1.8.3 now.

Update - still happening with 1.8.3.

Update - still happening with 1.7.3.

Update - still happening with 1.6.3.

Open to ideas at this point

gman-wa avatar Jun 19 '25 23:06 gman-wa

I have the exact same issue (php 8.4.8 / ddtrace 1.10.0), it returns a SEGFAULT.

Core was generated by `php-fpm:'.
Program terminated with signal SIGSEGV, Segmentation fault.

I was trying to get a core dump with PHP debug symbols but got the following error:

17.35 ERROR: debug builds of PHP 8.4.8 are currently not supported
------
Dockerfile:95
--------------------
  94 |     
  95 | >>> RUN curl -LO https://github.com/DataDog/dd-trace-php/releases/download/${DATADOG_VERSION}/datadog-setup.php \
  96 | >>>   && php datadog-setup.php --php-bin=all
  97 |     
--------------------
ERROR: failed to solve: process "/bin/sh -c curl -LO https://github.com/DataDog/dd-trace-php/releases/download/${DATADOG_VERSION}/datadog-setup.php && php datadog-setup.php --php-bin=all" did not complete successfully: exit code: 1

I can get around this by disabling the profiler (removing --enable-profiling from the installation options). This seems to resolve the SEGFAULT, but I'm not 100% sure it fixes all the problems.

Also, I can see that if you don't enable the appsec option (e.g. remove --enable-appsec from the installation script), the extension is still installed and loaded when you run php -v:

PHP 8.4.8 (cli) (built: Jun  6 2025 17:38:35) (NTS)
Copyright (c) The PHP Group
Built by https://github.com/docker-library/php
Zend Engine v4.4.8, Copyright (c) Zend Technologies
    with Zend OPcache v8.4.8, Copyright (c), by Zend Technologies
    with ddtrace v1.10.0, Copyright Datadog, by Datadog
    with datadog-profiling v1.10.0, Copyright Datadog, by Datadog
    with ddappsec v1.10.0, Copyright Datadog, by Datadog

Maxwell2022 avatar Jun 23 '25 04:06 Maxwell2022

Hey @gman-wa 👋 thanks for opening this report! I read that you had no crashes with PHP 8.3 and started seeing segfaults after upgrading to 8.4. Are you using the profiler? In either case, can you try disabling JIT and, if you are using the profiler, also disabling allocation profiling by setting the INI setting datadog.profiling.allocation_enabled or the environment variable DD_PROFILING_ALLOCATION_ENABLED to 0, and let me know if that fixes your issue? Additionally, we'd need a stack trace to get closer to the root cause; this guide can help you obtain one for us. Please also feel free to sign up on https://chat.datadoghq.com and send me your ORG-ID via Slack.
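A minimal sketch of the two switches suggested above, written as a shell helper that drops an ini fragment into a conf.d directory. The helper name, the file name, and the directory you pass in are assumptions for illustration; the two settings themselves (opcache.jit and datadog.profiling.allocation_enabled) are the ones named in this thread.

```shell
# Write an ini fragment that turns off the two suspects discussed in this thread.
# Hypothetical helper: the function name and file name are made up for illustration.
# $1 is the conf.d directory scanned by your PHP build (an assumption you must supply).
write_debug_ini() {
  conf_dir="$1"
  cat > "$conf_dir/zz-segfault-debug.ini" <<'EOF'
; Rule out the OPcache JIT
opcache.jit = disable
; Rule out allocation profiling (only relevant when the profiler is enabled)
datadog.profiling.allocation_enabled = 0
EOF
}
```

In the official Docker images this would probably be called as `write_debug_ini /usr/local/etc/php/conf.d`, followed by a php-fpm restart. To tell which of the two matters, comment out one line at a time. Where only environment variables are practical, setting DD_PROFILING_ALLOCATION_ENABLED=0 covers the profiler half.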

realFlowControl avatar Jun 23 '25 06:06 realFlowControl

Hey @Maxwell2022 👋 the same applies to you regarding the crash you are observing. Regarding the debug build of PHP: to obtain a stack trace you do not need a debug build of PHP, just an unstripped binary or the symbols installed. The "troubleshooting an application crash" guide should help you obtain a stack trace for us as well.

In regards to ddappsec being around even when removing the --enable-appsec argument when installing: the AppSec extension will be installed either way but disabled, so it is loaded, but not doing anything. The main reason for this is that this allows you to enable AppSec via "Remote Configuration".

realFlowControl avatar Jun 23 '25 06:06 realFlowControl

Thanks @realFlowControl. We are running our application in the php8.4-fpm-alpine3.21 image, so I've installed the php84-dev package in the container and copied the dump from the host to debug it in the container.

This is what I ended up with:

(gdb) bt full
#0  __restore_sigs (set=set@entry=0x7fff380dcae0) at ./arch/x86_64/syscall_arch.h:40
No locals.
#1  0x00007d62b1ca6e1b in raise (sig=<optimized out>) at src/signal/raise.c:11
        set = {__bits = {0, 140734133816432, 16, 137862812331363, 140734133816760, 140734133816160, 140734133816416, 137862843101090, 0, 67108864, 137862843102628, 0,
            137862812527072, 1140850692, 137862843102628, 0}}
        ret = 0
#2  0x00007d62aff8268a in datadog_crashtracker::collector::signal_handler_manager::chain_signal_handler ()
    at libdatadog/datadog-crashtracker/src/collector/signal_handler_manager.rs:125
No locals.
#3  datadog_crashtracker::collector::crash_handler::handle_posix_sigaction () at libdatadog/datadog-crashtracker/src/collector/crash_handler.rs:100
No locals.
#4  <signal handler called>
No locals.
#5  0x0000648a5604d815 in ?? ()
No symbol table info available.
#6  0x0000648a5604eaf6 in ?? ()
No symbol table info available.
#7  0x0000648a564368ad in zend_call_function ()
No symbol table info available.
#8  0x0000648a562d035b in ?? ()
No symbol table info available.
#9  0x0000648a562d7139 in ?? ()
No symbol table info available.
#10 0x0000648a5604d122 in ?? ()
No symbol table info available.
#11 0x0000648a5604eb1d in ?? ()
No symbol table info available.
#12 0x0000648a564860d3 in zend_execute ()
No symbol table info available.
#13 0x0000648a564ef3e0 in zend_execute_script ()
No symbol table info available.
#14 0x0000648a56388d03 in php_execute_script_ex ()
No symbol table info available.
#15 0x0000648a56055686 in ?? ()
No symbol table info available.
#16 0x00007d62b1c8a496 in libc_start_main_stage2 (main=0x648a56054920, argc=1, argv=0x7fff380e0928) at src/env/__libc_start_main.c:95
        envp = 0x7fff380e0938
#17 0x0000648a5605648a in _start ()
No symbol table info available.
(gdb)

Maxwell2022 avatar Jun 23 '25 07:06 Maxwell2022

The php84-dev package from Alpine does not match the PHP binary that gets built for that Docker container from DockerHub. You should be able, though, to build the container yourself (make sure to build on the same architecture and versions), remove the symbol stripping, and analyze the core file using gdb in that new container. The lines you'd have to remove from the Dockerfile are these

realFlowControl avatar Jun 23 '25 08:06 realFlowControl

Thanks @realFlowControl

I've rebuilt the base fpm image without these lines and then rebuilt our application using this new image. I have a very similar output though:

(gdb) bt full
#0  __restore_sigs (set=set@entry=0x7ffc92e622a0) at ./arch/x86_64/syscall_arch.h:40
No locals.
#1  0x000072e3d1b40e1b in raise (sig=<optimized out>) at src/signal/raise.c:11
        set = {__bits = {0, 140722773042224, 16, 126322770472291, 140722773042552, 140722773041952, 140722773042208, 126322801381282, 0, 67108864, 126322801382820, 0,
            126322770668000, 1140850692, 126322801382820, 0}}
        ret = 0
#2  0x000072e3cfdfa68a in datadog_crashtracker::collector::signal_handler_manager::chain_signal_handler ()
    at libdatadog/datadog-crashtracker/src/collector/signal_handler_manager.rs:125
No locals.
#3  datadog_crashtracker::collector::crash_handler::handle_posix_sigaction () at libdatadog/datadog-crashtracker/src/collector/crash_handler.rs:100
No locals.
#4  <signal handler called>
No locals.
#5  0x00005c043464d815 in ?? ()
No symbol table info available.
#6  0x00005c043464eaf6 in ?? ()
No symbol table info available.
#7  0x00005c0434a368ad in zend_call_function ()
No symbol table info available.
#8  0x00005c0434a36cad in zend_call_known_function ()
No symbol table info available.
#9  0x00005c04348adad0 in ?? ()
No symbol table info available.
#10 0x00005c0434a35aee in zend_lookup_class_ex ()
No symbol table info available.
#11 0x00005c0434a0c5f8 in ?? ()
No symbol table info available.
#12 0x00005c0434a0c7e8 in ?? ()
No symbol table info available.
#13 0x00005c042e050851 in ?? ()
No symbol table info available.
#14 0x000072e3d0b69c40 in ?? ()
No symbol table info available.
#15 0x000072e300000007 in ?? ()
No symbol table info available.
#16 0x00005c042887c140 in ?? ()
No symbol table info available.
#17 0x0000000234a87860 in ?? ()
No symbol table info available.
#18 0x0000000000000000 in ?? ()
No symbol table info available.
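To gauge how well a backtrace like the one above is symbolized, a small sketch that counts the frames gdb could not name (`in ?? ()`). The function name `frame_summary` is hypothetical, and the input is assumed to be plain `bt` or `bt full` output on stdin.

```shell
# Summarize a gdb backtrace read from stdin: how many frames gdb could not name.
# Hypothetical helper; it only looks at lines that start with "#<number> ".
frame_summary() {
  awk '
    /^[#][0-9]+ / {
      total++
      # Frames without symbol information are printed by gdb as "in ?? ()".
      if ($0 ~ / in \?\? \(\)/) unresolved++
    }
    END { printf "%d of %d frames unresolved\n", unresolved, total }
  '
}
```

One way to feed it is `gdb -batch -ex 'bt full' <php-fpm binary> <core file> | frame_summary`, with both paths filled in for your container. A high unresolved count suggests the binary is still stripped or was built without debug information.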

Maxwell2022 avatar Jun 24 '25 00:06 Maxwell2022

Also, disabling opcache.jit fixes the issue:

-opcache.jit = tracing
+opcache.jit = disable

However, I'm not sure what performance impact this would have on our application.

Maxwell2022 avatar Jun 24 '25 00:06 Maxwell2022

Hmm, we might need a -g in the PHP_CFLAGS in the Dockerfile. Can you also tell me if setting the environment variable DD_PROFILING_ALLOCATION_ENABLED or the INI setting datadog.profiling.allocation_enabled to 0 fixes your issue with JIT enabled?

realFlowControl avatar Jun 24 '25 05:06 realFlowControl

@Maxwell2022 the stack trace you are seeing (even though not complete) looks different from #3197, so I assume this is something unrelated. I am still trying to figure out what's going on! The stack frame with datadog_crashtracker::collector::crash_handler::handle_posix_sigaction just shows that the crashtracker has picked up the crash and sent us a report. Can you give me your org-id (either through support or https://chat.datadoghq.com)?

realFlowControl avatar Jul 02 '25 05:07 realFlowControl

@gman-wa the list of installed extensions does not contain anything Datadog related, so most likely dd-trace-php is installed in another SAPI than the list was generated from. Can you confirm that dd-trace-php is installed and which extensions?

$ php -v
PHP 8.4.6 (cli) (built: Apr 28 2025 23:07:51) (NTS)
Copyright (c) The PHP Group
Built by https://github.com/docker-library/php
Zend Engine v4.4.6, Copyright (c) Zend Technologies
    with ddtrace v1.10.0, Copyright Datadog, by Datadog
    with datadog-profiling v1.10.0, Copyright Datadog, by Datadog
    with ddappsec v1.10.0, Copyright Datadog, by Datadog
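To make that check easy to repeat, a minimal sketch that pulls the Datadog extensions and their versions out of `php -v` style output on stdin. The function name is hypothetical, and the three extension names are the ones that appear in the output above.

```shell
# Print "<extension> <version>" for each Datadog extension found in `php -v` output on stdin.
# Hypothetical helper; it matches the "with <name> v<version>, ..." lines shown above.
list_datadog_extensions() {
  sed -nE 's/^ *with (ddtrace|datadog-profiling|ddappsec) v([^,]+),.*/\1 \2/p'
}
```

Run it against the binary of the SAPI that actually serves traffic, for example `php-fpm -v | list_datadog_extensions` rather than the CLI. Empty output would mean none of the three extensions is loaded in that SAPI.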

In case you see a stack trace similar to #3197: I've opened PR #3319, which should be a more thorough fix for the root issue. If you want to give it a try, you can install the version from that PR using:

$ curl -LO https://s3.us-east-1.amazonaws.com/dd-trace-php-builds/1.11.0%2B331167f571e8f6070c6a01bb2f9c082d2b4062b5/datadog-setup.php
$ php -n datadog-setup.php --enable-profiling

realFlowControl avatar Jul 02 '25 05:07 realFlowControl

I'll give the PR build a try this week. We're running in Fargate so it's difficult to ssh in. I will also see if I can replicate the behavior locally. We normally don't run ddtrace in our local docker environments.

gman-wa avatar Jul 06 '25 17:07 gman-wa

The PR build is a lot better: only 3 seg faults in the last couple of hours. So not entirely fixed, but better. Again, this is Fargate, so I am not able to do any debug operations. I will see if I can replicate locally.

Here is the other php info

PHP 8.4.10

This program makes use of the Zend Scripting Language Engine:
Zend Engine v4.4.10, Copyright (c) Zend Technologies
    with Zend OPcache v8.4.10, Copyright (c), by Zend Technologies
    with ddtrace v1.11.0+331167f571e8f6070c6a01bb2f9c082d2b4062b5, Copyright Datadog, by Datadog
    with datadog-profiling v1.11.0+331167f571e8f6070c6a01bb2f9c082d2b4062b5, Copyright Datadog, by Datadog

gman-wa avatar Jul 08 '25 21:07 gman-wa

Hey @gman-wa, can you let me know your ORG-ID? Either open a support case or sign up to our public Slack.

realFlowControl avatar Jul 15 '25 11:07 realFlowControl

@realFlowControl - support case created - https://help.datadoghq.com/hc/requests/2190901

gman-wa avatar Jul 17 '25 16:07 gman-wa

Hey @gman-wa, from what I see on our end there should not be any recent crashes anymore, so I am closing this. Can you confirm? If you still see crashes, please re-open or create a new support ticket.

realFlowControl avatar Nov 18 '25 16:11 realFlowControl

Thanks @realFlowControl - just tried 1.14.0 w/ 8.4.15 and still seeing the crashes.

gman-wa avatar Nov 22 '25 17:11 gman-wa