dd-trace-php icon indicating copy to clipboard operation
dd-trace-php copied to clipboard

[BUG] DDTrace\Tracer->reset() re-enables ddtrace extension which seems to reset the random seed, providing more surface for collisions

Open olsavmic opened this issue 2 years ago • 2 comments

Bug description

Hi, We're seeing spans with duplicate IDs in Datadog.

We already reached out to the official support (ticket #846633) and the issue was confirmed.

We suspect that the main issue is the introduction of extension disable & reenable in \DDTrace\Tracer->reset() which causes the ddtrace_seed_prng to be called, resetting the random seed.

        if (_ddtrace_config_bool(ini_get("datadog.trace.enabled"), false)) {
            // Do a full shutdown and re-startup of the tracer, which implies clearing the internal span stack
            ini_set("datadog.trace.enabled", 0);
            ini_set("datadog.trace.enabled", 1);
        }

Since we're calling the Tracer->reset() after every processed consumer message and we have over 100 consumers running in parallel on different hosts, the probability of potential collision raises significantly since the seed generation seems to rely only on microsecond time and pid (which is always 1 in our case for PHP process since we're running containers).

Additional context:

  • We're not setting the span_id or trace_id manually anywhere.
  • The distributed tracing is disabled for now using the ENV variable DD_DISTRIBUTED_TRACING = 0
  • We're using DDTrace\Tracer for the creation of userland spans (\DDTrace\SpanContext::createAsRoot sets the trace_id == span_id)
  • We're resetting the tracer for long running commands by calling DDTrace\Tracer->reset() which disables and reenables the dd_trace extension

If our assumptions are correct, this issue could be mitigated by seeding the generator only once during the module load instead of every time the extension is enabled or, ideally, introducing more entropy to the seed generation.

Example of duplicated span ids (environment, service, and the host do not match, the span_id is duplicated though).

{
    "discovery_timestamp": 1659607255232,
    "source_fragment_id": "018268b8-0003-43f1-9334-228e35944f1d",
    "tag": {
        "source": "apm"
    },
    "host": "XXX-85f794bf7-sfwxn",
    "id": "AYJoTQjAAAAppAJUhWEtBio0",
    "resource_hash": "bc535f5b36f4da79",
    "complete_trace": true,
    "service": "XXX",
    "analytics_enabled": false,
    "custom": {
        "env": "prod",
        "language": "php",
        "_top_level": 1,
        "app": "XXX",
        "version": "prod-2022-08-04-09-39-50-319525dd35",
        "release": "prod-2022-08-04-09-39-50-319525dd35",
        "process_id": "1",
        "_trace_root": 1,
        "duration": 239596564
    },
    "source": "apm",
    "env": "prod",
    "type": "queue",
    "status": "ok",
    "resource_name": "XXX",
    "apm_source": "trace-search",
    "tags": [
        "source:apm"
    ],
    "timestamp": "2022-08-04T10:00:54.468Z",
    "ingest_size_in_bytes": 760,
    "custom_ingest_timestamp": 1659607255232,
    "random_draw": 0.6809393202944073,
    "tiebreaker": -44995267,
    "span_id": "3758971176680327077",
    "trace_id": "3758971176680327077",
    "operation_name": "consume"
}
 
And here is the span for the [trace associated](https://app.datadoghq.com/apm/trace/3758971176680327077?colorBy=service&env=prod&graphType=flamegraph&shouldShowLegend=true&sort=time&spanID=3758971176680327077&spanViewType=metadata&timeHint=1659626313179.395):
{
    "metrics": {
        "_sampling_priority_v1": 1,
        "_top_level": 1,
        "_dd1.sr.esusr": 0.1,
        "_trace_root": 1,
        "_dd.rule_psr": 1
    },
    "resource_hash": "5f3afbe430eceed9",
    "end": 1659626313.189985,
    "name": "consume",
    "service": "XXX-be",
    "start": 1659626313.179395,
    "hostname": "i-073b6a12d71e965a5",
    "org_id": 375251,
    "trace_id": "3758971176680327077",
    "host_groups": [
        "env:prod"
    ],
    "parent_id": "0",
    "meta": {
        "_dd.hostname": "XXX-d656c5db5-tsbmw",
        "version": "prod-2022-08-04-14-16-48-45787ef4a9",
        "language": "php",
        "app": "backend",
        "_dd.filter.type": "spans-sampling-processor",
        "_dd.agent_version": "7.34.0",
        "process_id": "1",
        "env": "prod",
        "release": "prod-2022-08-04-14-16-48-45787ef4a9",
        "_dd.filter.id": "wAXsUESlQy2kUrKZ2WKP_Q",
        "_dd.p.upstream_services": "YmFja2VuZC1jb25zdW1lcnMtQkU|1|1|1.000",
        "_dd.tracer_version": "0.74.0"
    },
    "env": "prod",
    "host_id": 81866138,
    "duration": 0.010589301,
    "resource": "XXX",
    "children_ids": [
        "15130145109330029809",
        "17840514285854287844"
    ],
    "type": "queue",
    "span_id": "3758971176680327077"
}

PHP version

PHP 8.1.7

Tracer version

0.74.0

Installed extensions

[PHP Modules]
bcmath
blackfire
calendar
Core
ctype
curl
date
ddappsec
ddtrace
dom
exif
FFI
fileinfo
filter
ftp
gd
gettext
hash
iconv
igbinary
imagick
imap
intl
json
libxml
mbstring
mysqli
mysqlnd
openssl
pcntl
pcre
PDO
pdo_mysql
Phar
posix
readline
Reflection
session
shmop
SimpleXML
soap
sockets
sodium
SPL
standard
sysvmsg
sysvsem
sysvshm
tokenizer
xml
xmlreader
xmlwriter
xsl
Zend OPcache
zip
zlib

[Zend Modules]
Zend OPcache
ddappsec
ddtrace

OS info

PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster

olsavmic avatar Aug 10 '22 08:08 olsavmic

Perhaps in our case, we can get rid of the Tracer->reset() call. Anyway, I thought it's worth considering and at least documenting for others :)

olsavmic avatar Aug 10 '22 09:08 olsavmic

Hey @olsavmic,

Hm. GENERATE_SEED is defined as (((zend_long) (time(0) * getpid())) ^ ((zend_long) (1000000.0 * php_combined_lcg()))). I.e. seconds * pid, then xor by the lcg multiplied by a million. That latter operation seems to be quite weak.

I can confirm about 0.5% duplicate ids within a batch of 10000. This is too much and it also indicates, that, the probability of duplicates is in general too high, if you have multiple machines.

bwoebi avatar Aug 10 '22 09:08 bwoebi