[BUG]: Version 5.68.0 performance degradation
Tracer Version(s)
5.68.0
Node.js Version(s)
23.8.0
Bug Report
The performance issues were noticed during nightly stress tests run against an API with k6. Reverting to 5.67.0 fixes the issues.
stress_test: {
  executor: 'ramping-vus',
  stages: [
    {duration: '10m', target: 100},
    {duration: '30m', target: 500}, // 500 rps
    {duration: '5m', target: 0},
  ],
  tags: {test_type: 'stress'},
  exec: 'stressTest',
}
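For lining up the latency graphs with the load phases, it can help to know roughly how many VUs were scheduled at a given point in the run. A minimal sketch, assuming k6's `ramping-vus` executor ramps linearly between stage targets and starts from its documented default of 1 VU; `toSeconds` and `plannedVUs` are hypothetical helper names, and k6's own rounding may differ slightly:

```javascript
// Stages copied from the scenario above.
const stages = [
  { duration: '10m', target: 100 },
  { duration: '30m', target: 500 },
  { duration: '5m', target: 0 },
];

// Parse simple k6-style durations such as '30s', '10m' or '1h'.
// (Assumption: compound forms like '1h30m' are not needed here.)
function toSeconds(duration) {
  const match = /^(\d+)(s|m|h)$/.exec(duration);
  if (!match) throw new Error(`unsupported duration: ${duration}`);
  return Number(match[1]) * { s: 1, m: 60, h: 3600 }[match[2]];
}

// Approximate the scheduled VU count at `elapsedSec` by interpolating
// linearly from the previous stage's target to the current one.
function plannedVUs(stageList, elapsedSec, startVUs = 1) {
  let from = startVUs;
  let stageStart = 0;
  for (const { duration, target } of stageList) {
    const length = toSeconds(duration);
    if (length > 0 && elapsedSec <= stageStart + length) {
      const progress = (elapsedSec - stageStart) / length;
      return Math.round(from + (target - from) * progress);
    }
    from = target;
    stageStart += length;
  }
  return from; // past the final stage
}
```

With these stages, `plannedVUs(stages, 25 * 60)` evaluates to 300, i.e. halfway through the ramp from 100 to 500.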
Each of the images above shows three blips:
- 1st: v5.69.0 (bad)
- 2nd: v5.67.0 (good)
- 3rd: v5.68.0 (bad)
Over a one-week period, the increase also correlates exactly with our dd-trace version bump:
Under normal load in production, the change can be noticed as well:
Reproduction Code
No response
Error Logs
No response
Tracer Config
import tracer from 'dd-trace';
const ddTrace = tracer.init();
ddTrace.use('graphql');
ddTrace.use('pg');
ddTrace.use('redis');
ddTrace.use('hono');
ddTrace.use('http', {
  blocklist: [/^\/health(.*)/i],
});
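To narrow a regression like this to a single integration, one option is to make each plugin toggleable per deploy and rerun the stress test with one plugin off at a time. A rough sketch, assuming `tracer.use(name, false)` disables an integration (the dd-trace typings accept a boolean there, but verify against your version); `BISECT_DISABLED_PLUGINS` and `applyPluginToggles` are hypothetical names, not dd-trace settings:

```javascript
// Per-plugin options, mirroring the tracer config above.
const PLUGIN_CONFIG = {
  graphql: {},
  pg: {},
  redis: {},
  hono: {},
  http: { blocklist: [/^\/health(.*)/i] },
};

// Reads a comma-separated list from a hypothetical env var and calls
// tracer.use() for every plugin, passing `false` for the disabled ones.
// Assumption: `false` as the second argument turns the integration off.
function applyPluginToggles(tracer, env = process.env) {
  const disabled = new Set(
    (env.BISECT_DISABLED_PLUGINS ?? '')
      .split(',')
      .map((name) => name.trim())
      .filter(Boolean)
  );
  for (const [name, config] of Object.entries(PLUGIN_CONFIG)) {
    tracer.use(name, disabled.has(name) ? false : config);
  }
  return [...disabled];
}
```

It would be called as `applyPluginToggles(ddTrace)` right after `tracer.init()`, in place of the individual `use` calls, with something like `BISECT_DISABLED_PLUGINS=graphql` set on the task for one test run.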
"environment": [
  {
    "name": "ENV",
    "value": "${environment_name}"
  },
  {
    "name": "PORT",
    "value": "${container_port}"
  },
  {
    "name": "NODE_ENV",
    "value": "${environment_name}"
  },
  {
    "name": "REDIS_URL",
    "value": "rediss://${redis_url}:6379"
  },
  {
    "name": "DD_ENV",
    "value": "${environment_name}"
  },
  {
    "name": "DD_SERVICE",
    "value": "${project_name}"
  },
  {
    "name": "DD_VERSION",
    "value": "git:${git_commit}"
  },
  {
    "name": "DD_RUNTIME_METRICS_ENABLED",
    "value": "true"
  },
  {
    "name": "DD_IAST_ENABLED",
    "value": "true"
  },
  {
    "name": "DD_APPSEC_ENABLED",
    "value": "true"
  },
  {
    "name": "DD_DATA_STREAMS_ENABLED",
    "value": "true"
  },
  {
    "name": "DD_APM_ENABLED",
    "value": "true"
  },
  {
    "name": "DD_TRACE_ENABLED",
    "value": "true"
  },
  {
    "name": "DD_RUNTIME_METRICS_ENABLED",
    "value": "true"
  },
  {
    "name": "DD_GIT_REPOSITORY_URL",
    "value": "${github_repo_url}"
  },
  {
    "name": "DD_PROFILING_ENABLED",
    "value": "true"
  },
  {
    "name": "DD_DBM_PROPAGATION_MODE",
    "value": "full"
  },
  {
    "name": "DD_CRASHTRACKING_ENABLED",
    "value": "false"
  },
  {
    "name": "DD_LOGS_INJECTION",
    "value": "true"
  },
  {
    "name": "DD_TRACE_DEBUG",
    "value": "false"
  }
]
Operating System
runtime_arch:arm64 -- Running as a sidecar with our ECS Fargate tasks
EDIT:
It seems other people are reporting that this is GraphQL-specific, which lines up with our non-GraphQL applications not experiencing the issue. Related library versions are listed below:
"@escape.tech/graphql-armor": "3.1.7",
"@graphql-yoga/plugin-csrf-prevention": "3.16.0",
"@pothos/core": "4.7.2",
"@pothos/plugin-drizzle": "0.11.0",
"@pothos/plugin-relay": "4.6.1",
"@pothos/plugin-scope-auth": "4.1.5",
"@pothos/plugin-with-input": "4.1.2",
"graphql": "16.11.0",
"graphql-scalars": "1.25.0",
"graphql-yoga": "5.16.0",
We’re seeing the same / similar behavior.
Tracer version(s): 5.72.0
Node.js: 24.10.0
Affected service: our GraphQL API (Apollo Server). Other services are not noticeably impacted.
After upgrading from 5.64.0 to 5.65.0+, memory grows steadily under steady traffic and does not recover. Rolling back to 5.64.0 immediately stabilizes memory. There were no other code or config changes between these bumps.
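One way to put a number on memory that grows under steady traffic, when comparing tracer versions, is to sample process memory at a fixed interval and fit a slope. A minimal sketch using only Node's built-in `process.memoryUsage()`; `slopeMbPerMin` and `startMemorySampler` are hypothetical helpers, and the interval and window size are arbitrary choices:

```javascript
// Least-squares slope of one memory field, in MB per minute.
// samples: [{ t: <ms timestamp>, rss: <bytes>, heapUsed: <bytes> }, ...]
function slopeMbPerMin(samples, key = 'rss') {
  if (samples.length < 2) return 0;
  const xs = samples.map((s) => (s.t - samples[0].t) / 60000);
  const ys = samples.map((s) => s[key] / (1024 * 1024));
  const mean = (values) => values.reduce((a, b) => a + b, 0) / values.length;
  const mx = mean(xs);
  const my = mean(ys);
  let num = 0;
  let den = 0;
  for (let i = 0; i < xs.length; i++) {
    num += (xs[i] - mx) * (ys[i] - my);
    den += (xs[i] - mx) ** 2;
  }
  return den === 0 ? 0 : num / den;
}

// Keeps a sliding window of samples and logs the current growth rate.
function startMemorySampler(intervalMs = 10000, windowSize = 360) {
  const samples = [];
  const timer = setInterval(() => {
    const { rss, heapUsed } = process.memoryUsage();
    samples.push({ t: Date.now(), rss, heapUsed });
    if (samples.length > windowSize) samples.shift();
    console.log(
      `rss ${slopeMbPerMin(samples, 'rss').toFixed(2)} MB/min, ` +
        `heapUsed ${slopeMbPerMin(samples, 'heapUsed').toFixed(2)} MB/min`
    );
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for sampling
  return { samples, stop: () => clearInterval(timer) };
}
```

Running the same load against two dd-trace versions and comparing the logged slopes gives a rough, like-for-like view of growth; a slope that stays clearly positive over a long window under constant load is consistent with the behavior described here, though it does not by itself identify the cause.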
Same experience as @itsjgf
Tracer version(s): 5.72.0
Node.js: 22.20
Affected service: our GraphQL API (Apollo Server). Other services are not noticeably impacted.
After noticing the issue, we rolled back only dd-trace to version 5.67 in a hotfix. Based on other reporters' experience in this issue, it seems likely that something changed in 5.68 affecting graphql tracing.
@Cellule can you share which Apollo version you use?
We are using @apollo/server 4.12.2