apm-agent-nodejs
apm-agent-nodejs copied to clipboard
Transactions and Spans are getting attached to wrong parent transaction
Describe the bug We have integrated the APM in our microservice architecture making use of distributed tracing feature. But we are running into a issue that often child transactions and spans end up on the wrong parent transaction. This happens mostly when requests come simultaneously(concurrently).
Example trace having the above mentioned issue:

In the above trace the /getIntersectingHubs API call shouldn't be there as in api/alerting/getAlerts we never call this API.
Expected behavior There shouldn't be any extra API or database call in the distributed trace.
Given below is the correct distributed trace for api/alerting/getAlerts API.

Environment (please complete the following information)
- OS: ubuntu 18.04
- Node.js version: 8.17.0
- APM Server version: 7.4.2
- Agent version: 3.5.0
How are you starting the agent? (please tick one of the boxes)
- [X] Calling
agent.start()directly (e.g.require('elastic-apm-node').start(...)) - [ ] Requiring
elastic-apm-node/startfrom within the source code - [ ] Starting node with
-r elastic-apm-node/start
Additional context
- Agent config options
ELASTIC_APM_ENABLED=true ELASTIC_APM_SERVER_URL="http://<host_URL>:8200" ELASTIC_APM_SERVICE_NAME="$SERVICE_ID-$ENV" ELASTIC_APM_ASYNC_HOOKS=false ELASTIC_APM_CAPTURE_BODY="off" ELASTIC_APM_TRANSACTION_SAMPLE_RATE=0.1 ELASTIC_APM_CAPTURE_EXCEPTIONS=true ELASTIC_APM_CAPTURE_ERROR_LOG_STACK_TRACES="messages" ELASTIC_APM_CAPTURE_SPAN_STACK_TRACES=false ELASTIC_APM_SOURCE_LINES_SPAN_APP_FRAMES=0 ELASTIC_APM_SOURCE_LINES_SPAN_LIBRARY_FRAMES=0 ELASTIC_APM_SOURCE_LINES_ERROR_APP_FRAMES=0 ELASTIC_APM_SOURCE_LINES_ERROR_LIBRARY_FRAMES=0 ELASTIC_APM_ERROR_MESSAGE_MAX_LENGTH="1kb" ELASTIC_APM_STACK_TRACE_LIMIT=0 ELASTIC_APM_TRANSACTION_MAX_SPANS=100 ELASTIC_APM_API_REQUEST_TIME="10s" ELASTIC_APM_API_REQUEST_SIZE="768kb" ELASTIC_APM_METRICS_INTERVAL="30s"
@sunishdahiya Nice to meet you, and thank you for reporting this bug.
Based on what I see so far and your description, this looks like a classic example of asynchronous transaction state conflation. Because of Node's asynchronous execution model the Agent can sometimes loose track of what the current transaction is. This can be especially true if you have
ELASTIC_APM_ASYNC_HOOKS=false
set to false and you're using a DB package or HTTP fetching package that we don't explicitly instrument.
We have a few questions that will better help us understand this.
- Do you know why you're using
ELASTIC_APM_ASYNC_HOOKS=false? If you set this totruedoes the problem remain? - What DB library (or libraries) are you using to make your DB calls?
- What HTTP library (if any) are you using to make your service requests?
- Are you doing any custom instrumentation with the Agent API?
With that information in hand we should be able to better diagnose this issue and advise.
Hi @astorm , Thanks for the quick response
We have set ELASTIC_APM_ASYNC_HOOKS to false because in the documentation it is mentioned that ASYNC_HOOKS will have no effect if the node version is older than 8.2.0 and all our services use node version 8.1.0.
- We are using
"sequelize": "^3.19.3" and "mongoose": "~4.5.9". "request-promise": "^4.2.4"- We are using
startTransactionAPI only to capture kafka-events. Code for that is given below
const recordTransaction = async (transactionType, transactionName, transactionHandler, ...transactionHandlerArgs) => {
const transaction = apm.startTransaction(transactionName, transactionType)
try{
await transactionHandler(...transactionHandlerArgs);
transaction.end()
} catch(error) {
await apm.captureError(error)
transaction.end()
throw error
}
}
@astorm Any update on this?
I am using default value for asyncHooks, which documentations says its true and I am seeing wrong transaction traces
With version 3.24.0 of the agent there was a significant re-work of the async run context handling that is largely responsible for these kinds of issues.
However, often it really comes down to specifics. @williamoliveira I would encourage you to open a separate issue with as much detail as you can provide. The ideal is a repro case showing the issue.
@sunishdahiya Sorry for the delay. With the above information I'm not able to confirm whether the changes in v3.24.0 of the agent fix the issue for you. Are you able to retry with the latest agent version? Then we could go from there?
await Promise.resolve(); helps to workaround this issue. Just place it before the manual transaction/span creation.