apm-agent-nodejs APM server response timeout when using await in async function

I have a test.js file with the following code:

const apm = require('elastic-apm-node').start({
            serviceName: 'testWatcher',
            serverUrl: "http://********:8200/",
            serverTimeout: "3s",
            logUncaughtExceptions: true
        });

const sleep = function (sec) {
  return new Promise((resolve) => setTimeout(resolve, sec * 1000));
};

(async () => {
  let currentTransaction = apm.startTransaction("Trans1", "trans");
  currentTransaction.end("success");

  await sleep(10);
})();

When I run it with: node test.js

I always get the following output: {"log.level":"error","@timestamp":"2022-01-17T20:44:28.106Z","log":{"logger":"elastic-apm-node"},"ecs":{"version":"1.6.0"},"message":"APM Server transport error: APM Server response timeout (3000ms)"}

If I remove the line with "await sleep(10)" or make "serverTimeout" option more than 10s (so that it is bigger than the "sleep" argument) then I don't have any errors and the transaction is logged to the server.

The response timeout also happens if I remove the lines:

  // let currentTransaction = apm.startTransaction("Trans1", "trans");
  // currentTransaction.end("success");

I have nodejs v.16.13.1, elastic-apm-node package ver. 3.27.0, running on Windows 10 Enterprise.

What I am doing wrong?

Jan 17 '22 20:01 gribunin

Hi @gribunin. Thanks for the question and very much for the code to reproduce.

There are couple other things going on that are confusing things:

The apiRequestTime config var has a default of 10s and is part of the interaction here. It is a coincidence that this is the same interval as your sleep. I've changed your serverTimeout and sleep values to different values to help illustrate.
If your sleep is less than 10s, then the whole process is finished, and this triggers the APM agent to finish up quickly (before timeouts might be hit). So to avoid this adding to the confusion I've appended this to your test script to keep it alive:

setInterval(function () {
  console.log('still here ...')
}, 3000)

Here is my full script:

const apm = require('./').start({
    serviceName: 'testWatcher',
    // serverUrl: "http://********:8200/",
    serverTimeout: "8s",
    logUncaughtExceptions: true
});

apm.logger.info('start script')

const sleep = function (sec) {
  return new Promise((resolve) => setTimeout(resolve, sec * 1000));
};

(async () => {
  let currentTransaction = apm.startTransaction("Trans1", "trans");
  currentTransaction.end("success");

  apm.logger.info('start sleep')
  await sleep(3);
  apm.logger.info('done sleep')
})();

setInterval(function () {
  console.log('still here ...')
}, 3000)

When I run that I get (piping through the log prettifier ecslog):

% node issue2533.js | ecslog
[2022-01-17T23:31:27.712Z]  INFO (elastic-apm-node): start script
[2022-01-17T23:31:27.715Z]  INFO (elastic-apm-node): start sleep
[2022-01-17T23:31:30.716Z]  INFO (elastic-apm-node): done sleep
still here ...
still here ...
[2022-01-17T23:31:35.729Z] ERROR (elastic-apm-node): APM Server transport error: APM Server response timeout (8000ms)
still here ...
still here ...
^C

What happens here is:

Soon after the script starts, the APM agent starts an "intake API" request to APM server to send the Transaction "Trans1" (and also to send the initial batch of metrics).
This "intake API" request stays open for up to apiRequestTime (10s) to be able to quickly send any additional tracing data (new transactions, spans, errors).
But nothing happens for 8s until the serverTimeout is hit, which means the intake API request is terminated... and hence the "APM Server transport error: ..." error.

A couple notes/questions:

The serverTimeout docs are wrong and misleading. The docs for that same option in the underlying client library are more helpful:
- serverTimeout - HTTP request timeout in milliseconds. If no data is sent or received on the socket for this amount of time, the request will be aborted. It's not recommended to set a serverTimeout lower than the time [this is apiRequestTime] config option. That might result in healthy requests being aborted prematurely.
However, you aren't expected to trace through to see that. I will open a PR to update the agent configuration docs.
The issue in your example script is that serverTimeout is less than apiRequestTime. If there is a lull with no tracing data to send, then it can result in a healthy request to the APM server being terminated. So the question is: what are you wanting to do by setting serverTimeout?

Jan 17 '22 23:01 trentm

if the timeout is 10 seconds, if a request goes more than 10 seconds it won't be capture in the apm ?

Dec 04 '24 12:12 sibelius