apm-agent-nodejs icon indicating copy to clipboard operation
apm-agent-nodejs copied to clipboard

invalid values for time/duration config vars gives surprising and unwanted behaviour

Open trentm opened this issue 3 years ago • 4 comments

There are a few config vars that support a time/duration as a string of the form <number><unit>, e.g. "10s" to mean 10 seconds. The supported units are "m" for minutes, "ms" for milliseconds, "s" for seconds. If the config value is a number or a string without units, it defaults to seconds.

Currently that parsing/conversion is done by normalizeTime in lib/config.js:

function normalizeTime (opts) {
  for (const key of TIME_OPTS) {
    if (key in opts) opts[key] = toSeconds(String(opts[key]))
  }
}

and the relevant config vars are:

var TIME_OPTS = [
  'abortedErrorThreshold',
  'apiRequestTime',
  'metricsInterval',
  'serverTimeout',
  'spanFramesMinDuration'
]

Currently toSeconds() above will return null if the string value is invalid (e.g. "10ss", "10h", "fuzzywuzzy"). There is no log warning. The results can be surprising: If this null is used in math (often that is the case as the seconds value is converted to milliseconds for usage in setTimeout and the like) or comparisons, JavaScript will treat it as zero, which sometimes is a sentinel value that means something unexpected. Specifically:

  • apiRequestTime: Results in a value of 0 being used to the http-client -> a -1 sentinel value to StreamChopper, which results in time-based closing of requests to APM server not happening which -- for low traffic apps -- results in ERROR (elastic-apm-node): APM Server transport error: APM Server response timeout (30000ms). which results in the stream chopper not
  • metricsInterval: Results in null -> 0 -> turning off metrics collection.
  • serverTimeout: Results in no socket timeout being set on the communication with APM server which could rarely result in hung communication with APM server.
  • spanFramesMinDuration: Results in possibly unexpected additional CPU usage as the agent collections stack traces for all spans.
  • abortedErrorThreshold: Not yet investigated. Comment indicate this is an old undocumented(?) feature that "will be removed".

trentm avatar Jun 21 '21 17:06 trentm

[Sergey Kleyman]

AFAIK most of the agents fallback on default value when the given value is invalid.

trentm avatar Jun 21 '21 17:06 trentm

If that's not in the spec yet, could you add it?

felixbarny avatar Jun 22 '21 15:06 felixbarny

If that's not in the spec yet, could you add it?

Will do, on Colton's PR (https://github.com/elastic/apm/pull/454/files#r656467947) or separately.

trentm avatar Jun 22 '21 18:06 trentm

This thread is a life-saver. Spent good of amount figuring out the weird error. Thanks a ton @trentm

ketansp avatar Jun 11 '22 18:06 ketansp