apm-agent-nodejs
apm-agent-nodejs copied to clipboard
require("elastic-apm-node") influences stacktrace generated with "node --enable-source-maps ..."
Describe the bug
To Reproduce
A. package.json
{
"name": "testbug-stacktrace",
"version": "0.0.0",
"private": false,
"dependencies": {
"elastic-apm-node": "3.36.0"
},
"devDependencies": {
"@types/node": "18.0.6",
"typescript": "4.7.4"
}
}
B. tsconfig.json
{
"include": ["src/**/*.ts"],
"compilerOptions": {
"allowJs": true,
"checkJs": false,
"declaration": true,
"emitDeclarationOnly": false,
"sourceMap": true,
// Generate JS compatible with Node >= 12 - see https://github.com/microsoft/TypeScript/wiki/Node-Target-Mapping
"lib": ["ES2019"],
"module": "commonjs",
"target": "ES2019"
}
}
C. src/main.js
const apm = require('elastic-apm-node');
const mod1 = require("./mod1");
mod1();
const { test } = require("./mod2");
test();
D. src/mod1.ts (CJS module)
module.exports = function test() {
try {
throw Error();
} catch (err) {
console.error(err);
}
}
E. src/mod2.ts (ES6 module)
export async function test() {
try {
throw Error();
} catch (err) {
console.error(err);
}
}
Then using node 16 :
npm install
node_modules\.bin\tsc
node --enable-source-maps src\main.js
Running above script results in :
Error
at test (C:\Users\dtoch\work\pga\testbug-correct-stacktrace\src\mod1.js:3:15)
at Object.<anonymous> (C:\Users\dtoch\work\pga\testbug-correct-stacktrace\src\main.js:4:1)
at Module._compile (node:internal/modules/cjs/loader:1103:14)
at Object.Module._extensions..js (node:internal/modules/cjs/loader:1157:10)
at Module.load (node:internal/modules/cjs/loader:981:32)
at Function.Module._load (node:internal/modules/cjs/loader:822:12)
at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:77:12)
at node:internal/main/run_main_module:17:47
Error
at test (C:\Users\dtoch\work\pga\testbug-correct-stacktrace\src\mod2.js:6:15)
at Object.<anonymous> (C:\Users\dtoch\work\pga\testbug-correct-stacktrace\src\main.js:7:1)
at Module._compile (node:internal/modules/cjs/loader:1103:14)
at Object.Module._extensions..js (node:internal/modules/cjs/loader:1157:10)
at Module.load (node:internal/modules/cjs/loader:981:32)
at Function.Module._load (node:internal/modules/cjs/loader:822:12)
at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:77:12)
at node:internal/main/run_main_module:17:47
However when in src\main.js the first line (const apm = require('elastic-apm-node')) is put in comment, you will get the right stack trace:
C:\Users\dtoch\work\pga\testbug-correct-stacktrace\src\mod1.ts:3
throw Error();
^
Error
at null.test (C:\Users\dtoch\work\pga\testbug-correct-stacktrace\src\mod1.ts:3:11)
at Object.<anonymous> (C:\Users\dtoch\work\pga\testbug-correct-stacktrace\src\main.js:4:1)
at Module._compile (node:internal/modules/cjs/loader:1103:14)
at Object.Module._extensions..js (node:internal/modules/cjs/loader:1157:10)
at Module.load (node:internal/modules/cjs/loader:981:32)
at Function.Module._load (node:internal/modules/cjs/loader:822:12)
at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:77:12)
at node:internal/main/run_main_module:17:47
C:\Users\dtoch\work\pga\testbug-correct-stacktrace\src\mod2.ts:3
throw Error();
^
Error
at null.test (C:\Users\dtoch\work\pga\testbug-correct-stacktrace\src\mod2.ts:3:11)
at Object.<anonymous> (C:\Users\dtoch\work\pga\testbug-correct-stacktrace\src\main.js:7:1)
at Module._compile (node:internal/modules/cjs/loader:1103:14)
at Object.Module._extensions..js (node:internal/modules/cjs/loader:1157:10)
at Module.load (node:internal/modules/cjs/loader:981:32)
at Function.Module._load (node:internal/modules/cjs/loader:822:12)
at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:77:12)
at node:internal/main/run_main_module:17:47
Expected behavior
Doing require('elastic-apm-node') should not influence the stacktrace generated by "node --enable-source-maps ...".
Environment (please complete the following information)
- OS: [e.g. Linux] Windows
- Node.js version: 16.14.2
- APM Server version: Not important (don't need server to simulate problem)
- Agent version: 3.36.0
How are you starting the agent? (please tick one of the boxes)
- [ ] Calling
agent.start()
directly (e.g.require('elastic-apm-node').start(...)
) - [ ] Requiring
elastic-apm-node/start
from within the source code - [ ] Starting node with
-r elastic-apm-node/start
Additional context
- Agent config options
Click to expand
replace this line with your agent config options
-
package.json
dependencies:Click to expand
replace this line with your dependencies section from package.json
TIL that node --enable-source-maps ...
is a thing. I had not seen it before.
Doing require('elastic-apm-node') should not influence the stacktrace generated by "node --enable-source-maps ...".
I agree with that statement, in general: that just require'ing the agent should not impact behaviour of the app, at least as much as possible. Starting the agent, ideally, should not impact behaviour of the app either, but I suppose there is more latitude for some impact when the agent is active.
Why require('elastic-apm-node')
has an impact
The reason this is having an impact is because as part of the APM agent's stacktrace collection handling (mainly for serializing a stacktrace when an Error is captured), it imports error-callsites
, which sets a Error.prepareStackTrace
function (per https://v8.dev/docs/stack-trace-api#customizing-stack-traces). However, according to https://nodejs.org/api/all.html#all_cli_--enable-source-maps
Overriding
Error.prepareStackTrace
prevents--enable-source-maps
from modifying the stack trace.
An aside on subtlties in --enable-source-maps
There is also some more subtlety going on here. If one uses node --enable-source-maps bar.js
on a script that has no source map, then the leading context on the printed stacktrace is impacted. This is without the APM agent involved at all:
% cat bar.js
function aFunc() {
throw new Error('boom');
}
aFunc();
% node bar.js
/Users/trentm/tmp/asdf/bar.js:2
throw new Error('boom');
^
Error: boom
at aFunc (/Users/trentm/tmp/asdf/bar.js:2:11)
at Object.<anonymous> (/Users/trentm/tmp/asdf/bar.js:4:1)
at Module._compile (node:internal/modules/cjs/loader:1105:14)
at Object.Module._extensions..js (node:internal/modules/cjs/loader:1159:10)
...
% node --enable-source-maps bar.js
Error: boom
at aFunc (/Users/trentm/tmp/asdf/bar.js:2:11)
at Object.<anonymous> (/Users/trentm/tmp/asdf/bar.js:4:1)
at Module._compile (node:internal/modules/cjs/loader:1105:14)
at Object.Module._extensions..js (node:internal/modules/cjs/loader:1159:10)
...
My (limited) understanding of Node's implementation of --enable-source-maps
is that the actual Error.stack
string value is changed to include that leading "/Users/trentm/tmp/asdf/bar.js:2\n throw new Error('boom');\n ^"
context (https://github.com/nodejs/node/issues/43186). And then (I'm guessing) the default uncaughtException
handler in node will no longer add the leading context if in --enable-source-maps
-mode. It looks like this behaviour of adding the leading source context info to Error.stack will change in the next node major version: https://github.com/nodejs/node/pull/43875
Also FWIW, from https://github.com/nodejs/node/issues/41541 it sounds like --enable-source-maps
can be slow with large source maps.
back to the APM agent's impact
I'm not sure exactly what the best things are that the APM agent can do here:
- Perhaps we could delay overriding
Error.prepareStackTrace
until the APM agent is started (i.e. inAgent.start()
). This is also whereError.stackTraceLimit
is set. This would mean this impact would only occur if the APM agent was enabled. That is slightly better.
@dtoch Would that handle your main concern? Or is this also an issue for you when you have the APM agent active?
What to do when the agent is enabled? We want to override Error.prepareStackTrace
to get access to the v8 CallSites data structures. Those are used to report a structured stacktrace to APM server, that you can see in the Error details in the Kibana APM app. Note that the APM agent does support sourcemaps for those reported stacktraces. For example:
% cat foo.ts
interface Foo {
}
function aFunc() {
throw new Error('boom')
}
aFunc()
% ./node_modules/.bin/tsc --sourceMap foo.ts
% cat foo.js
function aFunc() {
throw new Error('boom');
}
aFunc();
//# sourceMappingURL=foo.js.map
% ELASTIC_APM_LOG_UNCAUGHT_EXCEPTIONS=true node --enable-source-maps -r elastic-apm-node/start foo.js
Error: boom
at aFunc (/Users/trentm/tmp/asdf/foo.js:2:11)
at Object.<anonymous> (/Users/trentm/tmp/asdf/foo.js:4:1)
at Module._compile (node:internal/modules/cjs/loader:1105:14)
at Object.Module._extensions..js (node:internal/modules/cjs/loader:1159:10)
at Module.load (node:internal/modules/cjs/loader:981:32)
at Function.Module._load (node:internal/modules/cjs/loader:822:12)
at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:77:12)
at node:internal/main/run_main_module:17:47
{"log.level":"info","@timestamp":"2022-07-22T22:18:07.856Z","log":{"logger":"elastic-apm-node"},"ecs":{"version":"1.6.0"},"message":"Sending error to Elastic APM: {\"id\":\"e3863ca4f871527aacec1622fe424151\"}"}
and the structured stacktrace sent to APM server looks like this (it has applied the sourcemap):
"exception": {
"message": "boom",
"type": "Error",
"handled": false,
"stacktrace": [
{
"filename": "foo.ts",
"lineno": 5,
"function": "aFunc",
"library_frame": false,
"abs_path": "/Users/trentm/tmp/asdf/foo.ts",
"pre_context": [
"",
"function aFunc() {"
],
"context_line": " throw new Error('boom')",
"post_context": [
"}",
"aFunc()"
]
},
{
"filename": "foo.ts",
"lineno": 7,
"function": "Object.<anonymous>",
"library_frame": false,
"abs_path": "/Users/trentm/tmp/asdf/foo.ts",
"pre_context": [
" throw new Error('boom')",
"}"
],
"context_line": "aFunc()",
"post_context": [
""
]
},
Given that the APM agent is already calculating the source-mapped call sites, it could do the same thing that node's --enable-source-maps
is doing for printing that uncaught exception. Then it could have the printed exception match what you see with node --enable-source-maps foo.js
.
This is only a partial solution, however. It would handle the case of printing an uncaught exception with the source-mapped locations. But it wouldn't handle changing the value of Error.stack
as I believe node --enable-source-maps ...
is doing. Is node doing that synchronously when Error.stack
is first accessed? Probably. If so, there could be a significant performance impact there. Currently the sourcemap handling in the APM agent is only done asynchronously when reporting the Error, so that the event loop is minimally affected.
More investigation would be required to dig into this.
Hi thanks for the quick and detailed reply :-).
-
We only run node with --enable-source-maps when developing locally to be able to more easily pinpoint an error thrown in transpiled code generated from the source code.
-
When running the application in a target production environment, we never enable --enable-source-maps (because as documented it also impacts performance).
However in our source code we have code to start the APM agent and then to interact with the APM Agent API for example to customize the transaction name:
...
const apm = require('elastic-apm-node');
apm.start();
...
apm.setTransactionName(...);
...
When developing locally, the agent is never active because we always set the standard APM agent environment variable ELASTIC_APM_ACTIVE=false but in a target production environment the agent is active.
As a workaround we could wrap above in an if:
...
if (process.env.ELASTIC_APM_ACTIVE == "true") {
const apm = require('elastic-apm-node');
apm.start();
...
apm.setTransactionName(...);
}
...
but ideally we could like to avoid this because if in the rest of the code we also use the APM Agent API, then these should be wrapped as well.
Or could this condition on ELASTIC_APM_ACTIVE be considered within the require('elastic-apm-node') call itself?
Or could this condition on ELASTIC_APM_ACTIVE be considered within the require('elastic-apm-node') call itself?
I think your use case would be handled by doing this:
- Perhaps we could delay overriding Error.prepareStackTrace until the APM agent is started (i.e. in Agent.start()). This is also where Error.stackTraceLimit is set. This would mean this impact would only occur if the APM agent was enabled. That is slightly better.
More specifically we should avoid overriding Error.prepareStackTrace
if the APM agent is disabled via ELASTIC_APM_ACTIVE=true
/ active:true
.
Thanks for clarifying the use case; that helps. I'll see about getting this scheduled for our next planning session.
[email protected] is release with this now
Hi @trentm, sorry to bump such an old issue, but I'm wondering if it still holds true to this day with a node@18 and [email protected] for example.
I've been experimenting around this since we have this problem in production, and with the following minimal example in typescript:
import apm from 'elastic-apm-node';
apm.start();
function foo() {
throw new Error('A');
}
try {
foo();
} catch (e) {
console.info((e as any).stack);
}
Building this file with inlineSourceMap: true
to guarantee the sourcemap to be accessible, I get the following stacktraces.
With apm.start()
:
Error: A
at foo (./file.js:9:11)
at Object.<anonymous> (./file.js:12:5)
at Module._compile (node:internal/modules/cjs/loader:1256:14)
at Module._extensions..js (node:internal/modules/cjs/loader:1310:10)
at Module.load (node:internal/modules/cjs/loader:1119:32)
at Module._load (node:internal/modules/cjs/loader:960:12)
at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:86:12)
at node:internal/main/run_main_module:23:47
Without apm.start()
:
Error: A
at foo (./file.ts:6:9)
at Object.<anonymous> (./file.ts:10:3)
at Module._compile (node:internal/modules/cjs/loader:1256:14)
at Module._extensions..js (node:internal/modules/cjs/loader:1310:10)
at Module.load (node:internal/modules/cjs/loader:1119:32)
at Module._load (node:internal/modules/cjs/loader:960:12)
at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:86:12)
at node:internal/main/run_main_module:23:47
Passing --enable-source-maps
doesn't seem to have any impact whatsoever.
Hi @Marsup. I believe what you are seeing is what is currently expected from the above discussion, though obviously it isn't ideal.
The change in [email protected] made it so node's --enable-source-maps
handling was not affected by just importing the APM agent code via require('elastic-apm-node')
. However, starting the APM agent will prevent --enable-source-maps
from doing its thing.
my repro
% npm ls
[email protected] /Users/trentm/tmp/asdf.20231103T165932
├── @types/[email protected]
├── [email protected]
└── [email protected]
foo.ts
import apm from 'elastic-apm-node'; apm
// apm.start()
interface Foo {
}
function aFunc() {
throw new Error('boom')
}
aFunc()
foo-start-apm.ts
import apm from 'elastic-apm-node'; apm
apm.start()
interface Foo {
}
function aFunc() {
throw new Error('boom')
}
aFunc()
Showing with and without --enable-source-maps
:
% ./node_modules/.bin/tsc --sourceMap --esModuleInterop foo.ts
% ./node_modules/.bin/tsc --sourceMap --esModuleInterop foo-start-apm.ts
% node foo.js
/Users/trentm/tmp/asdf.20231103T165932/foo.js:9
throw new Error('boom');
^
Error: boom
at aFunc (/Users/trentm/tmp/asdf.20231103T165932/foo.js:9:11)
at Object.<anonymous> (/Users/trentm/tmp/asdf.20231103T165932/foo.js:11:1)
at Module._compile (node:internal/modules/cjs/loader:1256:14)
at Module._extensions..js (node:internal/modules/cjs/loader:1310:10)
at Module.load (node:internal/modules/cjs/loader:1119:32)
at Module._load (node:internal/modules/cjs/loader:960:12)
at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:86:12)
at node:internal/main/run_main_module:23:47
Node.js v18.18.2
[11:12:25 trentm@pink:~/tmp/asdf (rv:1)]
% node --enable-source-maps foo.js
/Users/trentm/tmp/asdf.20231103T165932/foo.ts:9
throw new Error('boom')
^
Error: boom
at aFunc (/Users/trentm/tmp/asdf.20231103T165932/foo.ts:9:11)
at Object.<anonymous> (/Users/trentm/tmp/asdf.20231103T165932/foo.ts:11:1)
at Module._compile (node:internal/modules/cjs/loader:1256:14)
at Module._extensions..js (node:internal/modules/cjs/loader:1310:10)
at Module.load (node:internal/modules/cjs/loader:1119:32)
at Module._load (node:internal/modules/cjs/loader:960:12)
at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:86:12)
at node:internal/main/run_main_module:23:47
Node.js v18.18.2
This shows that starting the APM agent results in different handling of <Error instance>.stack
and the output to stderr for uncaughtException
events:
% node --enable-source-maps foo-start-apm.js
{"log.level":"info","@timestamp":"2023-11-07T19:12:51.314Z","log.logger":"elastic-apm-node","ecs.version":"8.10.0","agentVersion":"4.1.0","env":{"pid":24860,"proctitle":"node","os":"darwin 22.6.0","arch":"x64","host":"pink.local","timezone":"UTC-0800","runtime":"Node.js v18.18.2"},"config":{"serviceName":{"source":"default","value":"asdf_20231103t165932","commonName":"service_name"},"serviceVersion":{"source":"default","value":"1.0.0","commonName":"service_version"},"serverUrl":{"source":"default","value":"http://127.0.0.1:8200/","commonName":"server_url"},"logLevel":{"source":"default","value":"info","commonName":"log_level"}},"activationMethod":"require","message":"Elastic APM Node.js Agent v4.1.0"}
Error: boom
at aFunc (/Users/trentm/tmp/asdf.20231103T165932/foo-start-apm.js:10:11)
at Object.<anonymous> (/Users/trentm/tmp/asdf.20231103T165932/foo-start-apm.js:12:1)
at Module._compile (node:internal/modules/cjs/loader:1256:14)
at Module._extensions..js (node:internal/modules/cjs/loader:1310:10)
at Module.load (node:internal/modules/cjs/loader:1119:32)
at Module._load (node:internal/modules/cjs/loader:960:12)
at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:86:12)
at node:internal/main/run_main_module:23:47
{"log.level":"info","@timestamp":"2023-11-07T19:12:51.361Z","log.logger":"elastic-apm-node","ecs.version":"8.10.0","message":"Sending error to Elastic APM: {\"id\":\"633a2f196e2837075dded6a48e52e982\"}"}
possible improvements
To improve on this situation will take new work on the APM agent.
- To handle just the case of the output printed to stderr on
uncaughtException
events, the APM agent code could get smarter. It could detect if the--enable-source-maps
option was given (either inprocess.execArgv
or inNODE_OPTIONS
) and then apply sourcemaps itself.
However, this only handles the uncaughtException
case. With node --enable-source-maps ...
the sourcemap translation is done for all <Error instance>.stack
usages. For example:
% cat foo-internal-error.ts
interface Foo {
}
function aFunc() {
throw new Error('boom')
}
try {
aFunc()
} catch (err) {
console.log('caught an err:', err.stack)
}
% ./node_modules/.bin/tsc --sourceMap --esModuleInterop foo-internal-error.ts
% node foo-internal-error.js
caught an err: Error: boom
at aFunc (/Users/trentm/tmp/asdf.20231103T165932/foo-internal-error.js:2:11)
at Object.<anonymous> (/Users/trentm/tmp/asdf.20231103T165932/foo-internal-error.js:5:5)
at Module._compile (node:internal/modules/cjs/loader:1256:14)
at Module._extensions..js (node:internal/modules/cjs/loader:1310:10)
at Module.load (node:internal/modules/cjs/loader:1119:32)
at Module._load (node:internal/modules/cjs/loader:960:12)
at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:86:12)
at node:internal/main/run_main_module:23:47
% node --enable-source-maps foo-internal-error.js
caught an err: Error: boom
at aFunc (/Users/trentm/tmp/asdf.20231103T165932/foo-internal-error.ts:4:11)
at Object.<anonymous> (/Users/trentm/tmp/asdf.20231103T165932/foo-internal-error.ts:7:5)
at Module._compile (node:internal/modules/cjs/loader:1256:14)
at Module._extensions..js (node:internal/modules/cjs/loader:1310:10)
at Module.load (node:internal/modules/cjs/loader:1119:32)
at Module._load (node:internal/modules/cjs/loader:960:12)
at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:86:12)
at node:internal/main/run_main_module:23:47
- The only way for an active elastic-apm-node to stay out of the way of
--enable-source-maps
would be for it to stop usingError.prepareStackTrace
at all. Using this mechanism is the canonical API for getting rich stack trace information: https://v8.dev/docs/stack-trace-api#customizing-stack-traces Perhaps the Node.js--enable-source-maps
handling makes those richerCallSite
object available for other code, like elastic-apm-node, to use or could be made to do so. Or perhapselastic-apm-node
could fallback to doing a best-effort parse of theerr.stack
string.
Would you be willing to open a separate feature request issue for this? It is very helpful to have real user use cases on issues to motivate development.