Proposal: stack frame compression
Is your feature request related to a problem? Please describe.
Agents capture stack frames for spans and errors. This can be quite expensive, both in capturing the stack trace and in rendering it as JSON.
Describe the solution you'd like
By applying static code analysis, we can identify and compress static call chains. That is, if we know that function B is always called by function A, then we can report only B, and have the server fill in the blanks.
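To make "always called by" concrete, here is a minimal sketch of how a table of single-caller functions might be derived. It assumes the call edges have already been resolved by some static analysis; the `callEdge` type and the `uniqueCallers` function are hypothetical names used only for illustration, and a real analysis would also have to account for indirect calls through interfaces and function values.

```go
package main

import "fmt"

// callEdge is a hypothetical record of one statically resolved call.
type callEdge struct {
	Caller, Callee string
}

// uniqueCallers returns, for every function that has exactly one distinct
// caller in edges, the name of that caller.
func uniqueCallers(edges []callEdge) map[string]string {
	// Collect the set of distinct callers for each callee.
	callers := make(map[string]map[string]bool)
	for _, e := range edges {
		if callers[e.Callee] == nil {
			callers[e.Callee] = make(map[string]bool)
		}
		callers[e.Callee][e.Caller] = true
	}
	// Keep only the callees whose caller set has a single member.
	unique := make(map[string]string)
	for callee, set := range callers {
		if len(set) == 1 {
			for caller := range set {
				unique[callee] = caller
			}
		}
	}
	return unique
}

func main() {
	edges := []callEdge{
		{"A", "B"}, // B is only ever called by A
		{"A", "C"},
		{"D", "C"}, // C has two callers, so it cannot be compressed
	}
	fmt.Println(uniqueCallers(edges)) // map[B:A]
}
```

Only functions that appear in the resulting table are candidates for compression; anything with more than one caller still needs its caller reported.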
As an example, take the following real stack trace from a simple Go application:
"stacktrace": []interface {}{
    map[string]interface {}{
        "filename": "span.go",
        "lineno": float64(275),
        "abs_path": "/home/andrew/go/src/go.elastic.co/apm/span.go",
        "function": "(*Span).End",
        "library_frame": bool(true),
        "module": "go.elastic.co/apm",
    },
    map[string]interface {}{
        "filename": "srv.go",
        "lineno": float64(16),
        "abs_path": "/tmp/srv.go",
        "function": "main.func1",
        "module": "main",
    },
    map[string]interface {}{
        "library_frame": bool(true),
        "module": "net/http",
        "filename": "server.go",
        "lineno": float64(1964),
        "abs_path": "/home/andrew/tools/go/1.11.5/src/net/http/server.go",
        "function": "HandlerFunc.ServeHTTP",
    },
    map[string]interface {}{
        "abs_path": "/home/andrew/tools/go/1.11.5/src/net/http/server.go",
        "function": "(*ServeMux).ServeHTTP",
        "library_frame": bool(true),
        "module": "net/http",
        "filename": "server.go",
        "lineno": float64(2361),
    },
    map[string]interface {}{
        "filename": "handler.go",
        "lineno": float64(87),
        "abs_path": "/home/andrew/go/src/go.elastic.co/apm/module/apmhttp/handler.go",
        "function": "(*handler).ServeHTTP",
        "library_frame": bool(true),
        "module": "go.elastic.co/apm/module/apmhttp",
    },
    map[string]interface {}{
        "function": "serverHandler.ServeHTTP",
        "library_frame": bool(true),
        "module": "net/http",
        "filename": "server.go",
        "lineno": float64(2741),
        "abs_path": "/home/andrew/tools/go/1.11.5/src/net/http/server.go",
    },
    map[string]interface {}{
        "abs_path": "/home/andrew/tools/go/1.11.5/src/net/http/server.go",
        "function": "(*conn).serve",
        "library_frame": bool(true),
        "module": "net/http",
        "filename": "server.go",
        "lineno": float64(1847),
    },
    map[string]interface {}{
        "filename": "asm_amd64.s",
        "lineno": float64(1333),
        "abs_path": "/home/andrew/tools/go/1.11.5/src/runtime/asm_amd64.s",
        "function": "goexit",
        "library_frame": bool(true),
        "module": "runtime",
    },
},
Hypothetically, if
- the server had access to the source and could classify frames as library frames or not,
- the first "Span.End" frame were elided, and
- we didn't care about the absolute path,
then, because the line "srv.go:16" can provably be reached through only one call chain, we could reduce this to something like:
map[string]interface {}{
    "filename": "srv.go",
    "lineno": float64(16),
    "function": "main.func1",
    "module": "main",
},
(We could potentially also drop "function", if the server were capable of analysing or querying some other service about source code.)
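On the server side, filling in the blanks could look roughly like the sketch below. It assumes a lookup table from a function to its only possible caller frame; the `frame` and `key` types, the `expand` function, and the hand-written table are all hypothetical, with frame values borrowed from the trace above purely as sample data.

```go
package main

import "fmt"

// frame carries the fields kept in the reduced example.
type frame struct {
	Filename string
	Lineno   int
	Function string
	Module   string
}

// key identifies a function independently of the line being executed.
type key struct{ Module, Function string }

// expand returns the reported frame followed by its callers, innermost
// first, for as long as callerOf records a single possible caller.
func expand(reported frame, callerOf map[key]frame) []frame {
	stack := []frame{reported}
	seen := map[key]bool{key{reported.Module, reported.Function}: true}
	cur := reported
	for {
		next, ok := callerOf[key{cur.Module, cur.Function}]
		nk := key{next.Module, next.Function}
		// Stop at the end of the known chain, or if the table loops.
		if !ok || seen[nk] {
			return stack
		}
		seen[nk] = true
		stack = append(stack, next)
		cur = next
	}
}

func main() {
	// Hand-written for illustration only; a real table would have to be
	// produced by static analysis of the deployed source.
	callerOf := map[key]frame{
		{"main", "main.func1"}:                {"server.go", 1964, "HandlerFunc.ServeHTTP", "net/http"},
		{"net/http", "HandlerFunc.ServeHTTP"}: {"server.go", 2361, "(*ServeMux).ServeHTTP", "net/http"},
	}
	reported := frame{"srv.go", 16, "main.func1", "main"}
	for _, f := range expand(reported, callerOf) {
		fmt.Printf("%s:%d %s\n", f.Filename, f.Lineno, f.Function)
	}
}
```

Expansion stops as soon as a function has no single known caller, so the agent would still need to report a frame at each point where the chain is ambiguous.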
Moreover, using source-to-source transformation, the source location information could be encoded into the span details. This would avoid an expensive call to acquire the stack trace, and would enable us to report the stack trace for all spans and eliminate the min-duration configuration.
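As a rough illustration of the source-to-source idea, the sketch below uses the standard go/ast packages to append a "file:line" literal to each call of a function. The function name `startSpan`, the extra string argument, and the `addLocations` helper are assumptions made for this example and do not reflect the agent's actual API.

```go
package main

import (
	"bytes"
	"fmt"
	"go/ast"
	"go/format"
	"go/parser"
	"go/token"
	"strconv"
)

// addLocations appends a "file:line" string literal to every call of the
// hypothetical function startSpan found in src. It returns the rewritten
// source and the number of call sites changed.
func addLocations(filename, src string) (string, int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, parser.ParseComments)
	if err != nil {
		return "", 0, err
	}
	rewritten := 0
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok {
			return true
		}
		if id, ok := call.Fun.(*ast.Ident); ok && id.Name == "startSpan" {
			// Resolve the call site's position before modifying the call.
			pos := fset.Position(call.Pos())
			loc := pos.Filename + ":" + strconv.Itoa(pos.Line)
			call.Args = append(call.Args, &ast.BasicLit{
				Kind:  token.STRING,
				Value: strconv.Quote(loc),
			})
			rewritten++
		}
		return true
	})
	var buf bytes.Buffer
	if err := format.Node(&buf, fset, file); err != nil {
		return "", 0, err
	}
	return buf.String(), rewritten, nil
}

// input is a toy program; startSpan is not defined because the sketch
// only parses and rewrites the source, it does not type-check it.
const input = `package main

func handler() {
	span := startSpan("handle")
	span.End()
}
`

func main() {
	out, n, err := addLocations("srv.go", input)
	if err != nil {
		panic(err)
	}
	fmt.Printf("rewrote %d call site(s)\n%s", n, out)
}
```

With the location baked in at build time, the instrumented call would no longer need to walk the stack at runtime to learn where it was made.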
Describe alternatives you've considered
Status quo.
This could also improve CPU usage on the server side, as earlier investigations indicated that parsing and processing a large number of stack trace frames is very CPU-intensive.
Is that because of the schema validation or JSON deserialization/serialization?
I don't have an answer to that yet, as I saw it in stress tests rather than in benchmarks that break down individual code paths. I also don't think that is the main question for this issue; I just wanted to note that this could also be a win for the server.
I can't see this happening in the foreseeable future.