Proposal: stack frame compression

Open axw opened this issue 6 years ago • 4 comments

Is your feature request related to a problem? Please describe.

Agents capture stack frames for spans and errors. Both capturing the stack trace and rendering it into JSON can be quite expensive.

Describe the solution you'd like

By applying static code analysis, we can identify and compress static call chains. That is, if we know that function B is always called by function A, then we can report only B, and have the server fill in the blanks.
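A minimal sketch of the "B is always called by A" check, assuming the call graph is already available as a flat list of caller/callee edges. The `Edge` type and the `soleCallers` helper are hypothetical names for illustration, not part of any agent. A real analysis would also have to deal with interface calls and function values, which this sketch ignores.

```go
package main

import "fmt"

// Edge records one static call: the body of Caller contains a call to Callee.
// (Hypothetical type; the edges would come from a separate call graph analysis.)
type Edge struct{ Caller, Callee string }

// soleCallers returns, for every function that has exactly one distinct
// static caller, that caller. Functions with several callers are left out,
// because their caller cannot be inferred and must still be reported.
func soleCallers(edges []Edge) map[string]string {
	callers := make(map[string]map[string]bool)
	for _, e := range edges {
		if callers[e.Callee] == nil {
			callers[e.Callee] = make(map[string]bool)
		}
		callers[e.Callee][e.Caller] = true
	}
	sole := make(map[string]string)
	for callee, set := range callers {
		if len(set) != 1 {
			continue
		}
		for caller := range set {
			sole[callee] = caller
		}
	}
	return sole
}

func main() {
	edges := []Edge{
		{Caller: "A", Callee: "B"},
		{Caller: "A", Callee: "C"},
		{Caller: "B", Callee: "C"},
	}
	// B is only ever called by A, so A can be inferred from B.
	// C has two callers (A and B), so it does not appear in the result.
	fmt.Println(soleCallers(edges))
}
```

Only the functions present in the resulting table are candidates for having their caller filled in by the server.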

As an example, take the following real stack trace from a simple Go application:

"stacktrace":     []interface {}{
    map[string]interface {}{
        "filename":      "span.go",
        "lineno":        float64(275),
        "abs_path":      "/home/andrew/go/src/go.elastic.co/apm/span.go",
        "function":      "(*Span).End",
        "library_frame": bool(true),
        "module":        "go.elastic.co/apm",
    },
    map[string]interface {}{
        "filename": "srv.go",
        "lineno":   float64(16),
        "abs_path": "/tmp/srv.go",
        "function": "main.func1",
        "module":   "main",
    },
    map[string]interface {}{
        "library_frame": bool(true),
        "module":        "net/http",
        "filename":      "server.go",
        "lineno":        float64(1964),
        "abs_path":      "/home/andrew/tools/go/1.11.5/src/net/http/server.go",
        "function":      "HandlerFunc.ServeHTTP",
    },
    map[string]interface {}{
        "abs_path":      "/home/andrew/tools/go/1.11.5/src/net/http/server.go",
        "function":      "(*ServeMux).ServeHTTP",
        "library_frame": bool(true),
        "module":        "net/http",
        "filename":      "server.go",
        "lineno":        float64(2361),
    },
    map[string]interface {}{
        "filename":      "handler.go",
        "lineno":        float64(87),
        "abs_path":      "/home/andrew/go/src/go.elastic.co/apm/module/apmhttp/handler.go",
        "function":      "(*handler).ServeHTTP",
        "library_frame": bool(true),
        "module":        "go.elastic.co/apm/module/apmhttp",
    },
    map[string]interface {}{
        "function":      "serverHandler.ServeHTTP",
        "library_frame": bool(true),
        "module":        "net/http",
        "filename":      "server.go",
        "lineno":        float64(2741),
        "abs_path":      "/home/andrew/tools/go/1.11.5/src/net/http/server.go",
    },
    map[string]interface {}{
        "abs_path":      "/home/andrew/tools/go/1.11.5/src/net/http/server.go",
        "function":      "(*conn).serve",
        "library_frame": bool(true),
        "module":        "net/http",
        "filename":      "server.go",
        "lineno":        float64(1847),
    },
    map[string]interface {}{
        "filename":      "asm_amd64.s",
        "lineno":        float64(1333),
        "abs_path":      "/home/andrew/tools/go/1.11.5/src/runtime/asm_amd64.s",
        "function":      "goexit",
        "library_frame": bool(true),
        "module":        "runtime",
    },
},

Hypothetically, if

  • the server had access to the source, and were able to classify the frames as being library frames or not,
  • the first "Span.End" frame were elided, and
  • we didn't care about the absolute path,

then, because the line "srv.go:16" can provably be reached through only one call chain, we could reduce this to something like:

    map[string]interface {}{
        "filename": "srv.go",
        "lineno":   float64(16),
        "function": "main.func1",
        "module":   "main",
    },

(We could potentially also drop "function", if the server were capable of analysing or querying some other service about source code.)
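The reduction above can be sketched as a pair of functions: one for the agent, one for the server. This is only an illustration under stated assumptions. The `Frame` type, `compress`, `expand`, and the `soleCaller` table are hypothetical, and the table is hand-written here to make every frame's caller the next frame in the example trace. In practice it would have to come from static analysis of the deployed program.

```go
package main

import "fmt"

// Frame is a reduced stack frame; real agents report more fields.
type Frame struct {
	Function string
	File     string
	Line     int
}

// compress (agent side) drops the outer frames that the table already
// implies. The stack is ordered innermost first, as in the trace above.
func compress(stack []Frame, soleCaller map[Frame]Frame) []Frame {
	keep := len(stack)
	// Walk inward from the outermost frame; stop at the first frame whose
	// caller is not the statically known one.
	for i := len(stack) - 2; i >= 0; i-- {
		if caller, ok := soleCaller[stack[i]]; !ok || caller != stack[i+1] {
			break
		}
		keep = i + 1
	}
	return stack[:keep]
}

// expand (server side) re-appends implied callers to the reported frames.
// The iteration bound guards against a cyclic table.
func expand(reported []Frame, soleCaller map[Frame]Frame) []Frame {
	full := append([]Frame(nil), reported...)
	for i := 0; i < len(soleCaller) && len(full) > 0; i++ {
		caller, ok := soleCaller[full[len(full)-1]]
		if !ok {
			break
		}
		full = append(full, caller)
	}
	return full
}

func main() {
	stack := []Frame{
		{"main.func1", "srv.go", 16},
		{"HandlerFunc.ServeHTTP", "server.go", 1964},
		{"(*ServeMux).ServeHTTP", "server.go", 2361},
		{"(*handler).ServeHTTP", "handler.go", 87},
		{"serverHandler.ServeHTTP", "server.go", 2741},
		{"(*conn).serve", "server.go", 1847},
		{"goexit", "asm_amd64.s", 1333},
	}
	// Assumption for illustration: every frame has exactly one possible caller.
	soleCaller := make(map[Frame]Frame)
	for i := 0; i+1 < len(stack); i++ {
		soleCaller[stack[i]] = stack[i+1]
	}

	reported := compress(stack, soleCaller)
	fmt.Println(len(reported), reported[0].File, reported[0].Line)
	fmt.Println(len(expand(reported, soleCaller)))
}
```

If any frame in the chain has more than one possible caller, `compress` keeps everything up to and including that frame's actual caller, so the server never has to guess.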

Moreover, using source-to-source transformation, the source location information could be encoded into the span details. This would avoid an expensive call to acquire the stack trace, and would enable us to report the stack trace for all spans and eliminate the min-duration configuration.
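As a rough sketch of what such a transformation could produce: the rewritten call site passes its location as a constant instead of asking the runtime for it. `SourceLocation`, `Span`, `startSpan`, and `startSpanAt` are hypothetical names for illustration and not the go.elastic.co/apm API. The run-time variant uses `runtime.Caller` for brevity, where a real agent would capture the whole stack.

```go
package main

import (
	"fmt"
	"runtime"
)

// SourceLocation describes a call site. In the rewritten form it is a
// constant emitted by the (hypothetical) source-to-source tool.
type SourceLocation struct {
	File     string
	Line     int
	Function string
}

// Span is a stand-in for an agent span that carries its origin.
type Span struct {
	Name string
	Loc  SourceLocation
}

// startSpan is the hand-written form: the call site is looked up at run time.
func startSpan(name string) *Span {
	s := &Span{Name: name}
	if pc, file, line, ok := runtime.Caller(1); ok {
		s.Loc = SourceLocation{File: file, Line: line}
		if fn := runtime.FuncForPC(pc); fn != nil {
			s.Loc.Function = fn.Name()
		}
	}
	return s
}

// startSpanAt is the form a rewriting tool could emit: no runtime lookup,
// the location is already known when the program is built.
func startSpanAt(name string, loc SourceLocation) *Span {
	return &Span{Name: name, Loc: loc}
}

func main() {
	dynamic := startSpan("query")
	static := startSpanAt("query", SourceLocation{
		File:     "srv.go",
		Line:     16,
		Function: "main.func1",
	})
	fmt.Println(dynamic.Loc.Function, dynamic.Loc.Line)
	fmt.Println(static.Loc.Function, static.Loc.File, static.Loc.Line)
}
```

Because the constant form costs no more than a struct copy, it could in principle be attached to every span regardless of duration.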

Describe alternatives you've considered

Status quo.

axw • Feb 21 '19 10:02

This could also reduce CPU usage on the server side: earlier investigations indicated that parsing and processing a large number of stack trace frames is very CPU-heavy.

simitt • Feb 28 '19 10:02

Is that because of the schema validation or JSON deserialization/serialization?

felixbarny • Feb 28 '19 15:02

I don't have an answer to that yet, as I saw it in stress tests rather than in benchmarks that break down individual code paths. I also don't think that is the main question for this issue; I just wanted to note that this could be a win for the server as well.

simitt • Feb 28 '19 15:02

I can't see this happening in the foreseeable future.

axw • Oct 17 '22 06:10