
Don't trace health endpoints

Open rakyll opened this issue 5 years ago • 16 comments

Tracing canonical HTTP health endpoints such as /healthz and /_ah/health generates a large number of traces. This adds significant cost for collecting and storing spans for these endpoints, and adds noise when visualizing traces.

HTTP tracing integrations should by default disable the tracing of:

  • /healthz
  • /_ah/health

Other canonical health endpoints can later be added to the list.

rakyll avatar Jul 25 '18 03:07 rakyll

Is there any gRPC equivalent?

semistrict avatar Jul 25 '18 17:07 semistrict

See gRPC's canonical health checking reference: https://github.com/grpc/grpc/blob/master/doc/health-checking.md.

rakyll avatar Jul 25 '18 20:07 rakyll

Did we end up implementing this, or is it still under discussion? If it's still under discussion, I suspect the simplest way to achieve this is a blacklist that's maintained on GitHub and shipped in builds of the library. If certain URLs / methods need to be excluded for everyone, they can be added on GitHub through a PR, and if users need to blacklist additional URLs / methods specific to their app, they can do so via an API or config.
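A minimal sketch of that idea, assuming a built-in list plus user additions. The `SkipList` type and its methods are hypothetical names made up for illustration and are not part of any OpenCensus library; the two default paths are the ones named at the top of this issue.

```go
package main

import "fmt"

// defaultSkippedPaths stands in for the list that would ship with the library
// and be extended through PRs. The entries are the endpoints named in this issue.
var defaultSkippedPaths = map[string]bool{
	"/healthz":    true,
	"/_ah/health": true,
}

// SkipList is a hypothetical type combining the built-in defaults with
// application-specific additions.
type SkipList struct {
	paths map[string]bool
}

// NewSkipList copies the defaults and adds any extra paths supplied by the
// application, so user additions never modify the shared default list.
func NewSkipList(extra ...string) *SkipList {
	paths := make(map[string]bool, len(defaultSkippedPaths)+len(extra))
	for p := range defaultSkippedPaths {
		paths[p] = true
	}
	for _, p := range extra {
		paths[p] = true
	}
	return &SkipList{paths: paths}
}

// ShouldSkip reports whether a request path is excluded from tracing.
func (s *SkipList) ShouldSkip(path string) bool {
	return s.paths[path]
}

func main() {
	skip := NewSkipList("/readyz") // "/readyz" is an app-specific addition
	for _, path := range []string{"/healthz", "/readyz", "/api/users"} {
		fmt.Printf("%s skip=%v\n", path, skip.ShouldSkip(path))
	}
}
```

An HTTP integration could consult such a list with the request's URL path before deciding whether to start a span.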

Thoughts?

mtwo avatar Aug 13 '18 23:08 mtwo

I don't like the idea of just blocking some arbitrary selection of paths. I think that if we had a better default sampler (per #156) users wouldn't resort to enabling 100% sampling, which is problematic for any production traffic - not just health checking.

semistrict avatar Aug 14 '18 00:08 semistrict

I see your point, but I feel like there's still a need to remove certain endpoints. For example, seeing traces of Profiler requests to Stackdriver drives me nuts: they're high latency and throw off my views, and I really don't care about their performance. We also can't expect the libraries that generate these requests to integrate with OpenCensus for the sole purpose of excluding themselves from sampling.

mtwo avatar Aug 14 '18 17:08 mtwo

@mtwo

For example, seeing traces of Profiler requests to Stackdriver drives me nuts: they're high latency and throw off my views, and I really don't care about their performance.

This won't be helped by resolving this issue as proposed. This issue is about avoiding server spans related to health checking. What you're talking about are client spans (and unrelated to health checking). I think we should have a separate issue for that.

semistrict avatar Aug 14 '18 17:08 semistrict

That's right, sorry! I'll create a separate issue

mtwo avatar Aug 14 '18 17:08 mtwo

Should there be a common solution? One solution I can think of is to have a special tag for SDK internal processing code. An exporter could then decide to throw away anything with that tag or attribute, for example "SyntheticSource". The same pattern could later be used for calls from availability ping tests.
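The tag-based approach could look roughly like the sketch below. It assumes an exporter wrapper that drops any span carrying the marker attribute. `SpanData` and `Exporter` here are simplified local stand-ins defined only for this example, not the real OpenCensus types, and `FilteringExporter` and `recordingExporter` are hypothetical names.

```go
package main

import "fmt"

// SpanData is a local stand-in for a finished span; only the fields
// needed for this sketch are included.
type SpanData struct {
	Name       string
	Attributes map[string]interface{}
}

// Exporter is a local stand-in for a span exporter interface.
type Exporter interface {
	ExportSpan(s *SpanData)
}

// syntheticSourceKey is the attribute key suggested in the comment above.
const syntheticSourceKey = "SyntheticSource"

// FilteringExporter (hypothetical) forwards spans to Next unless they
// carry the marker attribute.
type FilteringExporter struct {
	Next Exporter
}

// ExportSpan drops tagged spans and passes everything else through.
func (f FilteringExporter) ExportSpan(s *SpanData) {
	if _, tagged := s.Attributes[syntheticSourceKey]; tagged {
		return
	}
	f.Next.ExportSpan(s)
}

// recordingExporter collects the names of exported spans for demonstration.
type recordingExporter struct {
	names []string
}

func (r *recordingExporter) ExportSpan(s *SpanData) {
	r.names = append(r.names, s.Name)
}

func main() {
	rec := &recordingExporter{}
	exp := FilteringExporter{Next: rec}

	exp.ExportSpan(&SpanData{Name: "GET /api/users"})
	exp.ExportSpan(&SpanData{
		Name:       "profiler upload",
		Attributes: map[string]interface{}{syntheticSourceKey: true},
	})

	fmt.Println(rec.names) // only the untagged span is forwarded
}
```

Filtering at export time only saves storage and reduces noise; the span is still created and recorded in-process, so a sampling decision made when the span starts would also be needed to avoid the collection cost.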

SergeyKanzhelev avatar Sep 25 '18 06:09 SergeyKanzhelev

We are also considering a transport-based sampling policy setting that is driven by the incoming request. This would allow us to implement filtering mechanisms by HTTP path, RPC name, etc.

The current HTTP-specific spec change PR: https://github.com/census-instrumentation/opencensus-specs/pull/182

rakyll avatar Sep 25 '18 18:09 rakyll

I think we can close this issue by recommending that library implementations provide filtering options based on https://github.com/census-instrumentation/opencensus-specs/pull/182.

rakyll avatar Oct 08 '18 18:10 rakyll

Is this closed or is there a way to filter traces by sampling options in Go? I couldn't find any.

We're using opencensus on AWS and the ELB healthchecks are causing a lot of traces. In addition they just go to / so having something configurable would be ideal, possibly even by user-agent since it's set to ELB-HealthChecker/2.0.

montanaflynn avatar Apr 03 '19 10:04 montanaflynn

Actually I just figured it out using GetStartOptions:

// Never sample requests sent by the ELB health checker.
&ochttp.Handler{
	Handler: handler,
	GetStartOptions: func(r *http.Request) trace.StartOptions {
		startOptions := trace.StartOptions{}
		if r.UserAgent() == "ELB-HealthChecker/2.0" {
			startOptions.Sampler = trace.NeverSample()
		}
		return startOptions
	},
},

montanaflynn avatar Apr 03 '19 11:04 montanaflynn

Thanks to @montanaflynn

// Requires "net/http", "go.opencensus.io/plugin/ochttp", and "go.opencensus.io/trace".

// SkippedSampleAPIs lists the paths whose requests are never sampled.
var SkippedSampleAPIs = map[string]bool{
	"/readyz":   true,
	"/metricsz": true,
	"/healthz":  true,
}

handler = &ochttp.Handler{
	Handler: handler,
	GetStartOptions: func(r *http.Request) trace.StartOptions {
		startOptions := trace.StartOptions{}
		if SkippedSampleAPIs[r.URL.Path] {
			startOptions.Sampler = trace.NeverSample()
		}
		return startOptions
	},
}

hixichen avatar Aug 13 '19 23:08 hixichen

Sadly, this won't help with a gRPC server, as the ocgrpc plugin does not provide a GetStartOptions function for making per-request tracing decisions. 😞

lunemec avatar Jan 02 '20 14:01 lunemec

It should now be easier and more correct to skip private/health endpoints with the IsHealthEndpoint callback, which skips the trace completely.

// SkippedSampleAPIs lists the paths that are treated as health endpoints and not traced.
var SkippedSampleAPIs = map[string]bool{
	"/readyz":   true,
	"/metricsz": true,
	"/healthz":  true,
}

handler = &ochttp.Handler{
	Handler: handler,
	IsHealthEndpoint: func(r *http.Request) bool {
		return SkippedSampleAPIs[r.URL.Path]
	},
}

rhzs avatar Sep 24 '20 17:09 rhzs

It would still be great to have a standard option to exclude the grpc.health.* RPCs.

In the meantime, it is possible to use a custom sampler with ocgrpc, just a little more cumbersome than the IsHealthEndpoint option provided with ochttp.

// Requires "strings", "go.opencensus.io/plugin/ocgrpc", "go.opencensus.io/trace", and "google.golang.org/grpc".

// sampler never samples gRPC health-check spans and falls back to
// probability sampling for everything else.
sampler := func(fraction float64) trace.Sampler {
	ps := trace.ProbabilitySampler(fraction)
	return func(params trace.SamplingParameters) trace.SamplingDecision {
		if strings.HasPrefix(params.Name, "grpc.health") {
			return trace.SamplingDecision{Sample: false}
		}
		return ps(params)
	}
}(0.25) // <- sample rate from environment, config, etc.

options := trace.StartOptions{Sampler: sampler}
handler := ocgrpc.ServerHandler{StartOptions: options}
server := grpc.NewServer(grpc.StatsHandler(&handler))

0x726d77 avatar Oct 30 '20 02:10 0x726d77