Prevent creating traces for the `--healthz` path
Originally posted by @clehene in https://github.com/GoogleCloudPlatform/esp-v2/issues/412#issuecomment-723308740
@nareddyt thank you, that did it. Although the data gets flooded by
ingress ESPv2_Autogenerated_HealthChecknow. Is there a way to get only these sampled?
Users can specify a health check endpoint in ESPv2 using the --healthz flag.
- Documentation.
- Example:
--healthz checkwill generate a methodGET /checkwith tracing operation nameingress ESPv2_Autogenerated_HealthCheck.
Some users would like to prevent these health check operations from being traced. This makes sense, there is little value added from these traces.
Unfortunately, we have no good way to disable this yet. We are blocked on an issue with Envoy's opencensus tracer (https://github.com/envoyproxy/envoy/issues/7422). I will investigate further.
I would suggest to do the following (simple solution, workaround):
- Implement a new Sampler (https://github.com/census-instrumentation/opencensus-cpp/blob/master/opencensus/trace/sampler.h) that is working as a "wrapper" and has a config to "disable" tracing based on span name:
// Always samples.
class ExcludeNamesSampler final : public Sampler {
public:
ExcludeNamesSampler(vector<std::string> excluded_names, std::unique_ptr<Sampler> delegate);
bool ShouldSample(const SpanContext* parent_context,
bool has_remote_parent,
const TraceId& trace_id,
const SpanId& span_id,
absl::string_view name,
const std::vector<Span*>& parent_links) const override {
for(int i=0; i < excluded_names_.size(); i++){
if excluded_names_[i] == name {
return false
}
}
return delegate_->ShouldSample(parent_context, has_remote_parent, trace_id, span_id, name, parent_links);
}
private:
const vector<std::string> excluded_names_
const std::unique_ptr<Sampler> delegate_
};
- Then change envoy to allow setting an array of "span names" to exclude in the OpenCensusTraceConfig, and use this sampler to achieve that.
This way is a much simpler solution, and can be easily achieved, it only works if the name is not changed which means Span::SetName() is not called in envoy, which I think it is based on https://github.com/census-instrumentation/opencensus-cpp/pull/335.
If you can make sure the span name is not updated in the envoy codebase and spans are created with the right name, then this solution works nicely, otherwise the sampler will run against the name passed in the ctor not the updated name.
So first look if it is possible to remove all the update span name calls, if that is an option we should be fine.
Thanks, creating a wrapper sampler makes sense. I'll take a look into if/when Envoy calls update and if it can be removed.
Looks like there is a "Driver" https://github.com/envoyproxy/envoy/blob/9ad7d4ce5923fb053506cb76d47f93049a509a4c/include/envoy/tracing/http_tracer.h#L182 which can be implemented that has the name, maybe opencensus can implement that if it works, then we have the name at the start.
Not sure about envoy tracing, but if you do this cc me, since we also want in opentelemetry ideal to not call set_operation if possible :)
Unfortunately "SetName" is called here
it is called here
Trace name is changed in the middle.
So the ExcludeNameSampler will not work
@qiwzhang I think I know where is called in the code (also you can see that I was also looking for alternatives and ideas) :) that's why I said that you may be able to avoid that with some work in envoy.
The problem is that when HCM config is created, the tracing operation name is decided as ingress for all requests. So every request will have the same name when the Span is created.
Here is where the tracing config is created, hardcoding in the ingress operation name:
https://github.com/envoyproxy/envoy/blob/22683a0a24ffbb0cdeb4111eec5ec90246bec9cb/source/extensions/filters/network/http_connection_manager/config.cc#L369
Here is where the span is created in HCM, using the previous tracing config:
https://github.com/envoyproxy/envoy/blob/a68755d734e30c8d385b92a1c569bd9cceacc5ff/source/common/http/conn_manager_impl.cc#L1046
The span name is later updated based on the operation we set in the router decoration config for the route:
https://github.com/envoyproxy/envoy/blob/a68755d734e30c8d385b92a1c569bd9cceacc5ff/source/common/http/conn_manager_impl.cc#L1072
(Note it is also updated based on the incoming decoration request header, but we don't care about this, we don't use this)
https://github.com/envoyproxy/envoy/blob/a68755d734e30c8d385b92a1c569bd9cceacc5ff/source/common/http/conn_manager_impl.cc#L1085
This are some possibilities to workaround this issue, but I need to investigate more. The first one is preferred if doable, i think the second one is definitely doable.
- Is it possible the decorated operation name can be used when the HCM is created? I'm still a little confused how the HCM and router are intertwined, but it might be doable because traceRequest is called at the end of decodeHeaders AND the route is cached before tracing occurs. I don't see any limitation for why we can't just construct the span with the decorated operation name, but there might be some context i'm missing.
- We can introduce an option for HCM to name the span with the request path name. I know this is an anti-pattern in tracing, but it would help solve our problem. If each span is named
ingress /request_path, the name-based sampler would work well. And the name can later be updated via the decoration config, preventing the final exported span from being named with the path. This can be a non-default option to set the span name as the request path in the HCM config. This should be doable because traceRequest is called at the end of decodeHeaders, so we have access to the:pathheader there.
Hi Teju, it may work, but this approach is still a hack; you need to have a stable name and you need a ExcludeNameSampler.
Hi @bogdandrutu the right approach is to disable sampling after span is created for OpenSensus. What are the key issues with that?
hey guys, just to revive this thread, we also have a gRPC method where we are health checking, in accordance with the health checking protocol which has early support with GCP.
we would love to suppress traces for these, because we get a ton of them, and our options for suppressing them are limited:
- we can disable tracing, but then we can't have tracing for the methods we care about
- we can set a low probability, but that also affects the methods we care about, and it's already low (defaults to
0.1iirc) - we can just do health checking directly, rather than through ESP, but then we've got no guarantee ESP is up and failover can't occur automatically.
i haven't found a way yet to either suppress traces for certain methods or paths in cloud trace, or withhold tracing from ESP for specific methods, but perhaps i missed it? this would be very useful if it doesn't exist today
cc / @qiwzhang @nareddyt
Agreed it is a useful feature. I previously investigated and found there are some blockers in the underlying tracing library ESPv2/Envoy use (OpenCensus). This will likely be continue being blocked until Envoy fully supports OpenTelemetry, could take many months. We have deprioritized it.