esp-v2 icon indicating copy to clipboard operation
esp-v2 copied to clipboard

Prevent creating traces for the `--healthz` path

Open nareddyt opened this issue 5 years ago • 11 comments

Originally posted by @clehene in https://github.com/GoogleCloudPlatform/esp-v2/issues/412#issuecomment-723308740

@nareddyt thank you, that did it. Although the data gets flooded by ingress ESPv2_Autogenerated_HealthCheck now. Is there a way to get only these sampled?

nareddyt avatar Nov 06 '20 21:11 nareddyt

Users can specify a health check endpoint in ESPv2 using the --healthz flag.

  • Documentation.
  • Example: --healthz check will generate a method GET /check with tracing operation name ingress ESPv2_Autogenerated_HealthCheck.

Some users would like to prevent these health check operations from being traced. This makes sense, there is little value added from these traces.

Unfortunately, we have no good way to disable this yet. We are blocked on an issue with Envoy's opencensus tracer (https://github.com/envoyproxy/envoy/issues/7422). I will investigate further.

nareddyt avatar Nov 06 '20 21:11 nareddyt

I would suggest to do the following (simple solution, workaround):

  1. Implement a new Sampler (https://github.com/census-instrumentation/opencensus-cpp/blob/master/opencensus/trace/sampler.h) that is working as a "wrapper" and has a config to "disable" tracing based on span name:
// Always samples.
class ExcludeNamesSampler final : public Sampler {
 public:
  ExcludeNamesSampler(vector<std::string> excluded_names, std::unique_ptr<Sampler> delegate);

  bool ShouldSample(const SpanContext* parent_context,
                    bool has_remote_parent,
                    const TraceId& trace_id,
                    const SpanId& span_id,
                    absl::string_view name,
                    const std::vector<Span*>& parent_links) const override {
    for(int i=0; i < excluded_names_.size(); i++){
      if excluded_names_[i] == name {
        return false
      }
    }
    return delegate_->ShouldSample(parent_context, has_remote_parent, trace_id, span_id, name, parent_links);
  }
 private:
  const vector<std::string> excluded_names_
  const std::unique_ptr<Sampler> delegate_
};
  1. Then change envoy to allow setting an array of "span names" to exclude in the OpenCensusTraceConfig, and use this sampler to achieve that.

This way is a much simpler solution, and can be easily achieved, it only works if the name is not changed which means Span::SetName() is not called in envoy, which I think it is based on https://github.com/census-instrumentation/opencensus-cpp/pull/335.

If you can make sure the span name is not updated in the envoy codebase and spans are created with the right name, then this solution works nicely, otherwise the sampler will run against the name passed in the ctor not the updated name.

bogdandrutu avatar Nov 07 '20 00:11 bogdandrutu

So first look if it is possible to remove all the update span name calls, if that is an option we should be fine.

bogdandrutu avatar Nov 07 '20 00:11 bogdandrutu

Thanks, creating a wrapper sampler makes sense. I'll take a look into if/when Envoy calls update and if it can be removed.

nareddyt avatar Nov 07 '20 00:11 nareddyt

Looks like there is a "Driver" https://github.com/envoyproxy/envoy/blob/9ad7d4ce5923fb053506cb76d47f93049a509a4c/include/envoy/tracing/http_tracer.h#L182 which can be implemented that has the name, maybe opencensus can implement that if it works, then we have the name at the start.

Not sure about envoy tracing, but if you do this cc me, since we also want in opentelemetry ideal to not call set_operation if possible :)

bogdandrutu avatar Nov 07 '20 01:11 bogdandrutu

Unfortunately "SetName" is called here

it is called here

Trace name is changed in the middle.

So the ExcludeNameSampler will not work

qiwzhang avatar Nov 07 '20 01:11 qiwzhang

@qiwzhang I think I know where is called in the code (also you can see that I was also looking for alternatives and ideas) :) that's why I said that you may be able to avoid that with some work in envoy.

bogdandrutu avatar Nov 07 '20 20:11 bogdandrutu

The problem is that when HCM config is created, the tracing operation name is decided as ingress for all requests. So every request will have the same name when the Span is created.

Here is where the tracing config is created, hardcoding in the ingress operation name:

https://github.com/envoyproxy/envoy/blob/22683a0a24ffbb0cdeb4111eec5ec90246bec9cb/source/extensions/filters/network/http_connection_manager/config.cc#L369

Here is where the span is created in HCM, using the previous tracing config:

https://github.com/envoyproxy/envoy/blob/a68755d734e30c8d385b92a1c569bd9cceacc5ff/source/common/http/conn_manager_impl.cc#L1046

The span name is later updated based on the operation we set in the router decoration config for the route:

https://github.com/envoyproxy/envoy/blob/a68755d734e30c8d385b92a1c569bd9cceacc5ff/source/common/http/conn_manager_impl.cc#L1072

(Note it is also updated based on the incoming decoration request header, but we don't care about this, we don't use this)

https://github.com/envoyproxy/envoy/blob/a68755d734e30c8d385b92a1c569bd9cceacc5ff/source/common/http/conn_manager_impl.cc#L1085

This are some possibilities to workaround this issue, but I need to investigate more. The first one is preferred if doable, i think the second one is definitely doable.

  1. Is it possible the decorated operation name can be used when the HCM is created? I'm still a little confused how the HCM and router are intertwined, but it might be doable because traceRequest is called at the end of decodeHeaders AND the route is cached before tracing occurs. I don't see any limitation for why we can't just construct the span with the decorated operation name, but there might be some context i'm missing.
  2. We can introduce an option for HCM to name the span with the request path name. I know this is an anti-pattern in tracing, but it would help solve our problem. If each span is named ingress /request_path, the name-based sampler would work well. And the name can later be updated via the decoration config, preventing the final exported span from being named with the path. This can be a non-default option to set the span name as the request path in the HCM config. This should be doable because traceRequest is called at the end of decodeHeaders, so we have access to the :path header there.

nareddyt avatar Nov 09 '20 17:11 nareddyt

Hi Teju, it may work, but this approach is still a hack; you need to have a stable name and you need a ExcludeNameSampler.

Hi @bogdandrutu the right approach is to disable sampling after span is created for OpenSensus. What are the key issues with that?

qiwzhang avatar Nov 09 '20 17:11 qiwzhang

hey guys, just to revive this thread, we also have a gRPC method where we are health checking, in accordance with the health checking protocol which has early support with GCP.

we would love to suppress traces for these, because we get a ton of them, and our options for suppressing them are limited:

  • we can disable tracing, but then we can't have tracing for the methods we care about
  • we can set a low probability, but that also affects the methods we care about, and it's already low (defaults to 0.1 iirc)
  • we can just do health checking directly, rather than through ESP, but then we've got no guarantee ESP is up and failover can't occur automatically.

i haven't found a way yet to either suppress traces for certain methods or paths in cloud trace, or withhold tracing from ESP for specific methods, but perhaps i missed it? this would be very useful if it doesn't exist today

cc / @qiwzhang @nareddyt

sgammon avatar Aug 04 '21 00:08 sgammon

Agreed it is a useful feature. I previously investigated and found there are some blockers in the underlying tracing library ESPv2/Envoy use (OpenCensus). This will likely be continue being blocked until Envoy fully supports OpenTelemetry, could take many months. We have deprioritized it.

nareddyt avatar Aug 04 '21 13:08 nareddyt