
Flaky connection while directly exporting to collector from ocagent exporter

Open asutoshpalai opened this issue 5 years ago • 3 comments

When using the Agent exporter with OpenCensus to directly export to the Collector, instead of exporting to Agent and then to Collector, the connection keeps resetting.

Is this the intended behavior? If so, is there a way/config to maintain stable connection?

As per the blog and design doc, it looks like both the Agent and Collector are optional and we should be able to export directly to the collector.

Steps to reproduce

I modified example/main.go to enable debug logs from gRPC as follows:

diff --git a/example/main.go b/example/main.go
index 5fa9f5f..e7932df 100644
--- a/example/main.go
+++ b/example/main.go
@@ -24,13 +24,17 @@ import (
 	"time"
 
 	"contrib.go.opencensus.io/exporter/ocagent"
+	"github.com/sirupsen/logrus"
 	"go.opencensus.io/stats"
 	"go.opencensus.io/stats/view"
 	"go.opencensus.io/tag"
 	"go.opencensus.io/trace"
+	"google.golang.org/grpc/grpclog"
 )
 
 func main() {
+	logrus.SetLevel(logrus.DebugLevel)
+	grpclog.SetLogger(logrus.New())
 	oce, err := ocagent.NewExporter(
 		ocagent.WithInsecure(),
 		ocagent.WithServiceName(fmt.Sprintf("example-go-%d", os.Getpid())))
@@ -119,5 +123,6 @@ func main() {
 		}
 		stats.Record(ctx, mLatencyMs.M(latencyMs))
 		fmt.Printf("Latency: %.3fms\n", latencyMs)
+		oce.Flush()
 	}
 }

My Agent config:

receivers:
  opencensus:
    address: ":55678"

exporters:
  opencensus:
    endpoint: "localhost:55680"

zpages:
  port: 8884

My Collector config:

log-level: DEBUG
receivers:
  opencensus:
    port: 55680

queued-exporters:
  jaeger-all-in-one:
    num-workers: 4
    queue-size: 100
    retry-on-failure: true
    sender-type: jaeger-thrift-http
    jaeger-thrift-http:
      collector-endpoint: http://localhost:14268/api/traces
      timeout: 5s

zpages:
  port: 8889

When all three (the example, the Agent, and the Collector) are run, we get

INFO[0000] pickfirstBalancer: HandleSubConnStateChange: 0xc000020290, CONNECTING 
INFO[0000] pickfirstBalancer: HandleSubConnStateChange: 0xc000020290, READY 

only once in the logs of example/main.go.

But if we don't run the Agent and instead export directly to the Collector (by changing the port in the Collector's config), the above logs appear multiple times.
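
For reference, a sketch of the Collector receiver change this implies. It assumes the example keeps exporting to the exporter's default target, localhost:55678 (the port the Agent listened on in the config above), so the Collector takes over that port. The rest of the Collector config is assumed to stay as shown earlier.

```yaml
# Assumption: example/main.go still exports to localhost:55678,
# so the Collector's OpenCensus receiver listens there instead of 55680.
receivers:
  opencensus:
    port: 55678
```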

asutoshpalai • Jun 18 '19 01:06

Thanks for reporting, @asutoshpalai. This happens because the Collector does not implement the metrics endpoint: the example periodically tries to send metric data, and that resets the connection. The Agent, on the other hand, implements the metrics endpoint, so the reset doesn't happen. What happens if you remove the metrics from the example and go straight to the Collector? Is that an option for you? That said, it is a bug anyway.

pjanotti • Jun 18 '19 01:06

Thanks @pjanotti, that's correct! When I didn't register the exporter with view, everything worked fine. That's good enough for me, but I will leave this issue open in case you want to fix it in the future.

asutoshpalai • Jun 18 '19 17:06

Thanks for confirming @asutoshpalai - yes, this is a bug that needs to be fixed. Leaving the issue open.

pjanotti • Jun 18 '19 17:06