opencensus-service
Flaky connection while directly exporting to collector from ocagent exporter
When using the Agent exporter with OpenCensus to directly export to the Collector, instead of exporting to Agent and then to Collector, the connection keeps resetting.
Is this the intended behavior? If so, is there a way or config option to maintain a stable connection?
According to the blog post and design doc, both the Agent and the Collector are optional, so we should be able to export directly to the Collector.
The bug reproduction
I modified example/main.go to enable debug logs from gRPC as follows:
diff --git a/example/main.go b/example/main.go
index 5fa9f5f..e7932df 100644
--- a/example/main.go
+++ b/example/main.go
@@ -24,13 +24,17 @@ import (
 	"time"
 
 	"contrib.go.opencensus.io/exporter/ocagent"
+	"github.com/sirupsen/logrus"
 	"go.opencensus.io/stats"
 	"go.opencensus.io/stats/view"
 	"go.opencensus.io/tag"
 	"go.opencensus.io/trace"
+	"google.golang.org/grpc/grpclog"
 )
 
 func main() {
+	logrus.SetLevel(logrus.DebugLevel)
+	grpclog.SetLogger(logrus.New())
 	oce, err := ocagent.NewExporter(
 		ocagent.WithInsecure(),
 		ocagent.WithServiceName(fmt.Sprintf("example-go-%d", os.Getpid())))
@@ -119,5 +123,6 @@ func main() {
 		}
 		stats.Record(ctx, mLatencyMs.M(latencyMs))
 		fmt.Printf("Latency: %.3fms\n", latencyMs)
+		oce.Flush()
 	}
 }
My Agent config:
receivers:
  opencensus:
    address: ":55678"

exporters:
  opencensus:
    endpoint: "localhost:55680"

zpages:
  port: 8884
My Collector config:
log-level: DEBUG

receivers:
  opencensus:
    port: 55680

queued-exporters:
  jaeger-all-in-one:
    num-workers: 4
    queue-size: 100
    retry-on-failure: true
    sender-type: jaeger-thrift-http
    jaeger-thrift-http:
      collector-endpoint: http://localhost:14268/api/traces
      timeout: 5s

zpages:
  port: 8889
When all three are run, we get
INFO[0000] pickfirstBalancer: HandleSubConnStateChange: 0xc000020290, CONNECTING
INFO[0000] pickfirstBalancer: HandleSubConnStateChange: 0xc000020290, READY
only once in the logs of example/main.go.
But if we don't run the Agent and export directly to the Collector (by changing the port in the Collector's config), we get the above logs multiple times.
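For reference, the port change for the direct-export run would look roughly like the fragment below. This is a sketch, not the exact file used: it assumes the example keeps dialing the exporter's default address (localhost:55678, the port the Agent listens on in the config above), so the Collector's receiver is moved from 55680 to that port. The rest of the Collector config stays as shown earlier.

```yaml
# Collector config fragment for the direct-export run (sketch).
# Assumption: example/main.go still exports to localhost:55678.
log-level: DEBUG

receivers:
  opencensus:
    port: 55678  # was 55680; the Collector now listens where the Agent did
```

With this change the Agent must not be running, since both would try to bind the same port.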
Thanks for reporting @asutoshpalai. This is due to the Collector not implementing the metrics endpoint: the example periodically tries to send metric data, and that resets the connection. The Agent, on the other hand, implements the metrics endpoint, so the reset doesn't happen. What happens if you remove the metrics from the example and go straight to the Collector? Is that an option for you? That said, it is a bug anyway...
Thanks @pjanotti, that's correct! When I didn't register the exporter with view, everything worked fine. That's good enough for me, but I will leave this issue open in case you want to fix it in the future.
Thanks for confirming @asutoshpalai - yes, this is a bug that needs to be fixed. Leaving the issue open.