couchdb icon indicating copy to clipboard operation
couchdb copied to clipboard

OpenTracing being sunset for OpenTelemetry

Open tsloughter opened this issue 4 years ago • 12 comments

Summary

I've been reaching out over the past few months to all maintainers of OpenTracing libraries in Erlang or Elixir. I learned about https://github.com/apache/couchdb/tree/prototype/fdb-layer/src/ctrace today and figured I'd do the same even though it is built into couchdb.

Desired Behaviour

Since OpenTracing is being sunset and OpenTelemetry (the merger of OpenTracing and OpenCensus) is its replacement it is likely desirable to users of instrumented libraries that they eventually move to OpenTelemetry, https://opentelemetry.io/

Possible Solution

There is an official OpenTelemetry Erlang API and SDK (https://github.com/open-telemetry/opentelemetry-erlang-api/ and https://github.com/open-telemetry/opentelemetry-erlang/), along with a SIG to discus implementation. Before today the SIG was only meeting once a month as part of the Erlang Ecosystem Foundation's Observability Working Group, but it will be being updated to meet weekly.

It is certainly not urgent or a must even if OpenTracing the spec is being deprecated, your tracing lib and reporting to Jaeger will continue to work for the foreseeable future -- and I see you have special couchdb specific functionality builtin, so not like you could simply drop in a new lib anyway. I just wanted to reach out like I have to all the others in hopes of collaboration on the future standard of open distributed tracing, metrics and logging in Erlang/Elixir.

tsloughter avatar Jun 09 '20 14:06 tsloughter

@iilyak what do you think?

wohali avatar Jun 09 '20 14:06 wohali

Few points which would probably make https://github.com/open-telemetry/opentelemetry-erlang-api unsuitable for us:

  • it looks like it uses gen_server based backend for storing spans. This would be prohibitively slow for our case.
  • it uses persistent_term feature which was introduced in OTP 21.2.
  • Requires OTP 21.3 or above.
  • the https://github.com/opentelemetry-beam/opentelemetry_exporter uses chatterbox and we know it is buggy https://github.com/joedevivo/chatterbox/issues/136 (we had to remove GRPC support for that reason from couch_eval and ateles)
    • there might be a way to disable GRPC by using http_protobuf configuration option
  • the https://github.com/opentelemetry-beam/opentelemetry_exporter making assumption about which http client to use (it is using httpc) and we might want to plug ibrowse. I am not sure if it is currently possible.

iilyak avatar Jun 09 '20 16:06 iilyak

It looks like the author of the library we currently use abandoning it. It is really sad because it is wonderfully written with performance in mind.

iilyak avatar Jun 09 '20 16:06 iilyak

No, spans are not stored in a gen_servers state, they are stored in an ETS table while the span context is propagated within a process through the pdict.

Yes, persistent term is currently used without the ability to disable it. I don't know what couch's OTP back support is but there is always the option of modifying otel to use ETS when persistent_term isn't available, or just not moving to otel until the oldest OTP supported by couch is 21.3.

The opentelemetry_exporter lib does not require use of GRPC, though I also don't think grpcbox is an issue in this use case because of the lack of concurrency in the exporter. But it supports using plain http 1.1 as well. httpc is the only option at the moment simply because that was the quickest one to add support for. There is no plan to require httpc or any particular client, it is simply a matter of resources and priorities.

I'm sad to hear you dropped grpc and use of grpcbox because of this stream creation bug. I'm glad you linked to that issue on chatterbox (I maintain chatterbox as well) and will take a look at the issue soon, it must have slipped by me in email notifications and I hadn't manually looked at chatterbox's issues in a long time.

I wish I'd known couchdb was trying to use grpcbox/chatterbox. I would have made an effort to take care of these issues with you.

I believe grpcbox is mostly used for the server component by those using it in production. The client has been part that I've wanted to rewrite, and have started the rewrite recently. Though this does look like a chatterbox internal bug. Again an issue of resources and priorities.

tsloughter avatar Jun 09 '20 17:06 tsloughter

@tsloughter We currently support 20.3 as our base, with "soft" support back to 19.latest. We're not planning on making 21.3 a minimum version for a while yet, so that's probably a hard stop for us for at least a year.

wohali avatar Jun 09 '20 17:06 wohali

We can have our own persistent_term like module and patch 3 lines in opentelemetry. However there might be other incompatibilities to uncover.

iilyak avatar Aug 25 '20 20:08 iilyak

The persistent_term is quite easy (about 150 LOC) to re-implement using merl.

iilyak avatar Aug 25 '20 20:08 iilyak

The nice thing about OpenTelemetry is that it has some integration with seqtrace. Although I hope it is optional since we use sensitive flags on processes which handle http request. Which makes erlang tracing impossible. Currently it seems it relies on old erlang tracing API. Possibly due to impossibility of correlating call and replies in new tracing API as mentioned here https://github.com/census-instrumentation/opencensus-erlang/issues/72#issuecomment-381511413. Just for the reference the rabbitmq/looking_glass NIF seem to have the same problem.

iilyak avatar Aug 25 '20 20:08 iilyak

If we would find a way to use erlang tracing safely (without leaking sensitive data) we would be able to debug and trace system on demand even if we don't instrument the codebase using with_span.

iilyak avatar Aug 25 '20 20:08 iilyak

OpenTelemetry doesn't integrate with seqtrace. Originally I did include an implementation of ctx with seqtrace but removed it because I think how it functions is too confusing.

tsloughter avatar Aug 25 '20 22:08 tsloughter

Just saw @dch mention a new CouchDB release on irc and it reminded me to look back at this thread. I thought I'd just mention some development on the OpenTelemetry side, first being that the tracing spec 1.0 has been released, https://medium.com/opentelemetry/opentelemetry-specification-v1-0-0-tracing-edition-72dd08936978

The Erlang implementation has been stuck at an RC release for months but that will be changing soon with a 1.0 release in the coming weeks.

It has also been over a year since this was opened so maybe support for pre-21 Erlang is considered being dropped now?

Also all development is now done under the repo https://github.com/open-telemetry/opentelemetry-erlang/ (plus https://github.com/open-telemetry/opentelemetry-erlang-contrib where instrumentation for other libraries are kept) and the old opentelemetry-erlang-api repo is archived.

tsloughter avatar Sep 30 '21 16:09 tsloughter

@tsloughter that's great to hear about OpenTelemetry, thanks for the update!

The new CouchDB 3.2 release still support Erlang and is already in the final release candidate phase. However, by the next major release it's highly likely we'd bump the minimum Erlang version to be >= 21, so OpenTelemetry would become easier to integrate.

nickva avatar Sep 30 '21 19:09 nickva

Ran into CouchDB metrics somewhere today and it made me think of this issue, so thought I'd pop an update here. I see in dev the required OTP version is now OTP 23+. Also, 1.0 of Erlang/Elixir Opentelemetry was released back in February https://medium.com/opentelemetry/opentelemetry-erlang-elixir-javascript-and-ruby-v1-0-3a0c32e0add4

tsloughter avatar Nov 10 '22 20:11 tsloughter

@tsloughter thanks for stopping by and for the reminder! Yeah, with Erlang 23 as the minimum version it should make it easier to integrate.

nickva avatar Nov 10 '22 22:11 nickva