couchdb
couchdb copied to clipboard
OpenTracing being sunset for OpenTelemetry
Summary
I've been reaching out over the past few months to all maintainers of OpenTracing libraries in Erlang or Elixir. I learned about https://github.com/apache/couchdb/tree/prototype/fdb-layer/src/ctrace today and figured I'd do the same even though it is built into couchdb.
Desired Behaviour
Since OpenTracing is being sunset and OpenTelemetry (the merger of OpenTracing and OpenCensus) is its replacement it is likely desirable to users of instrumented libraries that they eventually move to OpenTelemetry, https://opentelemetry.io/
Possible Solution
There is an official OpenTelemetry Erlang API and SDK (https://github.com/open-telemetry/opentelemetry-erlang-api/ and https://github.com/open-telemetry/opentelemetry-erlang/), along with a SIG to discus implementation. Before today the SIG was only meeting once a month as part of the Erlang Ecosystem Foundation's Observability Working Group, but it will be being updated to meet weekly.
It is certainly not urgent or a must even if OpenTracing the spec is being deprecated, your tracing lib and reporting to Jaeger will continue to work for the foreseeable future -- and I see you have special couchdb specific functionality builtin, so not like you could simply drop in a new lib anyway. I just wanted to reach out like I have to all the others in hopes of collaboration on the future standard of open distributed tracing, metrics and logging in Erlang/Elixir.
@iilyak what do you think?
Few points which would probably make https://github.com/open-telemetry/opentelemetry-erlang-api unsuitable for us:
- it looks like it uses
gen_server
based backend for storing spans. This would be prohibitively slow for our case. - it uses
persistent_term
feature which was introduced in OTP 21.2. - Requires OTP 21.3 or above.
- the https://github.com/opentelemetry-beam/opentelemetry_exporter uses chatterbox and we know it is buggy https://github.com/joedevivo/chatterbox/issues/136 (we had to remove GRPC support for that reason from
couch_eval
andateles
)- there might be a way to disable GRPC by using
http_protobuf
configuration option
- there might be a way to disable GRPC by using
- the https://github.com/opentelemetry-beam/opentelemetry_exporter making assumption about which http client to use (it is using httpc) and we might want to plug ibrowse. I am not sure if it is currently possible.
It looks like the author of the library we currently use abandoning it. It is really sad because it is wonderfully written with performance in mind.
No, spans are not stored in a gen_server
s state, they are stored in an ETS table while the span context is propagated within a process through the pdict.
Yes, persistent term is currently used without the ability to disable it. I don't know what couch's OTP back support is but there is always the option of modifying otel to use ETS when persistent_term isn't available, or just not moving to otel until the oldest OTP supported by couch is 21.3.
The opentelemetry_exporter
lib does not require use of GRPC, though I also don't think grpcbox is an issue in this use case because of the lack of concurrency in the exporter. But it supports using plain http 1.1 as well. httpc
is the only option at the moment simply because that was the quickest one to add support for. There is no plan to require httpc or any particular client, it is simply a matter of resources and priorities.
I'm sad to hear you dropped grpc and use of grpcbox because of this stream creation bug. I'm glad you linked to that issue on chatterbox (I maintain chatterbox as well) and will take a look at the issue soon, it must have slipped by me in email notifications and I hadn't manually looked at chatterbox's issues in a long time.
I wish I'd known couchdb was trying to use grpcbox/chatterbox. I would have made an effort to take care of these issues with you.
I believe grpcbox is mostly used for the server component by those using it in production. The client has been part that I've wanted to rewrite, and have started the rewrite recently. Though this does look like a chatterbox internal bug. Again an issue of resources and priorities.
@tsloughter We currently support 20.3 as our base, with "soft" support back to 19.latest. We're not planning on making 21.3 a minimum version for a while yet, so that's probably a hard stop for us for at least a year.
We can have our own persistent_term
like module and patch 3 lines in opentelemetry. However there might be other incompatibilities to uncover.
The persistent_term
is quite easy (about 150 LOC) to re-implement using merl
.
The nice thing about OpenTelemetry is that it has some integration with seqtrace
. Although I hope it is optional since we use sensitive flags on processes which handle http request. Which makes erlang tracing impossible. Currently it seems it relies on old erlang tracing API. Possibly due to impossibility of correlating call and replies in new tracing API as mentioned here https://github.com/census-instrumentation/opencensus-erlang/issues/72#issuecomment-381511413. Just for the reference the rabbitmq/looking_glass NIF seem to have the same problem.
If we would find a way to use erlang tracing safely (without leaking sensitive data) we would be able to debug and trace system on demand even if we don't instrument the codebase using with_span
.
OpenTelemetry doesn't integrate with seqtrace. Originally I did include an implementation of ctx with seqtrace but removed it because I think how it functions is too confusing.
Just saw @dch mention a new CouchDB release on irc and it reminded me to look back at this thread. I thought I'd just mention some development on the OpenTelemetry side, first being that the tracing spec 1.0 has been released, https://medium.com/opentelemetry/opentelemetry-specification-v1-0-0-tracing-edition-72dd08936978
The Erlang implementation has been stuck at an RC release for months but that will be changing soon with a 1.0 release in the coming weeks.
It has also been over a year since this was opened so maybe support for pre-21 Erlang is considered being dropped now?
Also all development is now done under the repo https://github.com/open-telemetry/opentelemetry-erlang/ (plus https://github.com/open-telemetry/opentelemetry-erlang-contrib where instrumentation for other libraries are kept) and the old opentelemetry-erlang-api
repo is archived.
@tsloughter that's great to hear about OpenTelemetry, thanks for the update!
The new CouchDB 3.2 release still support Erlang and is already in the final release candidate phase. However, by the next major release it's highly likely we'd bump the minimum Erlang version to be >= 21, so OpenTelemetry would become easier to integrate.
Ran into CouchDB metrics somewhere today and it made me think of this issue, so thought I'd pop an update here. I see in dev the required OTP version is now OTP 23+. Also, 1.0 of Erlang/Elixir Opentelemetry was released back in February https://medium.com/opentelemetry/opentelemetry-erlang-elixir-javascript-and-ruby-v1-0-3a0c32e0add4
@tsloughter thanks for stopping by and for the reminder! Yeah, with Erlang 23 as the minimum version it should make it easier to integrate.