jaeger
InfluxDB as trace storage backend
Meta-issue on storage backends: #638
There is some work happening here https://github.com/openzipkin/zipkin/issues/1628
My interest at this time is what features such implementation could provide, i.e.
- what would be write throughput per node with RF=2
- could the backend support indexing of arbitrary tags / log fields, or do they need to be pre-defined
- what is the write amplification or perf impact as a function of # of tags/fields per span
  - in Cassandra backend every tag is an extra write
  - in ES it's extra indexing time on the server
- will the backend support correct server-side joins and LIMIT (broken with Cassandra today)
- how search with multiple tags would be handled
  - in Cassandra it's an AND across different spans from the same service name (weird)
  - in ES it's an AND across tags from the same span only (index document is one span)
- could the backend support latency aggregates out of the box (by service/endpoint)? This one is something I'd expect InfluxDB to be able to do easily, since it's fundamentally a TS db
cc @gianarb @goller
From personal experience, I've easily written 6 million points per minute to a single node with no issues, using the recommended 5,000 points per request. Batching is key, though: as your batches get smaller, write performance drops drastically.
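To make the batching point concrete, here is a minimal sketch of client-side batching. The helper names (`batch_points`, `requests_per_second`) are hypothetical and not part of any InfluxDB client. It only groups already-encoded points and does the arithmetic for the figures quoted above.

```python
from typing import Iterable, Iterator, List


def batch_points(points: Iterable[str], batch_size: int = 5000) -> Iterator[List[str]]:
    """Yield lists of at most batch_size encoded points (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[str] = []
    for point in points:
        batch.append(point)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Flush the final, possibly smaller, batch.
        yield batch


def requests_per_second(points_per_minute: int, batch_size: int = 5000) -> float:
    """Write requests per second needed to sustain a given point rate."""
    return points_per_minute / 60 / batch_size


if __name__ == "__main__":
    # 6 million points per minute at 5,000 points per request.
    print(requests_per_second(6_000_000))
    sizes = [len(b) for b in batch_points((f"p{i}" for i in range(12_001)))]
    print(sizes)
```

Under these assumptions, the quoted rate works out to about 100,000 points per second, or roughly 20 write requests per second at 5,000 points each.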
In terms of arbitrary tags / log fields, they do not need to be predefined, however fields cannot have a mixed type, so once you set fieldA=int64, fieldA always has to be an int64.
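The no-mixed-types rule for fields can be illustrated with a small client-side check. This is only a sketch of the constraint as described above, not how InfluxDB enforces it internally; `FieldTypeRegistry` is a hypothetical name.

```python
from typing import Dict, Union

FieldValue = Union[bool, int, float, str]


class FieldTypeRegistry:
    """Remembers the first type seen for each field name (hypothetical helper)."""

    def __init__(self) -> None:
        self._types: Dict[str, type] = {}

    def check(self, field: str, value: FieldValue) -> None:
        """Raise TypeError if value's type differs from the first one recorded."""
        seen = self._types.setdefault(field, type(value))
        if seen is not type(value):
            raise TypeError(
                f"field {field!r} was first written as {seen.__name__}, "
                f"got {type(value).__name__}"
            )


if __name__ == "__main__":
    registry = FieldTypeRegistry()
    registry.check("fieldA", 1)      # first write fixes the type to int
    registry.check("fieldA", 2)      # same type, accepted
    try:
        registry.check("fieldA", "two")
    except TypeError as err:
        print(err)
```

For span tags this matters because the same tag key (for example an HTTP status code) may be reported as a number by one client and as a string by another.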
For indexing, tags are always indexed, fields are never indexed. This means that cardinality of tags is a big issue since Influx creates an in-memory index for all tags (might be okay with their new TSI) and any query against a field looking for a specific value causes a scan of the data - this is usually okay since you're generally querying by time span, but something to keep in mind.
Aggregations can be implemented easily with the built-in aggregation functions and a GROUP BY on service and endpoint.
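A latency aggregate of that kind might look like the InfluxQL string built below. This is a sketch under assumptions: the measurement `span`, the field `duration_ns`, and the tags `service` and `endpoint` are hypothetical names, and `latency_query` is not part of any library.

```python
from typing import Sequence


def latency_query(
    measurement: str = "span",
    field: str = "duration_ns",
    window: str = "1h",
    group_tags: Sequence[str] = ("service", "endpoint"),
) -> str:
    """Build an InfluxQL query for mean latency per tag combination."""
    group_by = ", ".join(f'"{tag}"' for tag in group_tags)
    return (
        f'SELECT MEAN("{field}") FROM "{measurement}" '
        f"WHERE time > now() - {window} GROUP BY {group_by}"
    )


if __name__ == "__main__":
    print(latency_query())
```

Grouping by tags only works if service and endpoint are stored as tags rather than fields, which is the assumption made here.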
Hi @yurishkuro we'd like to contribute influxdb as a trace support backend. Currently, we are getting experience with writing spans with telegraf into InfluxDB running with the new TSI engine
@jrbury is absolutely correct on all points. The TSI engine is built to handle much higher cardinality. Here is how we define cardinality: https://docs.influxdata.com/influxdb/v1.3/concepts/glossary/#series-cardinality
I believe that the trace id will dominate the cardinality.
Regarding your other questions:
will the backend support correct server-side joins and LIMIT (broken with Cassandra today)
Influx does not have server-side joins per se, but it is able to group by any number of tags. Additionally, Influx has several meta queries using the SHOW keywords that are used to get information about tag sets. The SELECT and SHOW queries both support LIMIT.
how search with multiple tags would be handled
Multiple tags can be handled with a WHERE clause. The WHERE clause would not need to be restricted to a single service name or a single span. I believe it should "just work."
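As a rough sketch of what such a search could look like, the helper below builds an InfluxQL query that ANDs several tag conditions and applies LIMIT. The measurement and tag names are hypothetical, and `tag_filter_query` is an illustrative helper, not an existing API.

```python
from typing import Dict


def quote_ident(name: str) -> str:
    """Double-quote an InfluxQL identifier, escaping backslashes and quotes."""
    return '"' + name.replace("\\", "\\\\").replace('"', '\\"') + '"'


def quote_str(value: str) -> str:
    """Single-quote an InfluxQL string literal, escaping backslashes and quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def tag_filter_query(measurement: str, tags: Dict[str, str], limit: int = 20) -> str:
    """Build a SELECT that requires every given tag to match (AND)."""
    query = f"SELECT * FROM {quote_ident(measurement)}"
    if tags:
        conditions = " AND ".join(
            f"{quote_ident(k)} = {quote_str(v)}" for k, v in sorted(tags.items())
        )
        query += f" WHERE {conditions}"
    return f"{query} LIMIT {int(limit)}"


if __name__ == "__main__":
    print(tag_filter_query("span", {"service": "frontend", "http.status_code": "500"}))
```

Whether the AND applies within one span or across spans depends on how points map to spans in the schema; with one point per span, as assumed here, it matches tags on the same span.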
could the backend support latency aggregates out of the box (by service/endpoint)? This one is something I'd expect InfluxDB to be able to do easily, since it's fundamentally a TS db
Yes, I believe this should be in our wheelhouse for sure.
So, what do you think of us trying our hand at implementing the store?
As if I could try to stop you!
Seriously though, if you have the cycles and the desire to do this, then by all means. I recommend doing it in some other repo so that you don't have to go through our code reviews until you have a working proof of concept and run some integration and stress tests. Note that we have some integration tests that (in theory) should work across different storage backends - ./plugin/storage/integration/...
@goller just saw this https://github.com/influxdata/jaeger. Just curious - why are you going after zipkin's nomenclature ("binary annotations" etc. ) instead of OpenTracing, given that you're already operating on Jaeger's domain model? It seems like extra work. Note that Jaeger backend can both produce and consume Zipkin model if necessary.
Probably a lack of experience on my part !
@yurishkuro To better understand zipkin's model, we implemented a telegraf plugin here: https://github.com/influxdata/telegraf/tree/master/plugins/inputs/zipkin
Our goal is to support OpenTracing for sure, but, we figured we would support zipkin's data model to store into influxdb via telegraf. That way both jaeger and zipkin could read data from it.
Do you think it would be better for the collection of spans to be stored using the OpenTracing naming?
Since Chris is new to all this, he should know the jury is still out on whether there's a data model for OpenTracing. I'd be careful about pre-emptively labeling anything as such, as it might mislead people or clash with an actual spec.
https://github.com/opentracing/specification/issues/64
IOW, Jaeger definitely wrote their model around naming inside OpenTracing, but that doesn't imply there's any official or stable means to do that. If you model based on Jaeger, you are just modeling based on Jaeger.
@goller zipkin model does not support all features of OpenTracing, such as KV-logs and span references. Because of that the transformation from Jaeger to Zipkin data model can be lossy. If you're implementing Jaeger backend with InfluxDB, it seems to make more sense for that backend to use Jaeger data model and not be lossy.
@goller btw jaeger-collector can accept Zipkin spans in various formats at :9411/api/v1/spans. It converts them to Jaeger internal data model that SpanWriter/SpanReader are operating on.
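For reference, a request to that endpoint could be assembled roughly as follows. This sketch only builds the HTTP request and does not send it; the host `localhost`, the span values, and the helper name are assumptions, and the payload follows the Zipkin v1 JSON span shape.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_zipkin_v1_request(
    spans: List[Dict[str, Any]], base_url: str = "http://localhost:9411"
) -> urllib.request.Request:
    """Build (but do not send) a POST of Zipkin v1 JSON spans."""
    body = json.dumps(spans).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/api/v1/spans",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Made-up example span in Zipkin v1 JSON form; timestamps are in microseconds.
example_span: Dict[str, Any] = {
    "traceId": "463ac35c9f6413ad",
    "id": "463ac35c9f6413ad",
    "name": "get-user",
    "timestamp": 1503360000000000,
    "duration": 1500,
    "annotations": [],
    "binaryAnnotations": [
        {
            "key": "http.status_code",
            "value": "200",
            "endpoint": {"serviceName": "frontend"},
        }
    ],
}

if __name__ == "__main__":
    request = build_zipkin_v1_request([example_span])
    print(request.get_method(), request.full_url)
    # To actually send it to a running collector: urllib.request.urlopen(request)
```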
I'm going to stop commenting on this issue. Suffice to say, please do not conflate this work with Zipkin, as lossiness is a point-of-view and point-in-time thing. Yuri's perspective of things is just that. He doesn't represent Zipkin.
FYI active work on this issue: https://github.com/influxdata/jaeger/tree/influxdb
As of today, this branch works with the InfluxDB 2.0 alpha, but I won't open a PR until we've used it ourselves for a while.
The plugin framework issue: https://github.com/jaegertracing/jaeger/issues/422
FYI we have moved our active work to a new repo, which uses the gRPC framework: https://github.com/influxdata/jaeger-influxdb
@jacobmarble is the repo available? I got 404
Should be public now.
@jacobmarble Did https://github.com/influxdata/jaeger-influxdb get moved to another location? The link from the docs 404s
Looks like the repo is available :-)
@MattBoatman I'm not sure why you got a 404.
Related: that repository will be archived in the next few months, as its replacement stabilizes. A new InfluxDB storage engine is in development, which handles traces much better than the current engine. This new Jaeger plugin is designed around a schema which is friendly to both OpenTelemetry and the new storage engine: https://github.com/influxdata/influxdb-observability/tree/main/jaeger-query-plugin
@jacobmarble I emailed the influx team and they restored the repo ;) Good to know, I was just following the links from the jaeger docs
There is a newer version of this which works with IOx, the new engine: https://github.com/influxdata/influxdb-observability/tree/main/jaeger-query-plugin. The older repo is only for v1 and v2 of InfluxDB.
Last repo link, I promise: https://github.com/influxdata/influxdb-observability
More specifically: https://github.com/influxdata/influxdb-observability/tree/main/jaeger-influxdb