zipkin icon indicating copy to clipboard operation
zipkin copied to clipboard

reconsider elasticsearch indexing strategy

Open codefromthecrypt opened this issue 6 years ago • 6 comments

#1685 we changed to use a special manual indexing of elasticsearch to work around dots in tag names (without having to do terrible hacks like renaming dots to underscores).

Meanwhile, numerous folks have asked about having direct indexing work, mostly but not always for Kibana. Most recently @narayaruna

I asked @xeraa about this and he had the following to say:

So the initial implementation was broken (nested object vs property with dot(s) like you said) and was removed in 2.0. 2.0 to 2.3 don’t support dots in field names. For 2.4 it must be explicitly enable [0], but this version added a new and working implementation.

If you set the minimum Elasticsearch version to 5.0, dots in field names should not be a problem. Elastic is not supporting 2.x any more, but I’m not sure what versions Zipkin is targeting.

[0] https://www.elastic.co/guide/en/elasticsearch/reference/2.4/dots-in-names.html#_enabling_support_for_dots_in_field_names [1] https://github.com/elastic/elasticsearch/issues/28948

So, it seems we might be able to re-introduce standard indexing given a version range. I don't expect we can turn off 2.x, but at least we could make supported versions easier.

Thoughts? cc @openzipkin/elasticsearch @zeagord

codefromthecrypt avatar Jun 28 '18 00:06 codefromthecrypt

PS we need to vet carefully make sure we don't rebreak things in the process. i.e. verify that if a tag "error" is sent, followed by "error.code" it doesn't crash like it did before.

Else, we get straight back into playing remapping/escaping things which hurts our ability to see what exactly was sent, resulting in both the problems we had before and also the problem of supporting multiple indexing types!

codefromthecrypt avatar Jun 28 '18 00:06 codefromthecrypt

I don't think we can turn off the 2.x either because I suspect many of them are still using it in production. It will be nice to reintroduce the standard indexing.

zeagord avatar Jun 28 '18 04:06 zeagord

If we can't turn off the 2.X, we could provide a switch via a property for example to use the old workaround ?

ImFlog avatar Jun 28 '18 13:06 ImFlog

definitely there's value exploring what seems to be the more ES way which is not exactly storing raw data (as dots need escaping etc), but still maintenance causing.

if we do decide to do the "dedotting" approach, basically we'd need to have our query side compensate/detect the data written to see if it needs to be unescaped or not. There's code in that, not terrible stuff, but not lovely. We also have a slightly different code path on write based on the version which yeah.. is more code. More code is fine if worth it or with hands.. we just need to figure out if that's the case.

TL;DR; I'd like to see a show of hands of who wants this enough to help :D no pressure.. just a question as we have to be careful entering areas of new "if statements". Would anyone be able to participate in code, advice, testing, documentation or morale support? :D

codefromthecrypt avatar Jun 28 '18 14:06 codefromthecrypt

Sorry, for commenting on an old issue. Just encountered an issue with dots in tag names but won't go into that here.

I wonder if it's worth adopting an escaping strategy? E.g. URL encoding + encode the periods (which isn't the normal way with URL encoding). I.e. replace "." with "%2E". That way you'd always be able to get back the original tag name but you could avoid any issues with names that conflict with Elasticsearch.

msmsimondean avatar Nov 13 '20 18:11 msmsimondean

  1. One common approach (for example in the Beats) is to "dedot" tags — replacing dots with underscores. Though if you have some hierarchy with dots, the underscores might cause confusion, so I would consider it more of a hack than a solution.
  2. Alternatively the flattened field type might be an interesting alternative: It doesn't create a mapping for that field and thus avoids the issue of having a mapping collision with foo and foo.bar (foo cannot be a concrete type and an object at the same time) — @msmsimondean I assume that's the problem you ran into. Otherwise flattened behaves pretty much like a keyword field, so it should be a good match for tags. The downside is that it is not available on AWS ES, since it's a free Basic feature.

xeraa avatar Nov 13 '20 19:11 xeraa