Support hourly Elasticsearch indexing

Open libeilin opened this issue 5 years ago • 27 comments

elasticSearch

libeilin avatar Jan 29 '19 11:01 libeilin

This will not work out of the box, as some other logic would need to change. We can leave this open to see whether it is popular.

codefromthecrypt avatar Jan 29 '19 12:01 codefromthecrypt

OK, thanks for your reply. We ran into some problems when compiling it ourselves, so we are looking for your help here.

If this feature is implemented, I hope you can release it as soon as possible. At present, one day's data volume is too large, and ES query speed cannot keep up with it.

libeilin avatar Jan 29 '19 12:01 libeilin

@openzipkin/elasticsearch any interest on this?

codefromthecrypt avatar Jan 29 '19 12:01 codefromthecrypt

  1. I assume this can't be easily fixed with alias trickery, right? 2019-01-01 pointing to 2019-01-01-00 and you switch that every hour. As long as the alias is pointing to a single index you can write to it. Pointing to multiple indices makes it read-only.
  2. Probably the better approach long-term is a rollover index where you can specify a certain age or number of docs or size. I'd generally go for size so you have a very even distribution of data per shard (otherwise weekends might be oversharded and a peak during the week undersharded). Also note that we will very soon have Index Lifecycle Management (ILM) built into Elasticsearch and Kibana, which will make the management of rollover indices and deleting old data much simpler. Though it's under the (free) Basic license and not Apache2 — not sure if that is acceptable to be used in Zipkin then.
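
Both options above map to plain Elasticsearch REST calls. The sketch below only builds the JSON bodies for `POST /_aliases` (the hourly alias switch) and `POST /<alias>/_rollover` (size, age, or doc-count based rollover); it does not send them. The index and alias names are hypothetical and not something Zipkin creates today.

```python
import json


def alias_swap_actions(alias, old_index, new_index):
    """Body for POST /_aliases that repoints `alias` in one atomic call.

    While the alias points at a single index it stays writable.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


def rollover_body(max_size=None, max_age=None, max_docs=None):
    """Body for POST /<alias>/_rollover; only the given conditions are sent."""
    conditions = {}
    if max_size is not None:
        conditions["max_size"] = max_size
    if max_age is not None:
        conditions["max_age"] = max_age
    if max_docs is not None:
        conditions["max_docs"] = max_docs
    return {"conditions": conditions}


if __name__ == "__main__":
    # Hypothetical names: a daily alias pointing at the current hourly index.
    swap = alias_swap_actions(
        "zipkin-span-2019-01-01",
        "zipkin-span-2019-01-01-00",
        "zipkin-span-2019-01-01-01",
    )
    print(json.dumps(swap, indent=2))
    print(json.dumps(rollover_body(max_size="50gb"), indent=2))
```

Rolling over on size rather than age is what keeps the data per shard even, as point 2 argues.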

xeraa avatar Jan 29 '19 12:01 xeraa

@xeraa do you know which version rollover index was added? I agree the core issue here is size.

codefromthecrypt avatar Feb 18 '19 01:02 codefromthecrypt

@adriancole 6.6 (the current version): https://www.elastic.co/guide/en/elasticsearch/reference/6.6/index-lifecycle-management.html

You can fully manage it through the Elasticsearch API, but Kibana also provides a UI for it. And as I said: not open source, but free to use (Basic license).

xeraa avatar Feb 18 '19 19:02 xeraa

@libeilin before we experiment with a non-OSS feature, can you comment on whether rollover indexing is desirable? Maintaining features has a cost, especially with non-OSS distributions (as it affects how we do testing), so we want to make sure there is user buy-in.

It is also possible for us to explore hourly indexes regardless

codefromthecrypt avatar Feb 27 '19 01:02 codefromthecrypt

email related to this thread on our dev list https://lists.apache.org/thread.html/73c2efa69e3ff0a519c6b6c2f5e551159c34902c29df01b2703e9126@%3Cdev.zipkin.apache.org%3E

codefromthecrypt avatar Apr 22 '19 00:04 codefromthecrypt

There's always Elastic Curator if you want to use Rollover, but are using OSS Elasticsearch (no Basic license). It's OSS, and requires no license.

untergeek avatar Apr 22 '19 01:04 untergeek

@untergeek thanks for the pointer. I think you are pointing to this specifically right? https://www.elastic.co/guide/en/elasticsearch/client/curator/5.6/ex_rollover.html

To elaborate on this approach, we'd need more details about what it takes in practice: curator config vs index template config, any extra processes curator needs to run, and what, if anything, the aliasing implies when we do reads or writes. I wonder if someone has this set up with a zipkin site already (or anything that uses daily indexes and rollover with no client call changes needed).

codefromthecrypt avatar Apr 22 '19 03:04 codefromthecrypt

We recently started using zipkin for opentracing. Our company also has a requirement for monthly or weekly zipkin indexes. It would be great if you added this support.

singhabhinav03 avatar May 02 '19 21:05 singhabhinav03

Just as an idea: Maybe this is going a bit too deep down the rabbit hole for one datastore and it would make more sense to leave that part to Curator or ILM (by documenting the right configurations to be used)? There are various use cases about time based index patterns, rollover, deletion of data,... that are kind of solved externally already.

xeraa avatar May 02 '19 21:05 xeraa

> Just as an idea: Maybe this is going a bit too deep down the rabbit hole for one datastore and it would make more sense to leave that part to Curator or ILM (by documenting the right configurations to be used)? There are various use cases about time based index patterns, rollover, deletion of data,... that are kind of solved externally already.

Yes, curator is how people handle this today, and many can't store months of trace data either :P We currently mention to use curator for index management, but possibly someone can come up with an example https://github.com/apache/incubator-zipkin/blob/8e4ada890c1b4f0f21babaf1a2315af128aeb4f4/zipkin-storage/elasticsearch/README.md#indexes

codefromthecrypt avatar May 03 '19 00:05 codefromthecrypt

> Our company also has a requirement for monthly or weekly zipkin indexes. It would be great if you added this support.

@singhabhinav03 could you elaborate on what you're trying to achieve that you cannot currently? The original request is to be able to have finer-grain indexes than daily because the data volume in one day is too large. Weekly or monthly indexes are only likely usable with relatively small amounts of tracing data.

shakuzen avatar May 08 '19 14:05 shakuzen

I think this issue got stuck as we were worried about how to address varied granularity. @narayaruna opened #2767 which doesn't imply varied granularity.

If we limit this to hourly indexes, anyone can still use curator or similar to rescale these to daily, weekly, monthly.. correct? cc @openzipkin/elasticsearch

codefromthecrypt avatar Aug 21 '19 05:08 codefromthecrypt

> If we limit this to hourly indexes, anyone can still use curator or similar to rescale these to daily, weekly, monthly.. correct?

Not sure I'm reading this correctly, but combining hourly indices into a daily one (merging 24 indices) isn't easily possible — that would require a reindex (where you use a script to change the _index field).
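
As a rough illustration of what such a reindex involves, the sketch below builds a `POST /_reindex` body that copies one day's hourly indices into a single daily index. It assumes a hypothetical naming scheme (`zipkin-span-YYYY-MM-DD-HH` for hourly, `zipkin-span-YYYY-MM-DD` for daily). For a single day, a fixed `dest.index` is enough; a script that rewrites `ctx._index` would only be needed to handle several days in one request.

```python
def merge_hourly_into_daily(day, prefix="zipkin-span"):
    """Body for POST /_reindex that copies all hourly indices of `day`
    into one daily index. Index names are hypothetical."""
    return {
        # Wildcard matches <prefix>-<day>-00 .. <prefix>-<day>-23,
        # but not the daily destination, which has no hour suffix.
        "source": {"index": f"{prefix}-{day}-*"},
        "dest": {"index": f"{prefix}-{day}"},
    }


if __name__ == "__main__":
    print(merge_hourly_into_daily("2019-08-21"))
```

The copy rewrites every document, so it is far from free at the data volumes discussed in this thread.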

My concern with hourly indices is that this will be a lot of shards. Just using 1 primary and 1 replica you'll end up with 48 shards for a single day. Our recommendation is to have less than 20 shards per GB of heap and each shard should be around 10 to 50GB in size. I can see how this works out for some heavy users, but it will be a bad choice for many others.
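
The shard arithmetic can be checked with a few lines. This is a back-of-the-envelope sketch: the 7-day retention is an assumed figure for illustration, and the heap number is a rough cluster-wide total derived from the "fewer than 20 shards per GB of heap" guideline, not a sizing recommendation.

```python
import math


def shards_per_day(indices_per_day, primaries=1, replicas=1):
    """Total shards created per day: each index has `primaries` primary
    shards, and each primary has `replicas` copies."""
    return indices_per_day * primaries * (1 + replicas)


def rough_heap_gb(total_shards, shards_per_gb=20):
    """Approximate total heap (GB, summed across data nodes) needed to
    stay at or under `shards_per_gb` shards per GB of heap."""
    return math.ceil(total_shards / shards_per_gb)


if __name__ == "__main__":
    retention_days = 7  # assumed retention, for illustration only
    hourly = shards_per_day(24) * retention_days
    daily = shards_per_day(1) * retention_days
    print("hourly:", hourly, "shards,", rough_heap_gb(hourly), "GB heap")
    print("daily:", daily, "shards,", rough_heap_gb(daily), "GB heap")
```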

IMO a combination of rollover and write index alias would be the more generic solution that gives users fewer chances for bad configurations.

Do you have like a sample app where I could add the right config to show how this works? Might be easier than discussing it.

xeraa avatar Aug 21 '19 14:08 xeraa

@xeraa so I think the concern from @narayaruna is that with TB scale indexes, search, even with our cherry-picked indexing, requires bumping read timeouts to 60s.. so it's more about the query side than the write side iiuc.

codefromthecrypt avatar Aug 21 '19 14:08 codefromthecrypt

so the thinking is.. I wonder.. for data sets that naturally fit the heap-per-shard guidance at hourly or less, putting that data in hourly indexes should make more sense than daily. The query side could be better optimized with this: instead of requesting a daily index for a search, it could request an hourly one, without any special features...

am I missing something? (ps thanks for mentioning where hourly does not make sense! possibly we can do a discover check to warn if config doesn't make sense)

codefromthecrypt avatar Aug 21 '19 15:08 codefromthecrypt

Yes, if you are looking at a short timeframe (like 1h). I'm not sure what the common access pattern is to be honest.

On the other hand if you have a filter on the timeframe and access it frequently enough then that will be cached and should also be pretty fast as well. I couldn't say how much win to expect (depends on so many factors including the access pattern — timeframe and frequency).

xeraa avatar Aug 21 '19 15:08 xeraa

Literally, the default lookback is 1 hour, and currently a query will grab a full daily index, or possibly 2 if it is just past midnight. This is probably why Nara mentions this, as hourly indexes lower the blast radius of the default query to at most 2 hours of data (when just past the hour).
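
To make the difference concrete, here is a small sketch that lists which indices a lookback window would touch under daily versus hourly naming. The `zipkin-span-...` names and the hourly suffix are assumptions for illustration, not Zipkin's actual index naming code.

```python
from datetime import datetime, timedelta, timezone


def indices_for_window(end, lookback, hourly, prefix="zipkin-span"):
    """Names of the (hypothetical) indices overlapping [end - lookback, end]."""
    start = end - lookback
    if hourly:
        step, fmt = timedelta(hours=1), "%Y-%m-%d-%H"
        cursor = start.replace(minute=0, second=0, microsecond=0)
    else:
        step, fmt = timedelta(days=1), "%Y-%m-%d"
        cursor = start.replace(hour=0, minute=0, second=0, microsecond=0)
    names = []
    while cursor <= end:
        names.append(f"{prefix}-{cursor.strftime(fmt)}")
        cursor += step
    return names


if __name__ == "__main__":
    # Ten minutes past midnight, with the default 1 hour lookback.
    end = datetime(2019, 8, 22, 0, 10, tzinfo=timezone.utc)
    print(indices_for_window(end, timedelta(hours=1), hourly=False))
    print(indices_for_window(end, timedelta(hours=1), hourly=True))
```

Just past midnight, both schemes touch two indices, but the daily ones hold up to 48 hours of data while the hourly ones hold at most 2.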

[Screenshot, 2019-08-22 8:24:28 AM]

codefromthecrypt avatar Aug 22 '19 00:08 codefromthecrypt

at any rate we could put a branch up and see how it goes. If it isn't helpful we wouldn't do it, but for some sites this could be an easy-to-reason-about, low-tech option to speed up some things.

Ack on the reindexing thing if someone needs to re-scale data. We can put more notes in the readme with knowledge gained here regardless of if the change is implemented.

codefromthecrypt avatar Aug 22 '19 00:08 codefromthecrypt

PS I opened this because I think I was the one who came up with the hour search default :) https://github.com/openzipkin/zipkin/issues/2772

codefromthecrypt avatar Aug 22 '19 00:08 codefromthecrypt

Sounds good on trying it out on a branch.

On the re-scaling: Rather than reindexing indices together, you could have an index template with 3 primary shards (just as an example for spreading the ingestion over 3 nodes), but once the index is readonly you could shrink it down to a single primary shard. That should be the better pattern for more parallelization at first and then reducing the number of shards later on. And this is just a question of index template and then Elastic Curator / ILM / ... — would probably just need a little documentation on the Zipkin side.
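
A sketch of that pattern as request bodies: a legacy index template (`PUT /_template/<name>`) that starts new indices with 3 primaries, then the two calls needed to shrink a finished index down to 1. The template name, index pattern, and node name are hypothetical, and nothing here is sent to a cluster. Elasticsearch requires the target shard count to be a factor of the source count, and the source index to be write-blocked with a copy of every shard on one node before shrinking.

```python
def ingest_template(pattern="zipkin-span-*", primaries=3, replicas=1):
    """Body for PUT /_template/<name>: new matching indices start with
    `primaries` shards to spread ingestion over several nodes."""
    return {
        "index_patterns": [pattern],
        "settings": {
            "index.number_of_shards": primaries,
            "index.number_of_replicas": replicas,
        },
    }


def shrink_plan(index, target, source_shards, target_shards=1, node="node-1"):
    """Ordered (method, path, body) calls to shrink a read-only index."""
    if source_shards % target_shards != 0:
        raise ValueError("target shard count must be a factor of the source shard count")
    prepare = {
        "settings": {
            # Move a copy of every shard onto one node and block writes.
            "index.routing.allocation.require._name": node,
            "index.blocks.write": True,
        }
    }
    shrink = {"settings": {"index.number_of_shards": target_shards}}
    return [
        ("PUT", f"/{index}/_settings", prepare),
        ("POST", f"/{index}/_shrink/{target}", shrink),
    ]


if __name__ == "__main__":
    print(ingest_template())
    for call in shrink_plan("zipkin-span-2019-08-21", "zipkin-span-2019-08-21-shrunk", 3):
        print(call)
```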

xeraa avatar Aug 22 '19 00:08 xeraa

We too are facing a similar issue. Our daily indices are growing into trillions of spans per index, resulting in slow queries. @libeilin @codefromthecrypt Were you able to figure out any workaround for this? We are stuck here with our ES queries timing out.

nitishgoyal13 avatar Mar 26 '21 06:03 nitishgoyal13

Was anyone able to find a workaround for this? Are there any MRs that support hourly indices? Any help on the above would be really appreciated.

nitishgoyal13 avatar May 02 '22 10:05 nitishgoyal13

If Zipkin can write to an alias (without any date math) then you could set that up with ILM (https://www.elastic.co/guide/en/elasticsearch/reference/current/overview-index-lifecycle-management.html) in the background. That this was part of Zipkin is probably more for historic reasons, from when Elasticsearch lacked any such features, but things have luckily changed by now.
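
A minimal sketch of what that ILM setup could look like, expressed as the request bodies only. It assumes Zipkin writes to a plain alias named `zipkin-span`, which the thread does not confirm; the policy name, thresholds, and index names are likewise illustrative.

```python
def ilm_policy(max_size="50gb", max_age="1d", delete_after="7d"):
    """Body for PUT /_ilm/policy/<name>: roll over on size or age,
    then delete indices once they are `delete_after` past rollover."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {"max_size": max_size, "max_age": max_age}
                    }
                },
                "delete": {"min_age": delete_after, "actions": {"delete": {}}},
            }
        }
    }


def ilm_template(policy_name, alias, pattern):
    """Body for PUT /_template/<name>: attach the policy to new indices."""
    return {
        "index_patterns": [pattern],
        "settings": {
            "index.lifecycle.name": policy_name,
            "index.lifecycle.rollover_alias": alias,
        },
    }


def bootstrap_index(alias):
    """Body for PUT /<alias>-000001: the first index behind the write alias."""
    return {"aliases": {alias: {"is_write_index": True}}}


if __name__ == "__main__":
    alias = "zipkin-span"  # assumed write alias, not confirmed for Zipkin
    print(ilm_policy())
    print(ilm_template("zipkin-span-policy", alias, f"{alias}-*"))
    print(bootstrap_index(alias))
```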

xeraa avatar May 03 '22 00:05 xeraa

+1

Delirante avatar May 09 '22 08:05 Delirante