Adjust provisioning for removal of multiple data paths

Open · dliappis opened this issue 4 years ago · 1 comment

What/Why

As multiple data paths have been removed, https://github.com/elastic/elasticsearch/pull/72282 changes the path.data setting to only accept a String (as opposed to a List or a String).

In Rally and its ecosystem we've been relying on a list even for a single data path. This means that currently (also factoring in how we use {{ data_paths }} in https://github.com/elastic/rally-teams/blob/cd4b8a134a41062f37704fa656908de8eff392f3/cars/v1/vanilla/templates/config/elasticsearch.yml#L33) we render path.data as a list, e.g.:

path.data: ['/path/to/elasticsearch/246ea52c-13d0-4120-bf04-678a058aac29/rally-node-0/install/elasticsearch-8.0.0-SNAPSHOT/data']

With Elasticsearch commits after https://github.com/elastic/elasticsearch/commit/d933ecd26cc443b1160e3fa7cab0fc4382893585 (i.e. 8.0.0 onwards), this ultimately results in a directory literally named "[" being used as the data directory.
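To illustrate why the rendered value looks like a list: Jinja2 converts a variable to text with Python's str(), and str() of a list yields its bracketed form. The minimal sketch below assumes the template line is of the form path.data: {{ data_paths }} and uses f-string formatting as a stand-in for the template engine; the two render helpers are hypothetical and only contrast the current rendering with the "first item only" patch.

```python
# Hypothetical stand-ins for the {{ data_paths }} substitution in elasticsearch.yml.
# Jinja2 renders a variable via str(), which f-string formatting mimics here.


def render_path_data(data_paths):
    """Render path.data the way the current template does (the whole list)."""
    return f"path.data: {data_paths}"


def render_path_data_first_item(data_paths):
    """Render only the first item, as the short-term patch proposes."""
    return f"path.data: {data_paths[0]}"


paths = ["/path/to/elasticsearch/rally-node-0/install/data"]
print(render_path_data(paths))
# path.data: ['/path/to/elasticsearch/rally-node-0/install/data']
print(render_path_data_first_item(paths))
# path.data: /path/to/elasticsearch/rally-node-0/install/data
```

The first form is a YAML flow sequence, which a String-only setting no longer accepts as intended; the second is a plain scalar.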

This also causes index_size to show 0 in the nightly charts, e.g. for nightly-basic-geopoint-add-defaults-io in https://elasticsearch-benchmarks.elastic.co/#tracks/geopoint/nightly/default/30d.

Plan

  • [x] Short-term patch to use only the first item in the list in https://github.com/elastic/rally-teams/blob/cd4b8a134a41062f37704fa656908de8eff392f3/cars/v1/vanilla/templates/config/elasticsearch.yml#L33; this will fix the index_size value in the nightly charts immediately. Opened https://github.com/elastic/rally-teams/pull/65.

  • [ ] Make the necessary changes in Rally to pass only a string for data_paths when the Elasticsearch version is >= 8.0.0, and revert the patch above.

    Implementation details:

    Most of the handling of data_paths is done in _data_paths. However, we don't yet have access to the Elasticsearch version within this method. We could consider passing distribution_version to the ElasticsearchInstaller, and then fail fast if the user-provided data_paths (as a car variable) contain more than one item, or pass a string if there's only one item. Note that distribution_version won't be set when using the from-sources pipeline; however, with that pipeline we only strive for compatibility with the latest Elasticsearch master, so we can safely pass only a string there.
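The decision logic described in the implementation details could look roughly like the sketch below. It is not Rally's code: resolve_data_path and InvalidDataPathsError are hypothetical names, and it is not wired into _data_paths or ElasticsearchInstaller, whose signatures aren't shown here. It assumes distribution_version is a string such as "8.0.0-SNAPSHOT" (or None for the from-sources pipeline) and treats any major version >= 8, including snapshots, as string-only.

```python
import re
from typing import List, Optional, Union


class InvalidDataPathsError(ValueError):
    """Hypothetical error for several data paths on a string-only Elasticsearch version."""


def _major_version(version: str) -> int:
    # Accepts e.g. "7.12.1" or "8.0.0-SNAPSHOT"; only the major part matters here.
    match = re.match(r"^(\d+)\.", version)
    if match is None:
        raise ValueError(f"Cannot parse Elasticsearch version [{version}]")
    return int(match.group(1))


def resolve_data_path(
    data_paths: List[str], distribution_version: Optional[str]
) -> Union[str, List[str]]:
    """Return the value to render for path.data.

    distribution_version is None for the from-sources pipeline, which only targets
    the latest Elasticsearch master, so a plain string is returned in that case.
    """
    string_only = distribution_version is None or _major_version(distribution_version) >= 8
    if not string_only:
        # Versions before 8.0.0 still accept a list, so keep the current behaviour.
        return data_paths
    if len(data_paths) != 1:
        # Fail fast: data_paths (e.g. from a car variable) must hold exactly one item.
        raise InvalidDataPathsError(
            f"Elasticsearch {distribution_version or 'master'} supports a single "
            f"data path but got {data_paths}"
        )
    return data_paths[0]


print(resolve_data_path(["/data"], "8.0.0-SNAPSHOT"))  # /data
print(resolve_data_path(["/a", "/b"], "7.12.0"))  # ['/a', '/b']
print(resolve_data_path(["/data"], None))  # /data
```

Checking only the major version keeps 8.0.0-SNAPSHOT builds on the string path, which a strict semantic-version comparison against 8.0.0 would not.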

dliappis, Apr 28 '21

I've marked this issue as "blocked" because https://github.com/elastic/elasticsearch/pull/78031 still allows multiple data paths in Elasticsearch 8.0 (although the feature remains deprecated).

danielmitterdorfer, Sep 28 '21