Adjust provisioning for removal of multiple data paths
What/Why
Multiple data paths have been removed: https://github.com/elastic/elasticsearch/pull/72282 changes the path.data setting to only support a String (as opposed to a List or String).
In Rally and its ecosystem we've been relying on a list even for single data paths. This means that currently (also factoring in how we use {{ data_paths }} in https://github.com/elastic/rally-teams/blob/cd4b8a134a41062f37704fa656908de8eff392f3/cars/v1/vanilla/templates/config/elasticsearch.yml#L33) we render path.data as a list, e.g.
path.data: ['/path/to/elasticsearch/246ea52c-13d0-4120-bf04-678a058aac29/rally-node-0/install/elasticsearch-8.0.0-SNAPSHOT/data']
which ultimately results in a directory named "[" being used as the data dir with Elasticsearch commits after https://github.com/elastic/elasticsearch/commit/d933ecd26cc443b1160e3fa7cab0fc4382893585 (or past 8.0.0).
This also results in index_size showing 0 in nightly charts e.g. for nightly-basic-geopoint-add-defaults-io in https://elasticsearch-benchmarks.elastic.co/#tracks/geopoint/nightly/default/30d.
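The rendering difference can be reproduced with a small stand-in for the template line. This is only a sketch: it uses an f-string rather than Jinja (both insert the string form of the value), and the function name and path are hypothetical. The second call takes the first list element, which is what the short-term patch in the plan below does.

```python
def render_path_data(data_paths):
    # Stand-in for the template line `path.data: {{ data_paths }}`.
    # Like Jinja, an f-string inserts the str() form of the value.
    return f"path.data: {data_paths}"


data_paths = ["/var/lib/rally/data"]  # hypothetical single data path

as_list = render_path_data(data_paths)       # what is rendered today
as_string = render_path_data(data_paths[0])  # first item only

print(as_list)    # path.data: ['/var/lib/rally/data']
print(as_string)  # path.data: /var/lib/rally/data
```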
Plan
- [x] Short-term patch to just use the first item in the list in https://github.com/elastic/rally-teams/blob/cd4b8a134a41062f37704fa656908de8eff392f3/cars/v1/vanilla/templates/config/elasticsearch.yml#L33 ; this will fix the value for index_size in nightly charts immediately. Opened https://github.com/elastic/rally-teams/pull/65
- [ ] Make the necessary changes in Rally to only pass a string for data_paths when the Elasticsearch version is >=8.0.0, and revert the patch above.
Implementation details:
Most of the handling of data_paths is done in _data_paths. However, we don't yet have access to the version of Elasticsearch within this method. We could consider passing distribution_version to the ElasticsearchInstaller, and then fail fast if the user-provided data_paths (as a car variable) has more than one item, or pass a string if there's only one item. It is important to note that when using the from-source pipeline, distribution_version won't be set; but with this pipeline we should only strive for compatibility with the latest Elasticsearch master, so we can safely pass only a string here.
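A minimal sketch of the version-aware handling described above. The helper name resolve_data_paths is hypothetical and not part of Rally; it assumes distribution_version is a string such as "8.0.0-SNAPSHOT", or None for the from-source pipeline. Rejecting an empty list is also an assumption, since the text only discusses the case of more than one item.

```python
def resolve_data_paths(data_paths, distribution_version=None):
    """Return the value to hand to the config template.

    Hypothetical helper, not Rally's actual API. A list is kept for
    Elasticsearch < 8; otherwise a single string is returned.
    """
    if distribution_version is not None:
        # "8.0.0-SNAPSHOT" -> 8; only the major version matters here.
        major = int(distribution_version.split(".")[0])
        if major < 8:
            return list(data_paths)

    # Elasticsearch >= 8.0.0, or a from-source build (no version set):
    # fail fast unless there is exactly one data path.
    if len(data_paths) != 1:
        raise ValueError(
            f"expected exactly one data path but got {len(data_paths)}"
        )
    return data_paths[0]


print(resolve_data_paths(["/a", "/b"], "7.10.2"))
print(resolve_data_paths(["/a"], "8.0.0-SNAPSHOT"))
print(resolve_data_paths(["/a"]))
```

With a check like this in place, the template could render the value directly again and the first-item patch could be reverted.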
I've marked this issue as "blocked" as multiple data paths are still allowed in Elasticsearch 8.0 with https://github.com/elastic/elasticsearch/pull/78031 (the feature remains deprecated though).