rally Remove references to segments memory usage

Elasticsearch is about to stop reporting segment memory usage via https://github.com/elastic/elasticsearch/pull/75274, including the memory usage of per-segment data structures such as terms, points, or doc values.

Rally currently uses these stats in a few places, e.g. esrally/telemetry.py and esrally/metrics.py. Should we stop collecting them?

Please let me know if I should wait for this to be addressed in Rally before merging the Elasticsearch change.

Jul 20 '21 14:07 jpountz

For the record, I tested how Rally behaved with these stats removed and it looks like it handled it gracefully. The output was a bit smaller than usual as Rally had dropped memory usage from it:

$ ./rally race --preserve-install --track=solutions/logs --track-repository=internal --track-params="number_of_replicas:0,raw_data_volume_per_day:5GB,wait_for_status:yellow" --track-revision=778a20b97ee5895e1c4717188f2208e43d7ccf52 --car=4gheap
Auto-updating Rally from origin
Fast-forwarded master to origin/master.

    ____        ____
   / __ \____ _/ / /_  __
  / /_/ / __ `/ / / / / /
 / _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
                /____/

[INFO] Preparing for race ...
[INFO] Racing on track [solutions/logs], challenge [logging-indexing] and car ['4gheap'] with version [8.0.0-SNAPSHOT].

Running insert-pipelines                                                       [100% done]
Running insert-ilm                                                             [100% done]
Running delete-all-datastreams                                                 [100% done]
Running delete-all-index-templates                                             [100% done]
Running create-all-index-templates                                             [100% done]
Running create-required-data-streams                                           [100% done]
Running wait-for-datastreams                                                   [100% done]
Running bulk-index                                                             [100% done]
Running compression-stats                                                      [100% done]

------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------
            
|                                                         Metric |                         Task |       Value |   Unit |
|---------------------------------------------------------------:|-----------------------------:|------------:|-------:|
|                     Cumulative indexing time of primary shards |                              |     40.3858 |    min |
|             Min cumulative indexing time across primary shards |                              |   0.0172333 |    min |
|          Median cumulative indexing time across primary shards |                              |     0.91105 |    min |
|             Max cumulative indexing time across primary shards |                              |       23.05 |    min |
|            Cumulative indexing throttle time of primary shards |                              |           0 |    min |
|    Min cumulative indexing throttle time across primary shards |                              |           0 |    min |
| Median cumulative indexing throttle time across primary shards |                              |           0 |    min |
|    Max cumulative indexing throttle time across primary shards |                              |           0 |    min |
|                        Cumulative merge time of primary shards |                              |     13.8162 |    min |
|                       Cumulative merge count of primary shards |                              |         128 |        |
|                Min cumulative merge time across primary shards |                              |           0 |    min |
|             Median cumulative merge time across primary shards |                              |     0.06565 |    min |
|                Max cumulative merge time across primary shards |                              |     12.0313 |    min |
|               Cumulative merge throttle time of primary shards |                              |     6.97037 |    min |
|       Min cumulative merge throttle time across primary shards |                              |           0 |    min |
|    Median cumulative merge throttle time across primary shards |                              |           0 |    min |
|       Max cumulative merge throttle time across primary shards |                              |     6.45255 |    min |
|                      Cumulative refresh time of primary shards |                              |     1.26883 |    min |
|                     Cumulative refresh count of primary shards |                              |         226 |        |
|              Min cumulative refresh time across primary shards |                              |  0.00173333 |    min |
|           Median cumulative refresh time across primary shards |                              |   0.0445833 |    min |
|              Max cumulative refresh time across primary shards |                              |      0.6646 |    min |
|                        Cumulative flush time of primary shards |                              |     1.42982 |    min |
|                       Cumulative flush count of primary shards |                              |          84 |        |
|                Min cumulative flush time across primary shards |                              |     0.00355 |    min |
|             Median cumulative flush time across primary shards |                              |   0.0271833 |    min |
|                Max cumulative flush time across primary shards |                              |    0.898817 |    min |
|                                        Total Young Gen GC time |                              |       11.83 |      s |
|                                       Total Young Gen GC count |                              |        1161 |        |
|                                          Total Old Gen GC time |                              |           0 |      s |
|                                         Total Old Gen GC count |                              |           0 |        |
|                                                     Store size |                              |     4.97561 |     GB |
|                                                  Translog size |                              | 6.14673e-07 |     GB |
|                                                  Segment count |                              |         138 |        |
|                                                 Min Throughput |             insert-pipelines |       14.11 |  ops/s |
|                                                Mean Throughput |             insert-pipelines |       14.11 |  ops/s |
|                                              Median Throughput |             insert-pipelines |       14.11 |  ops/s |
|                                                 Max Throughput |             insert-pipelines |       14.11 |  ops/s |
|                                       100th percentile latency |             insert-pipelines |     990.548 |     ms |
|                                  100th percentile service time |             insert-pipelines |     990.548 |     ms |
|                                                     error rate |             insert-pipelines |           0 |      % |
|                                                 Min Throughput |                   insert-ilm |       27.04 |  ops/s |
|                                                Mean Throughput |                   insert-ilm |       27.04 |  ops/s |
|                                              Median Throughput |                   insert-ilm |       27.04 |  ops/s |
|                                                 Max Throughput |                   insert-ilm |       27.04 |  ops/s |
|                                       100th percentile latency |                   insert-ilm |     36.5293 |     ms |
|                                  100th percentile service time |                   insert-ilm |     36.5293 |     ms |
|                                                     error rate |                   insert-ilm |           0 |      % |
|                                                 Min Throughput |   delete-all-index-templates |      466.04 |  ops/s |
|                                                Mean Throughput |   delete-all-index-templates |      466.04 |  ops/s |
|                                              Median Throughput |   delete-all-index-templates |      466.04 |  ops/s |
|                                                 Max Throughput |   delete-all-index-templates |      466.04 |  ops/s |
|                                       100th percentile latency |   delete-all-index-templates |     31.9989 |     ms |
|                                  100th percentile service time |   delete-all-index-templates |     31.9989 |     ms |
|                                                     error rate |   delete-all-index-templates |           0 |      % |
|                                                 Min Throughput |   create-all-index-templates |       26.11 |  ops/s |
|                                                Mean Throughput |   create-all-index-templates |       26.11 |  ops/s |
|                                              Median Throughput |   create-all-index-templates |       26.11 |  ops/s |
|                                                 Max Throughput |   create-all-index-templates |       26.11 |  ops/s |
|                                       100th percentile latency |   create-all-index-templates |      574.15 |     ms |
|                                  100th percentile service time |   create-all-index-templates |      574.15 |     ms |
|                                                     error rate |   create-all-index-templates |           0 |      % |
|                                                 Min Throughput | create-required-data-streams |        5.41 |  ops/s |
|                                                Mean Throughput | create-required-data-streams |        5.46 |  ops/s |
|                                              Median Throughput | create-required-data-streams |        5.46 |  ops/s |
|                                                 Max Throughput | create-required-data-streams |        5.51 |  ops/s |
|                                        50th percentile latency | create-required-data-streams |     187.788 |     ms |
|                                        90th percentile latency | create-required-data-streams |     190.911 |     ms |
|                                       100th percentile latency | create-required-data-streams |     191.227 |     ms |
|                                   50th percentile service time | create-required-data-streams |     187.788 |     ms |
|                                   90th percentile service time | create-required-data-streams |     190.911 |     ms |
|                                  100th percentile service time | create-required-data-streams |     191.227 |     ms |
|                                                     error rate | create-required-data-streams |           0 |      % |
|                                                 Min Throughput |         wait-for-datastreams |      639.47 |  ops/s |
|                                                Mean Throughput |         wait-for-datastreams |      639.47 |  ops/s |
|                                              Median Throughput |         wait-for-datastreams |      639.47 |  ops/s |
|                                                 Max Throughput |         wait-for-datastreams |      639.47 |  ops/s |
|                                        50th percentile latency |         wait-for-datastreams |    0.742331 |     ms |
|                                        90th percentile latency |         wait-for-datastreams |    0.811442 |     ms |
|                                       100th percentile latency |         wait-for-datastreams |     4.52598 |     ms |
|                                   50th percentile service time |         wait-for-datastreams |    0.742331 |     ms |
|                                   90th percentile service time |         wait-for-datastreams |    0.811442 |     ms |
|                                  100th percentile service time |         wait-for-datastreams |     4.52598 |     ms |
|                                                     error rate |         wait-for-datastreams |           0 |      % |
|                                                 Min Throughput |                   bulk-index |      884.83 | docs/s |
|                                                Mean Throughput |                   bulk-index |     32233.9 | docs/s |
|                                              Median Throughput |                   bulk-index |     33330.4 | docs/s |
|                                                 Max Throughput |                   bulk-index |     33871.5 | docs/s |
|                                        50th percentile latency |                   bulk-index |     161.223 |     ms |
|                                        90th percentile latency |                   bulk-index |     406.313 |     ms |
|                                        99th percentile latency |                   bulk-index |     572.411 |     ms |
|                                      99.9th percentile latency |                   bulk-index |     1131.62 |     ms |
|                                     99.99th percentile latency |                   bulk-index |     1651.57 |     ms |
|                                       100th percentile latency |                   bulk-index |     1787.46 |     ms |
|                                   50th percentile service time |                   bulk-index |     161.223 |     ms |
|                                   90th percentile service time |                   bulk-index |     406.313 |     ms |
|                                   99th percentile service time |                   bulk-index |     572.411 |     ms |
|                                 99.9th percentile service time |                   bulk-index |     1131.62 |     ms |
|                                99.99th percentile service time |                   bulk-index |     1651.57 |     ms |
|                                  100th percentile service time |                   bulk-index |     1787.46 |     ms |
|                                                     error rate |                   bulk-index |           0 |      % |
|                                                 Min Throughput |            compression-stats |        0.21 |  ops/s |
|                                                Mean Throughput |            compression-stats |        0.36 |  ops/s |
|                                              Median Throughput |            compression-stats |        0.26 |  ops/s |
|                                                 Max Throughput |            compression-stats |        0.84 |  ops/s |
|                                        50th percentile latency |            compression-stats |     18453.7 |     ms |
|                                        90th percentile latency |            compression-stats |     70295.3 |     ms |
|                                       100th percentile latency |            compression-stats |      104852 |     ms |
|                                   50th percentile service time |            compression-stats |     18453.7 |     ms |
|                                   90th percentile service time |            compression-stats |     70295.3 |     ms |
|                                  100th percentile service time |            compression-stats |      104852 |     ms |
|                                                     error rate |            compression-stats |        9.09 |      % |

[WARNING] Error rate is 9.09 for operation 'compression-stats'. Please check the logs.
[INFO] Preserving benchmark candidate installation at [/home/jpountz/.rally/benchmarks/races/62dd094c-22f4-420f-ab96-c1679c3b2b2e/rally-node-0/install/elasticsearch-8.0.0-SNAPSHOT].

----------------------------------
[INFO] SUCCESS (took 1234 seconds)
----------------------------------

And the log contained the following lines:

2021-07-20 14:18:51,358 ActorAddr-(T|:39093)/PID:140875 esrally.telemetry WARNING Could not determine value at path [segments,memory_in_bytes]. Returning default value [None]
2021-07-20 14:18:51,358 ActorAddr-(T|:39093)/PID:140875 esrally.telemetry WARNING Could not determine value at path [segments,doc_values_memory_in_bytes]. Returning default value [None]
2021-07-20 14:18:51,358 ActorAddr-(T|:39093)/PID:140875 esrally.telemetry WARNING Could not determine value at path [segments,stored_fields_memory_in_bytes]. Returning default value [None]
2021-07-20 14:18:51,358 ActorAddr-(T|:39093)/PID:140875 esrally.telemetry WARNING Could not determine value at path [segments,terms_memory_in_bytes]. Returning default value [None]
2021-07-20 14:18:51,358 ActorAddr-(T|:39093)/PID:140875 esrally.telemetry WARNING Could not determine value at path [segments,norms_memory_in_bytes]. Returning default value [None]
2021-07-20 14:18:51,358 ActorAddr-(T|:39093)/PID:140875 esrally.telemetry WARNING Could not determine value at path [segments,points_memory_in_bytes]. Returning default value [None]
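The warnings above suggest a lookup that walks a nested stats dictionary and falls back to a default when a key is missing. A minimal sketch of that pattern follows; it is an illustration only, not Rally's actual code. The function name `extract_value`, the logger name, and the sample payload are all hypothetical.

```python
import logging

# Hypothetical logger name, chosen for this illustration only.
logger = logging.getLogger("example.telemetry")


def extract_value(stats, path, default=None):
    """Walk nested dicts along `path`; return `default` if any key is missing."""
    value = stats
    for key in path:
        if not isinstance(value, dict) or key not in value:
            # Mirrors the wording of the warnings seen in the log above.
            logger.warning(
                "Could not determine value at path [%s]. Returning default value [%s]",
                ",".join(path),
                default,
            )
            return default
        value = value[key]
    return value


# Hypothetical stats payload that no longer contains the memory fields.
index_stats = {"segments": {"count": 138}}

segment_count = extract_value(index_stats, ["segments", "count"])
memory = extract_value(index_stats, ["segments", "memory_in_bytes"])
```

With a lookup like this, a missing field yields `None` and a warning rather than an exception, which would be consistent with the race completing and the memory rows simply being absent from the report.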

Jul 20 '21 15:07 jpountz

Thanks for the issue! Given that Rally already handles this gracefully, there is no immediate urgency to remove these references. This also means that https://github.com/elastic/elasticsearch/pull/75274 can be merged at any time. :)

Jul 20 '21 15:07 danielmitterdorfer
