opensearch-benchmark
[Feature Request] An automation tool to help identify the optimal bulk/batch size for ingestion
Is your feature request related to a problem? Please describe.
In https://github.com/opensearch-project/OpenSearch/issues/12457 we proposed a batch ingestion feature that can accelerate ingestion with neural search processors. It introduces an additional parameter, "batch size", so that texts from different documents can be combined and sent to the ML server in one request. Since users have different data sets and different ML servers with different resources, they would need to experiment with different batch size values to achieve optimal performance. To offload this burden from users, we'd like to have an automation tool that can find the optimal batch size automatically.
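For reference, here is a minimal sketch of what a batched bulk ingestion request could look like, assuming a local cluster and an ingest pipeline named `nlp-pipeline` that runs a text embedding processor (the endpoint, index, and pipeline names are illustrative, and `batch_size` is the query parameter proposed in the linked issue):

```python
import json
import requests

# Illustrative endpoint, index, and pipeline names -- not part of any real setup.
OPENSEARCH_URL = "http://localhost:9200"
PIPELINE = "nlp-pipeline"
BATCH_SIZE = 4  # number of documents whose texts are combined into one ML request

docs = [{"title": f"doc-{i}", "text": f"some text {i}"} for i in range(10)]

# Build the newline-delimited _bulk body: one action line plus one source line per document.
lines = []
for doc in docs:
    lines.append(json.dumps({"index": {"_index": "my-nlp-index"}}))
    lines.append(json.dumps(doc))
body = "\n".join(lines) + "\n"

resp = requests.post(
    f"{OPENSEARCH_URL}/_bulk",
    params={"pipeline": PIPELINE, "batch_size": BATCH_SIZE},
    data=body,
    headers={"Content-Type": "application/x-ndjson"},
)
resp.raise_for_status()
print("errors:", resp.json()["errors"])
```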
Describe the solution you'd like
The automation tool would run the _bulk API with different batch sizes to see which one leads to optimal performance (high throughput, low latency, and no errors). The OpenSearch-benchmark tool already provides rich benchmarking features which we could utilize for this automation. We can invoke benchmark runs with different parameters, then collect, evaluate, and even visualize the results, and finally provide a recommendation.
The tool could also be used to help select bulk size and client count, which we can support gradually.
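As a rough illustration of the kind of automation we have in mind, the sketch below sweeps a few candidate batch sizes by invoking OSB's `execute-test` with `--workload-params` and then reads back the summary reports. The workload name, the parameter names passed to the workload, and the shape of the CSV summary are assumptions for illustration only; the actual logic lives in the example repo linked in the additional context below.

```python
import csv
import subprocess
from pathlib import Path

# Assumed workload and parameter names -- substitute the actual workload used
# for neural-search ingestion testing.
WORKLOAD = "neural_search_workload"
CANDIDATE_BATCH_SIZES = [1, 5, 10, 25, 50]

RESULTS_DIR = Path("batch-size-sweep")
RESULTS_DIR.mkdir(exist_ok=True)

for batch_size in CANDIDATE_BATCH_SIZES:
    results_file = RESULTS_DIR / f"batch-size-{batch_size}.csv"
    subprocess.run(
        [
            "opensearch-benchmark", "execute-test",
            f"--workload={WORKLOAD}",
            f"--workload-params=batch_size:{batch_size},bulk_size:1000,bulk_indexing_clients:8",
            f"--results-file={results_file}",
            "--results-format=csv",
        ],
        check=True,
    )

# Naive comparison: pull the throughput rows out of each summary and print them
# side by side. A real tool would also look at latency percentiles, error rates,
# and repeat runs to confirm the results are stable.
for batch_size in CANDIDATE_BATCH_SIZES:
    with open(RESULTS_DIR / f"batch-size-{batch_size}.csv", newline="") as f:
        throughput_rows = [row for row in csv.reader(f) if row and "Mean Throughput" in row[0]]
    print(f"batch_size={batch_size}:", throughput_rows)
```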
Describe alternatives you've considered
N/A
Additional context
This issue was originally created in OS core repo but was suggested to repost in OSB. https://github.com/opensearch-project/OpenSearch/issues/13009
I put the example code in this repo https://github.com/chishui/opensearch-ingest-param-tuning-tool; please evaluate it to see if we can bring the functionality into OSB.
Thank you for suggesting this @chishui. After our discussion offline and inspecting your changes, I have some concerns about this feature and its user experience.
OpenSearch performance is complex in nature and is influenced by several factors — such as cluster configurations, hardware, workload types, etc. OSB was designed to be a flexible tool that encourages users to engage in the experimentation process in order to develop a better understanding of their cluster's performance and make strategic decisions based on the data collected.
Some of my concerns for this feature:
- A subcommand like `tuning` that automatically recommends optimal parameters might not account for all factors that influence OpenSearch performance and can potentially mislead users. It could also obscure important details and overall reduce the user's understanding of these underlying factors.
- Users also might become overly reliant on this feature and assume that it always provides the best possible configuration without testing and evaluating results themselves. This conflicts with the core purpose of OSB, which is to serve as a flexible tool that encourages users to engage in the experimentation process to understand cluster performance.
- The `tuning` subcommand only runs a single test for a mix of parameters and then moves on to the next test with a different mix of the same parameters. OpenSearch performance can fluctuate, and one-time test results may not hold true over time. To combat this, we always recommend that users run several tests with the same parameters to ensure they get repeatable results before deciding which are the best (see the sketch after this list).
- Users often customize their tests to meet their unique requirements. This feature does not have insight into the user's unique requirements and is solely focused on comparing the performance of configurations from one-time tests.
- Although the `compare` subcommand might be similar in terms of how it compares different test execution results and shows the percent differences, it is a fairly light-weight operation and leaves the interpretation up to the user.
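For context on the repeatability point above, here is a rough sketch of the workflow we recommend today: repeat the same test several times, then compare executions with the `compare` subcommand and interpret the differences yourself. The workload name, workload parameters, and the test-execution IDs are placeholders.

```python
import subprocess

# Placeholder parameters; the same values are used for every repeat.
PARAMS = "batch_size:10,bulk_size:1000,bulk_indexing_clients:8"
NUM_REPEATS = 3

# Run the same test several times to check that results are repeatable.
for _ in range(NUM_REPEATS):
    subprocess.run(
        [
            "opensearch-benchmark", "execute-test",
            "--workload=neural_search_workload",   # hypothetical workload name
            f"--workload-params={PARAMS}",
        ],
        check=True,
    )

# Each execution gets a test-execution ID, visible in OSB's output or via
# `opensearch-benchmark list test_executions`. Compare any two of them and
# interpret the percent differences yourself:
subprocess.run(
    [
        "opensearch-benchmark", "compare",
        "--baseline=<first-test-execution-id>",    # placeholder ID
        "--contender=<second-test-execution-id>",  # placeholder ID
    ],
    check=True,
)
```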
Alternative solution
To address some of these concerns, we can take an alternative approach and have studies performed in the open that show how variables like indexing / search clients, bulk size, and batch size influence a cluster's performance. These studies would serve as a learning resource that's publicly available and reproducible (similar to the nightly runs at opensearch.org/benchmarks). We can also add more documentation on performance testing and tuning variables. These resources would set users up for long-term success by equipping them with a better understanding of these variables and empowering them to use this knowledge, paired with OSB, to better assess their cluster's performance.
Again, although I see the use-case, I'm not certain that it fits within the design and purpose of OSB. The current implementation of the `tuning` subcommand is essentially a wrapper around OSB's core action of running tests and would be more suitable as a separate tool. We can discuss this further if you'd like and see how we can incorporate this elsewhere, perhaps in a performance tool suite. Also, tagging other maintainers -- @gkamat @rishabh6788 @cgchinmay @beaioun -- to see if they have any feedback on this that they'd like to add.
@chishui Closing this issue for now due to inactivity. We can sync and collaborate to find a suitable solution for your needs.