Allow an option to build a custom workload from hidden/datastream indices

Open dazoakley opened this issue 2 years ago • 5 comments

Is your feature request related to a problem? Please describe.

In our OpenSearch cluster we use data streams to handle index rotation when indices hit certain thresholds. We'd like to create custom benchmarking workloads based on our indices, but we can't: the backing indices for data streams are "hidden" (their names begin with .ds-), so the create-workload command does not pick them up.

This seems to be the offending line in the code: https://github.com/opensearch-project/opensearch-benchmark/blob/main/osbenchmark/workload_generator/index.py#L63

Describe the solution you'd like

We would like an option on the create-workload command to be able to include hidden indices (or even just "datastream" indices) within a custom workload.

dazoakley avatar Dec 05 '23 16:12 dazoakley

Hello @rishabh6788, @IanHoang, @gkamat, would you please take a look and share your comments? Thanks!

jordarlu avatar Dec 05 '23 20:12 jordarlu

I think the check is there to make sure the user doesn't include system/security indices by mistake while generating workloads with OSB. A quick hack I can think of: check out the opensearch-benchmark repo, remove the condition in the code you mentioned, and then do a local install by running pip3 install -e . from the repo root. That should bypass the check.

rishabh6788 avatar Dec 05 '23 20:12 rishabh6788

What @rishabh6788 mentioned above is a good, quick workaround. We can look at adding this as a flag, such as --include-hidden-indices, to the create-workload feature if that helps. @dazoakley, if you'd like, you could make the fix in the codebase and submit a PR for this option.

IanHoang avatar Dec 05 '23 20:12 IanHoang

Hi folks, thanks for the prompt replies. 😄

Yep, I've been using the suggested workaround, but having it as an actual CLI option would be much more convenient. If you're OK with me submitting a PR with that change, I'd be happy to. I'll see if I can get it done later today.

In fact, @IanHoang would something like --include-datastream-indices be a better flag? Then you'd still stop people being able to pull in system/security indices, but allow the use of datastreams?

dazoakley avatar Dec 06 '23 08:12 dazoakley

@dazoakley Apologies for the late response! Either flag would work, including --include-datastream-indices. We would just have to add a check that none of the "datastream" indices collected are security/system indices, as you mentioned.
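For illustration only, here is a minimal Python sketch of the kind of filter being discussed. It assumes the existing check simply skips any index whose name begins with a dot. The function name is_selectable, the include_datastream_indices parameter, and the sample index names are all hypothetical and are not taken from the OSB codebase.

```python
# Hypothetical sketch, not the actual opensearch-benchmark implementation.
# Assumption: hidden indices start with "." and data stream backing
# indices start with ".ds-", as described earlier in this thread.
DATASTREAM_PREFIX = ".ds-"


def is_selectable(index_name, include_datastream_indices=False):
    """Return True if the index may be included in a custom workload."""
    if not index_name:
        return False
    if not index_name.startswith("."):
        # Regular, visible index: always allowed.
        return True
    # Hidden index: allowed only when it looks like a data stream backing
    # index and the (hypothetical) opt-in flag is set.
    return include_datastream_indices and index_name.startswith(DATASTREAM_PREFIX)


# Illustrative index names only.
names = ["logs", ".ds-logs-000001", ".opendistro_security", ".kibana_1"]
selected = [n for n in names if is_selectable(n, include_datastream_indices=True)]
print(selected)  # ['logs', '.ds-logs-000001']
```

With the flag left at its default, only "logs" would be selected, so the current behavior is preserved. A prefix test alone may not be enough to rule out every system index, so a real change would likely need the additional check mentioned above.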

Once again, if you cut a quick fix for this, we can quickly address the PR. Thank you for your patience.

IanHoang avatar Mar 21 '24 17:03 IanHoang