Allow an option to build a custom workload from hidden/datastream indices
Is your feature request related to a problem? Please describe.
In our OpenSearch cluster we use data streams to handle index rotation when indices hit certain thresholds. We'd like to create some custom benchmarking workloads based on our indices, but we can't, because the backing indices for data streams are "hidden" (their names begin with .ds-) and are not picked up by the create-workload command.
This seems to be the offending line in the code: https://github.com/opensearch-project/opensearch-benchmark/blob/main/osbenchmark/workload_generator/index.py#L63
Describe the solution you'd like
We would like an option on the create-workload command to be able to include hidden indices (or even just "datastream" indices) within a custom workload.
Hello @rishabh6788, @IanHoang, @gkamat, would you please take a look and share your comments? Thanks!
I think the check is there to make sure the user doesn't include system/security indices by mistake when generating workloads with OSB.
A quick hack I can think of is to check out the opensearch-benchmark repo, remove the condition in the code you mentioned, and then do a local install by running pip3 install -e . from the repo root.
You should be able to bypass this check.
What @rishabh6788 mentioned above is a good and quick workaround. We can look at adding this as a flag, such as --include-hidden-indices, to the create-workload feature if that helps. @dazoakley If you'd like, you could make a quick fix for this in the codebase and submit a PR for this option.
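For illustration, here is a minimal sketch of how such a flag could gate the filter. It is not the actual OSB code: select_indices and build_parser are hypothetical names, and the sketch assumes the existing check simply skips index names that start with a dot.

```python
import argparse


def select_indices(index_names, include_hidden=False):
    """Return the index names to use when building a workload.

    Assumption: hidden indices are identified by a leading dot.
    """
    if include_hidden:
        return list(index_names)
    return [name for name in index_names if not name.startswith(".")]


def build_parser():
    # Hypothetical stand-in for the create-workload argument parser.
    parser = argparse.ArgumentParser(prog="create-workload-sketch")
    parser.add_argument(
        "--include-hidden-indices",
        action="store_true",
        default=False,
        help="also include indices whose names start with a dot",
    )
    return parser


# Example index names, made up for this sketch.
names = ["logs-app", ".ds-logs-app-000001", ".kibana_1"]
args = build_parser().parse_args(["--include-hidden-indices"])
selected = select_indices(names, include_hidden=args.include_hidden_indices)
```

With the flag off, only logs-app would be kept. With it on, every index is kept, including system ones, which is the risk the existing check guards against.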
Hi folks, thanks for the prompt replies. 😄
Yep, I've been using the suggested workaround, but having it as an actual CLI option would be much more convenient. If you're OK with me submitting a PR with that change, I'd be happy to. I'll see if I can get it done later today.
In fact, @IanHoang would something like --include-datastream-indices be a better flag? Then you'd still stop people being able to pull in system/security indices, but allow the use of datastreams?
@dazoakley Apologies for the late response! Either flag would work, and --include-datastream-indices is fine. We would just have to include a check that none of the "datastream" indices collected are security/system indices, as you mentioned.
Once again, if you cut a quick fix for this, we can quickly address the PR. Thank you for your patience.
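As a rough sketch of the check discussed above, assuming data stream backing indices can be recognised by the .ds- prefix mentioned in this issue: the function name, the keyword argument, and the list of protected prefixes are all hypothetical placeholders, not OSB's actual implementation or an authoritative list of system indices.

```python
DATA_STREAM_PREFIX = ".ds-"  # backing-index prefix mentioned in this issue

# Placeholder deny-list for illustration only; the real set of
# system/security index names would need to be confirmed by the maintainers.
PROTECTED_PREFIXES = (".opendistro", ".opensearch", ".plugins")


def select_indices(index_names, include_datastream_indices=False):
    """Keep regular indices and, optionally, data stream backing indices.

    Other dot-prefixed (hidden) indices are always skipped.
    """
    selected = []
    for name in index_names:
        if not name.startswith("."):
            # Regular, non-hidden index: always included.
            selected.append(name)
        elif include_datastream_indices and name.startswith(DATA_STREAM_PREFIX):
            # Inspect the part after ".ds-" so that a backing index of a
            # protected data stream is still rejected.
            stream_part = name[len(DATA_STREAM_PREFIX):]
            if not stream_part.startswith(PROTECTED_PREFIXES):
                selected.append(name)
    return selected


# Example index names, made up for this sketch.
names = [
    "logs",
    ".ds-logs-app-000001",
    ".opendistro_security",
    ".ds-.opendistro_security-000001",
]
default_selection = select_indices(names)
with_datastreams = select_indices(names, include_datastream_indices=True)
```

Here default_selection keeps only logs, while with_datastreams also keeps .ds-logs-app-000001 and still rejects both protected entries.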