opensearch-benchmark
[BUG] [Need-Help] Timeout error while creating a custom workload
We are trying to create a custom workload from an existing OpenSearch cluster using the command below:
opensearch-benchmark create-workload \
--workload="sample-index" \
--target-hosts="opensearch-master:9200" \
--client-options="basic_auth_user:'****',basic_auth_password:'****'" \
--indices="sample-index-5" \
--output-path="/opensearch-benchmark/.benchmark" --offline
This command consistently ends in a timeout error. The default timeout is mentioned as 10 seconds, and we are not seeing an explicit flag to increase it. Does anyone have any idea how to increase the timeout for the create-workload command? Error screenshot attached.
Hi @VinodRakuten, we currently have an RFC for enhancing the create-workload feature that mentions this issue and a proposed solution. We'll be implementing these enhancements soon. Feel free to have a look and comment on the RFC if you have any comments or questions.
To better help you:
- What's the command you're running? (Please redact any sensitive information; we only want to check the flags being used.)
- How many indices are you running against?
- How large are these indices in terms of GB?
- What's the average size of the shards?
Manual workaround:
For very large indices, I've had success running the create-workload feature on each index independently. This essentially creates separate workloads, and users then have to combine the workload.json files that are produced.
For example, if I want to extract corpora from indices A, B, and C, I'd run the create-workload feature separately for each index. After doing this, I'd have three different workloads -- one for A, one for B, and one for C. Next, I'd create a new directory and move the test_procedures, operations, and corpora into it. Lastly, I'd grab the indices and corpora portions from all three workloads, combine them, and add them to a new workload.json in the new directory.
This isn't ideal but it gets the job done. Some of these issues should be resolved once we enhance this feature in the future based on this RFC plan.
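The combining step described above can be sketched in Python. This is a rough illustration, not part of OSB: the function name is mine, and it assumes (per the workaround above) that only the top-level indices and corpora arrays need to be merged from each per-index workload.json.

```python
import json

def merge_workloads(paths, description="merged workload"):
    """Combine the 'indices' and 'corpora' sections of several
    per-index workload.json files into one merged workload dict.
    'paths' is a list of workload.json file paths (hypothetical names)."""
    merged = {"version": 2, "description": description,
              "indices": [], "corpora": []}
    for path in paths:
        with open(path) as f:
            workload = json.load(f)
        # Append this workload's index and corpus definitions to the merged lists.
        merged["indices"].extend(workload.get("indices", []))
        merged["corpora"].extend(workload.get("corpora", []))
    return merged

# Example: write the merged result to a new workload.json in the combined
# workload directory (paths shown are placeholders).
# merged = merge_workloads(["workload-a/workload.json", "workload-b/workload.json"])
# with open("combined/workload.json", "w") as f:
#     json.dump(merged, f, indent=2)
```

You would still need to fix up references between the merged file and the copied test_procedures, operations, and corpora, as described above.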
Thank you for your response @IanHoang. We tried the manual workaround by merging the workload.json files but it didn't work. Maybe we missed some pieces while combining them. We'll give it another shot.
@VinDataR what were the errors that you encountered? Could you provide the exact steps you performed and the exception here? Feel free to redact any private information if sending screenshots or commands.
We were just getting this ReadTimeout error while running the create-workload command in opensearch-benchmark. Below is a screenshot from the command.
We were able to create the workload successfully with the method you had suggested.
One more query that may be separate from the topic of this issue (please let me know if it requires raising a separate issue):
We are trying to create a 500GB size custom workload by running create-workload for multiple indices of 40GB size each and combining them all into a single workload.json. The challenge is that each create-workload command takes around 6-8 hours to complete. Is there any way we can speed this up?
Below is the command we are running (possible sensitive information replaced with **** symbols):
nohup opensearch-benchmark create-workload \
--workload="*******" \
--target-hosts="*******" \
--client-options="basic_auth_user:'******',basic_auth_password:'******'" \
--indices="********" \
--output-path="/opensearch-benchmark/.benchmark/benchmarks" --offline &
We plan to work on a faster workaround for extracting corpora and to give create-workload "restart" options, so it can pick up from where the timeout occurred. These can be found in the RFC I linked above.
@VinDataR How many GB had been extracted by the time OSB hit the read-timeout at 67% of the target index (from the screenshot you sent)?