opensearch-build
[Bug]: Inefficient processing of OS ENVs delays OpenSearch node start time by up to 10-15 minutes
Describe the bug
It seems that in a large Kubernetes cluster, where many environment variables (ENVs) are passed to the pod, it takes a very long time to process all of them. As a result, the OpenSearch node takes a very long time to start. For example, in our environment almost 5000 ENVs are passed to the pod, which results in a 15-minute OpenSearch node start time.
To reproduce
Create roughly 5000 OS ENVs and try to start an OpenSearch node.
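One way to approximate this without a real cluster is to feed synthetic `KEY=value` lines through the same kind of key/value loop and time it. This is only a sketch: the `DUMMY_VAR_` prefix, the `make_env_lines` helper, and the count of 5000 are assumptions for illustration, and it exercises only the parsing loop, not a full OpenSearch start.

```shell
#!/usr/bin/env bash
# Sketch: time a key/value parsing loop over ~5000 synthetic variables.
# make_env_lines and the DUMMY_VAR_ prefix are hypothetical, for illustration only.

count=5000

# Print "KEY=value" lines similar to the output of `env`,
# plus one line that looks like an OpenSearch setting.
make_env_lines() {
    local n=$1 i
    for ((i = 1; i <= n; i++)); do
        printf 'DUMMY_VAR_%d=value_%d\n' "$i" "$i"
    done
    printf 'cluster.name=demo\n'
}

parsed=0
matched=()
start=$SECONDS
while IFS='=' read -r envvar_key envvar_value; do
    parsed=$((parsed + 1))
    # Same filter as quoted in this thread: dotted lowercase keys, or "processors".
    if [[ "$envvar_key" =~ ^[a-z0-9_]+\.[a-z0-9_]+ || "$envvar_key" == "processors" ]]; then
        matched+=("-E${envvar_key}=${envvar_value}")
    fi
done < <(make_env_lines "$count")

echo "parsed ${parsed} lines in $((SECONDS - start))s; matched: ${matched[*]}"
```

To test against a real container instead, the same variables could be exported into the pod spec and the start time measured there; the elapsed time printed here will depend on the shell version and host.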
Expected behavior
Improved performance of ENV processing, resulting in a shorter OpenSearch startup time.
Screenshots
No response
Host / Environment
Kubernetes Cluster
Additional context
I think the main problem lies in the opensearch-docker-entrypoint.sh script, in the loop while IFS='=' read -r envvar_key envvar_value, which is very slow and inefficient.
Maybe you could consider changing IFS for the whole while loop, as shown below (accepting, of course, the caveats that this change would introduce):
old_ifs="$IFS"
IFS='='
while read -r envvar_key envvar_value
[...]
IFS="$old_ifs"
Quick tests show that the above change shortens the time needed to process the ENVs from minutes to seconds.
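One caveat of setting IFS for the whole loop is that every unquoted expansion in the loop body is then split on `=` rather than on whitespace. The sketch below illustrates that effect in isolation; the variable names are made up for the example and do not come from the entrypoint script.

```shell
#!/usr/bin/env bash
# Sketch of the caveat: while IFS is '=', unquoted expansions split on '='.
sample='a=b'

old_ifs="$IFS"
IFS='='
parts_global=($sample)     # unquoted expansion: split on '=' into two words
IFS="$old_ifs"

parts_default=($sample)    # default IFS: no whitespace in the value, so one word

echo "with IFS='=': ${#parts_global[@]} words"
echo "default IFS:  ${#parts_default[@]} words"
```

Quoting every expansion inside the loop body avoids the problem, which is why the change is workable as long as the body is written carefully.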
Relevant log output
No response
Thanks, @lde-avaleo. Please feel free to contribute if you have a good solution for the issue. @peterzhuamazon, can you take a look at the issue? Thanks.
Thanks for bringing this up, @lde-avaleo. We always appreciate all input that makes OpenSearch a better product for the community.
As you pointed out, instead of reading line by line and processing each variable individually, we could improve efficiency by reading all ENVs at once and filtering them ...
You mentioned that you conducted a quick test and that the performance improvement was great. Would you be willing to create a pull request to propose these changes? Perhaps something along these lines?
opensearch_opts=()
env_vars=$(env)
while IFS= read -r env_line; do
    # Split each line on the first '=' into key and value.
    IFS='=' read -r envvar_key envvar_value <<< "$env_line"
    # Keep dotted lowercase settings (e.g. cluster.name) and the special key "processors".
    if [[ "$envvar_key" =~ ^[a-z0-9_]+\.[a-z0-9_]+ || "$envvar_key" == "processors" ]]; then
        if [[ -n "$envvar_value" ]]; then
            opensearch_opt="-E${envvar_key}=${envvar_value}"
            opensearch_opts+=("${opensearch_opt}")
        fi
    fi
done <<< "$env_vars"
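As a variation on the snippet above, the key and value can be split with shell parameter expansion instead of a second `read` per line. This is a different technique from the one proposed in the thread, shown only as a sketch: `collect_opensearch_opts` is a hypothetical helper, not part of the real entrypoint script, and the sample input is invented.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads KEY=value lines on stdin and fills the
# global array opensearch_opts with -Ekey=value options.
collect_opensearch_opts() {
    opensearch_opts=()
    local line key value
    while IFS= read -r line; do
        # Skip lines without '=' (for example, continuation lines of multi-line values).
        [[ "$line" == *=* ]] || continue
        key=${line%%=*}     # text before the first '='
        value=${line#*=}    # text after the first '='; later '=' are preserved
        if [[ "$key" =~ ^[a-z0-9_]+\.[a-z0-9_]+ || "$key" == "processors" ]]; then
            if [[ -n "$value" ]]; then
                opensearch_opts+=("-E${key}=${value}")
            fi
        fi
    done
}

# Usage with sample input; in a container this could be:
#   collect_opensearch_opts < <(env)
collect_opensearch_opts <<'EOF'
PATH=/usr/bin
cluster.name=demo
discovery.type=single-node
node.attr.tag=a=b
processors=4
empty.value=
EOF
printf '%s\n' "${opensearch_opts[@]}"
```

The function is fed through a redirection rather than a pipe so that it runs in the current shell and the array remains visible after the call. Whether this is faster than the original loop in a given environment would need to be measured.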
CC : @peterzhuamazon @prudhvigodithi @gaiksaya
Hey @lde-avaleo, can you please share some details on how to reproduce this error? I assume the 10-15 minutes you mentioned does not include the image pull by the pod. I am looking for the exact time at which the pod phase changes to Running and the pod is in the 1/1 state, where OSD is accessible.
We should try removing the code here https://github.com/opensearch-project/opensearch-build/blob/main/docker/release/config/opensearch-dashboards/opensearch-dashboards-docker-entrypoint-2.x.sh#L21-L169 and test how the environment values are loaded. Adding @peterzhuamazon.
Hi, @peterzhuamazon , would you agree with @prudhvigodithi on the above comment? Do we need to keep only those experimental components?