[Bug]: Inefficient processing of OS ENVs delays OpenSearch node start time by up to 10-15 minutes

Open lde-avaleo opened this issue 1 year ago • 5 comments

Describe the bug

It seems that in a large Kubernetes cluster, where many environment variables (ENVs) are passed to the pod, processing all of them takes a very long time. This results in a very long OpenSearch node startup. For example, in our environment almost 5000 ENVs are passed to the pod, and the OpenSearch node takes 15 minutes to start.

To reproduce

Create roughly 5000 OS ENVs and try to start an OpenSearch node.
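A minimal sketch of one way to approximate this setup without Kubernetes, assuming bash. The names `make_dummy_env`, `count_setting_like_vars`, `DUMMY_ENV_*`, and `ENV_COUNT` are hypothetical and exist only for this illustration. The script exports dummy variables and times a per-line `IFS='=' read` loop in isolation; it does not start OpenSearch, and the timing on a given machine may not match the delay reported here.

```shell
#!/usr/bin/env bash
# Hypothetical reproduction helper; not part of opensearch-build.

# Export N dummy variables (DUMMY_ENV_0 .. DUMMY_ENV_<N-1>) into this shell.
make_dummy_env() {
    local count="$1" i
    for ((i = 0; i < count; i++)); do
        export "DUMMY_ENV_${i}=value_${i}"
    done
}

# Read the output of env line by line with a per-line IFS='=' and print how
# many names look like OpenSearch settings (same filter as used in this thread).
count_setting_like_vars() {
    local envvar_key envvar_value matched=0
    while IFS='=' read -r envvar_key envvar_value; do
        if [[ "$envvar_key" =~ ^[a-z0-9_]+\.[a-z0-9_]+ || "$envvar_key" == "processors" ]]; then
            matched=$((matched + 1))
        fi
    done < <(env)
    echo "$matched"
}

# ENV_COUNT is an assumed knob for this sketch; it defaults to 5000.
make_dummy_env "${ENV_COUNT:-5000}"
time count_setting_like_vars
```

Running the real entrypoint script inside the container with the same number of variables would be the more faithful test; this sketch only isolates the parsing loop.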

Expected behavior

Faster ENV processing, resulting in a shorter OpenSearch startup time.

Screenshots

No response

Host / Environment

Kubernetes Cluster

Additional context

I think the main problem lies in the opensearch-docker-entrypoint.sh script, in the while IFS='=' read -r envvar_key envvar_value loop, which is very slow and inefficient.

Maybe you could consider changing IFS for the whole while loop, as shown below (of course, you would then have to accept the caveats that this change introduces):

old_ifs="$IFS"
IFS='='
while read -r envvar_key envvar_value
[...]
IFS="$old_ifs"

Quick tests show that the above change shortens the time needed to process the ENVs from minutes to seconds.
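For illustration, the idea above could be filled out roughly as follows. This is a sketch under assumptions, not the actual entrypoint code: the function name `collect_opts` is hypothetical, input is read from stdin so it can be fed from `env`, and the filter condition is the one used elsewhere in this thread. While `IFS` is `=`, any unquoted expansion inside the loop is split on `=`, so every expansion in the body is quoted.

```shell
#!/usr/bin/env bash
# Sketch only: build the -E option list with IFS changed once around the loop.
# collect_opts is a hypothetical name; it reads KEY=VALUE lines from stdin
# and fills the global array opensearch_opts.
collect_opts() {
    local old_ifs="$IFS" envvar_key envvar_value
    opensearch_opts=()
    IFS='='
    while read -r envvar_key envvar_value; do
        # [[ ]] does not word-split, so the changed IFS has no effect here.
        if [[ "$envvar_key" =~ ^[a-z0-9_]+\.[a-z0-9_]+ || "$envvar_key" == "processors" ]]; then
            if [[ -n "$envvar_value" ]]; then
                opensearch_opts+=("-E${envvar_key}=${envvar_value}")
            fi
        fi
    done
    # Restore the previous IFS so later code is not affected.
    IFS="$old_ifs"
}
```

A call such as `collect_opts < <(env)` would then populate `opensearch_opts` from the current environment.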

Relevant log output

No response

lde-avaleo avatar Oct 04 '23 08:10 lde-avaleo

Thanks, @lde-avaleo. Please feel free to contribute if you have a good solution for the issue. @peterzhuamazon, can you take a look at the issue? Thanks.

Divyaasm avatar Oct 10 '23 19:10 Divyaasm

Thanks for bringing this up, @lde-avaleo. We always appreciate any input that makes OpenSearch a better product for the community.

As you pointed out, instead of reading line by line and processing each variable individually, we could improve the efficiency by reading all ENV at once, and filtering them out ...

You mentioned that you ran a quick test and the performance improvement was great. Would you be willing to create a pull request proposing these changes? Perhaps something along these lines?

opensearch_opts=()
env_vars=$(env)
while IFS= read -r env_line; do
    IFS='=' read -r envvar_key envvar_value <<< "$env_line"
    if [[ "$envvar_key" =~ ^[a-z0-9_]+\.[a-z0-9_]+ || "$envvar_key" == "processors" ]]; then
        if [[ -n "$envvar_value" ]]; then
            opensearch_opt="-E${envvar_key}=${envvar_value}"
            opensearch_opts+=("${opensearch_opt}")
        fi
    fi
done <<< "$env_vars"
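To make the filter condition in the snippet above concrete, here is a small sketch that applies the same test to a few sample names. The helper name `is_setting_name` and the sample names are chosen purely for illustration. Names made of lowercase letters, digits, or underscores followed by a dot and more such characters, plus the literal name `processors`, are treated as settings; typical shell variables such as `PATH` contain no dot and are skipped.

```shell
#!/usr/bin/env bash
# Hypothetical helper wrapping the same condition used in the loop above.
is_setting_name() {
    [[ "$1" =~ ^[a-z0-9_]+\.[a-z0-9_]+ || "$1" == "processors" ]]
}

# Sample names for illustration only.
for name in cluster.name discovery.type processors PATH JAVA_HOME; do
    if is_setting_name "$name"; then
        echo "$name: would be passed as a -E option"
    else
        echo "$name: ignored"
    fi
done
```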

CC : @peterzhuamazon @prudhvigodithi @gaiksaya

jordarlu avatar Oct 17 '23 18:10 jordarlu

Hey @lde-avaleo, can you please share some details on how to reproduce this error? I assume the 10-15 minutes you mentioned does not include the image pull by the pod. I am looking for the exact time at which the pod phase changes to Running and the pod reaches the 1/1 state, where the OSD is accessible.

prudhvigodithi avatar Nov 27 '23 19:11 prudhvigodithi

We should try to remove the code here https://github.com/opensearch-project/opensearch-build/blob/main/docker/release/config/opensearch-dashboards/opensearch-dashboards-docker-entrypoint-2.x.sh#L21-L169 and test the behaviors to load the environment values. Adding @peterzhuamazon.

prudhvigodithi avatar Nov 27 '23 19:11 prudhvigodithi

Hi, @peterzhuamazon , would you agree with @prudhvigodithi on the above comment? Do we need to keep only those experimental components?

jordarlu avatar Jan 23 '24 23:01 jordarlu