microshift USHIFT-4378: introduce ingress customization fields

Which issue(s) this PR addresses:

Closes #<Issue Number>

Sep 30 '24 15:09 eslutsky

@eslutsky: This pull request references USHIFT-4378 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set.

In response to this:

Which issue(s) this PR addresses:

Closes #<Issue Number>

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Sep 30 '24 15:09 openshift-ci-robot

Skipping CI for Draft Pull Request. If you want CI signal for your change, please convert it to an actual PR. You can still manually trigger a test run with /test all

Sep 30 '24 15:09 openshift-ci[bot]

/test all

Oct 03 '24 08:10 eslutsky

/retest

Oct 24 '24 14:10 pmtk

/retest

Oct 24 '24 17:10 jerpeter1

/retest

Oct 25 '24 08:10 pmtk

/test e2e-aws-tests-arm

Oct 28 '24 12:10 eslutsky

the default maxConnections should be 50000, not 5000, not sure it is expected. Others look good to me, thanks.

1.
% oc -n openshift-ingress rsh router-default-64785749b4-zz84t
sh-5.1$ env | grep -i conn
ROUTER_MAX_CONNECTIONS=5000
sh-5.1$
sh-5.1$  cat haproxy.config | grep -i maxconn
  maxconn 5000
  maxconn 5000
sh-5.1$

2. in config.yaml.default, the default of maxConnections is 50000
        # maxConnections defines the maximum number of simultaneous
        # connections that can be established per HAProxy process.
        # Increasing this value allows each ingress controller pod to
        # handle more connections but at the cost of additional
        # system resources being consumed.


        # Permitted values are: empty, 0, -1, and the range 
        # 2000-2000000.


        # If this field is empty or 0, the IngressController will use
        # the default value of 50000, but the default is subject to
        # change in future releases.

Nov 01 '24 11:11 ShudiLi

Configure the custom parameters with 2 times of the default values, restart the microshift service(sudo systemctl restart microshift), then check the router-default deployment, haproxy.conf and env under a router pod. All are expected except connectTimeout and reloadInterval

1.
sh-5.1# cat /etc/microshift/config.yaml | grep connectTimeout:
        connectTimeout: 10s
sh-5.1#
sh-5.1$ grep "timeout connect" haproxy.config
  timeout connect 5s
sh-5.1$
sh-5.1$ env | grep CONN
ROUTER_MAX_CONNECTIONS=100000
sh-5.1$

2.
sh-5.1# cat /etc/microshift/config.yaml | grep reloadInterval:
        reloadInterval: 10s
sh-5.1#
% oc -n openshift-ingress get deployment router-default -oyaml | grep -A1 RELOAD_INTERVAL
        - name: RELOAD_INTERVAL
          value: 5s

sh-5.1$ env | grep RELOAD_INTERVAL
RELOAD_INTERVAL=5s
sh-5.1$

Nov 04 '24 11:11 ShudiLi

@ShudiLi , connectTimeout and reloadInterval wasnt meant to be exposed as ingress configuration option, we inheritted them because we used IngressControllerTuningOptions from the ingress operator API @pacevedom , it looks like we have to duplicate that struct into our code base.

Nov 04 '24 16:11 eslutsky

@eslutsky Thanks for the explanation.

Nov 05 '24 14:11 ShudiLi

/test e2e-aws-tests-bootc

Nov 05 '24 14:11 eslutsky

@eslutsky some timeout parameters weren't updated in the haproxy.config file, thanks.

1.
% oc -n openshift-ingress get deployment -oyaml | grep -A1 TIMEOUT
          - name: ROUTER_DEFAULT_CLIENT_TIMEOUT
            value: 2m0s
          - name: ROUTER_CLIENT_FIN_TIMEOUT
            value: 1m0s
          - name: ROUTER_DEFAULT_SERVER_TIMEOUT
            value: 2m0s
          - name: ROUTER_DEFAULT_SERVER_FIN_TIMEOUT
            value: 2m0s
          - name: ROUTER_DEFAULT_TUNNEL_TIMEOUT
            value: 2h0m0s

2. in the router pod
sh-5.1$ env | grep -i timeout
ROUTER_DEFAULT_TUNNEL_TIMEOUT=2h0m0s
ROUTER_DEFAULT_SERVER_TIMEOUT=2m0s
ROUTER_DEFAULT_CLIENT_TIMEOUT=2m0s
ROUTER_DEFAULT_SERVER_FIN_TIMEOUT=2m0s
ROUTER_CLIENT_FIN_TIMEOUT=1m0s
sh-5.1$

3. in the router pod, check the timeout parameters, "timeout tunnel", "timeout server", "timeout client", "timeout server-fin", and "timeout client-fin" were the default values, weren't updated.
sh-5.1$ cat haproxy.config | grep -i "timeout "
  stats timeout 2m
  timeout connect 5s
  timeout client 30s
  timeout client-fin 1s
  timeout server 30s
  timeout server-fin 1s
  timeout http-request 10s
  timeout http-keep-alive 300s
  # Long timeout for WebSocket connections.
  timeout tunnel 1h
sh-5.1$

Nov 06 '24 06:11 ShudiLi

good catch ! looks like timeSpecPattern is not accepting all the metav1.Duration variables like 2m0s,so it has to be a single value ie: 2h0m0s should be just 2h , we will have to use some other validation type. not sure how openshift accept those..

Nov 06 '24 09:11 eslutsky

good catch ! looks like timeSpecPattern is not accepting all the metav1.Duration variables like 2m0s,so it has to be a single value ie: 2h0m0s should be just 2h , we will have to use some other validation type. not sure how openshift accept those..

i've checked again against the operator code, and we were missing the durationToHAProxyTimespec function

Nov 06 '24 21:11 eslutsky

/test verify

Nov 07 '24 10:11 eslutsky

@ShudiLi , the issue that you found has been addressed.

Nov 07 '24 10:11 eslutsky

/retest

Nov 08 '24 14:11 eslutsky

@eslutsky Thanks for the fixed, I did test today and found httpEmptyRequestsPolicy and logEmptyRequests couldn't be seen in the haproxy.config file. Also for defaultHTTPVersion, there was just default value, there weren't valid values. I tried to update it with 2, the cluster was disconnected.

1. For httpEmptyRequestsPolicy: Ignore, "  option http-ignore-probes" should be found in the haproxy.config
sh-5.1$ env | grep -w ROUTER_HTTP_IGNORE_PROBES
ROUTER_HTTP_IGNORE_PROBES=Ignore
sh-5.1$ cat haproxy.config | grep -i "http-ignore-probes"
sh-5.1$


2. For logEmptyRequests: Ignore, "  option dontlognul" should be found in the haproxy.config
sh-5.1$ env | grep ROUTER_DONT_LOG_NULL
ROUTER_DONT_LOG_NULL=Ignore
sh-5.1$ 
sh-5.1$ cat haproxy.config | grep -i dontlognull
sh-5.1$

3. are the valid values available for the defaultHTTPVersion?
ingress:
    # Determines default http version should be used for the ingress backends
    # By default,  using version 1.
    defaultHTTPVersion: 1

Nov 11 '24 06:11 ShudiLi

ROUTER_HTTP_IGNORE_PROBES thanks @ShudiLi it appears that haproxy.config expects "true" in ROUTER_HTTP_IGNORE_PROBES and ROUTER_DONT_LOG_NULL ENV, fixed.

Nov 11 '24 11:11 eslutsky

/test ocp-full-conformance-serial-optional-components-rhel-eus-arm

Nov 12 '24 11:11 eslutsky

this looks good to me

Nov 13 '24 11:11 pmtk

/retest

Nov 15 '24 10:11 eslutsky

/test ?

Nov 15 '24 13:11 eslutsky

@eslutsky: The following commands are available to trigger required jobs:

/test e2e-aws-footprint-and-performance
/test e2e-aws-tests
/test e2e-aws-tests-arm
/test e2e-aws-tests-bootc
/test e2e-aws-tests-bootc-arm
/test e2e-aws-tests-bootc-periodic
/test e2e-aws-tests-bootc-periodic-arm
/test e2e-aws-tests-cache
/test e2e-aws-tests-cache-arm
/test e2e-aws-tests-periodic
/test e2e-aws-tests-periodic-arm
/test images
/test ocp-full-conformance-optional-components-rhel-eus
/test ocp-full-conformance-optional-components-rhel-eus-arm
/test ocp-full-conformance-rhel-eus
/test ocp-full-conformance-rhel-eus-arm
/test ocp-full-conformance-serial-optional-components-rhel-eus
/test ocp-full-conformance-serial-optional-components-rhel-eus-arm
/test ocp-full-conformance-serial-rhel-eus
/test ocp-full-conformance-serial-rhel-eus-arm
/test test-rpm
/test test-unit
/test verify

The following commands are available to trigger optional jobs:

/test security
/test test-rebase

Use /test all to run the following jobs that were automatically triggered:

pull-ci-openshift-microshift-main-e2e-aws-tests
pull-ci-openshift-microshift-main-e2e-aws-tests-arm
pull-ci-openshift-microshift-main-e2e-aws-tests-bootc
pull-ci-openshift-microshift-main-e2e-aws-tests-bootc-arm
pull-ci-openshift-microshift-main-images
pull-ci-openshift-microshift-main-ocp-full-conformance-optional-components-rhel-eus
pull-ci-openshift-microshift-main-ocp-full-conformance-optional-components-rhel-eus-arm
pull-ci-openshift-microshift-main-ocp-full-conformance-rhel-eus
pull-ci-openshift-microshift-main-ocp-full-conformance-rhel-eus-arm
pull-ci-openshift-microshift-main-ocp-full-conformance-serial-optional-components-rhel-eus
pull-ci-openshift-microshift-main-ocp-full-conformance-serial-optional-components-rhel-eus-arm
pull-ci-openshift-microshift-main-ocp-full-conformance-serial-rhel-eus
pull-ci-openshift-microshift-main-ocp-full-conformance-serial-rhel-eus-arm
pull-ci-openshift-microshift-main-security
pull-ci-openshift-microshift-main-test-rebase
pull-ci-openshift-microshift-main-test-rpm
pull-ci-openshift-microshift-main-test-unit
pull-ci-openshift-microshift-main-verify

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Nov 15 '24 13:11 openshift-ci[bot]

/test e2e-aws-tests-cache

Nov 15 '24 13:11 eslutsky

/retest

Nov 18 '24 09:11 eslutsky

/test e2e-aws-tests-arm

Nov 18 '24 11:11 eslutsky

squashed the extra commits

Nov 18 '24 15:11 eslutsky

/test test-rebase

Nov 18 '24 15:11 eslutsky