ingress-nginx
Dynamic reconfiguration failed, blocked by ModSecurity CRS
NGINX Ingress controller version: 4.0.1
Kubernetes version: 1.21
Environment: Baremetal, helm, with the following relevant values:
enable-modsecurity: "true"
enable-owasp-modsecurity-crs: "true"
modsecurity-snippet: |
  SecRuleEngine On
Also using cert-manager for automatic TLS certificate creation.
What happened:
Ingress-nginx can perform "dynamic reloads" by sending a POST /configuration/backends request to 127.0.0.1:10246, which is handled by Lua code.
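For context, the blocked internal request in the logs below looks roughly like this (a sketch; the exact headers and body are assumptions, but the bare IP:port in the Host header is what the CRS scores as anomalous):

POST /configuration/servers HTTP/1.1
Host: 127.0.0.1:10246
Content-Type: application/json

{ ...JSON payload describing the servers... }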
For as far back as I have logs (30 days), ingress-nginx has apparently never performed "dynamic reconfiguration". It has exclusively performed the full backend reload from this line of code.
But today, ingress-nginx happened to perform a dynamic reconfiguration, which was blocked by ModSecurity CRS (since the request puts an IP address in the Host header). This caused ingress-nginx to be stuck in a loop, constantly reloading and failing, which used up all the RAM and caused cascading failures. The failures only stopped after I added a ModSecurity rule exception that disabled ModSecurity for those internal requests to 127.0.0.1:10246.
What you expected to happen:
Ingress-nginx should not block requests to itself, either by disabling ModSecurity for the internal requests or by shipping some default rule exceptions.
How to reproduce it:
Good question. How can you reliably trigger a dynamic reload as opposed to a full reload? The comments in the code indicate that if you change a certificate or an L4 IP, it will skip the full reload and just do a dynamic one. But deleting a certificate secret, changing Endpoints, and deleting endpoints all trigger the full reload. And after a full reload, the dynamic reload is skipped.
The only way I can think of to trigger a dynamic reload is to hope the configuration changes after the full reload but before the dynamic reload. Maybe this is due to a race condition?
Here are the logs around the event:
I0112 20:42:17.957691 7 store.go:371] "Found valid IngressClass" ingress="balhoff/cm-acme-http-solver-vv5ls" ingressclass="nginx"
I0112 20:42:17.957805 7 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"balhoff", Name:"cm-acme-http-solver-vv5ls", UID:"9c9b7c23-4524-4c10-9044-f6076e3d8dbe", APIVersion:"networking.k8s.io/v1", ResourceVersion:"53881641", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
W0112 20:42:21.150264 7 controller.go:1047] Service "balhoff/cm-acme-http-solver-g6qrv" does not have any active Endpoint.
W0112 20:42:21.150416 7 controller.go:1270] Error getting SSL certificate "balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls": local SSL certificate balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls was not found. Using default certificate
I0112 20:42:21.150637 7 controller.go:152] "Configuration changes detected, backend reload required"
I0112 20:42:21.523706 7 controller.go:169] "Backend successfully reloaded"
I0112 20:42:21.523851 7 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"ingress-nginx", Name:"ingress-nginx-controller-jsprp", UID:"d32bee4d-7f81-41d4-98ca-b88caec6c7db", APIVersion:"v1", ResourceVersion:"53755667", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration
I0112 20:42:21.792188 7 store.go:371] "Found valid IngressClass" ingress="wstephens/helx-nginx" ingressclass="nginx"
I0112 20:42:21.792311 7 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"wstephens", Name:"helx-nginx", UID:"31a7eb2f-9b21-49d8-b5d2-157d1abf9f00", APIVersion:"networking.k8s.io/v1", ResourceVersion:"53881900", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
W0112 20:42:24.484049 7 controller.go:1047] Service "wstephens/helx-nginx" does not have any active Endpoint.
W0112 20:42:24.484202 7 controller.go:1270] Error getting SSL certificate "balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls": local SSL certificate balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls was not found. Using default certificate
I0112 20:42:24.484401 7 controller.go:152] "Configuration changes detected, backend reload required"
I0112 20:42:24.850243 7 controller.go:169] "Backend successfully reloaded"
I0112 20:42:24.850386 7 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"ingress-nginx", Name:"ingress-nginx-controller-jsprp", UID:"d32bee4d-7f81-41d4-98ca-b88caec6c7db", APIVersion:"v1", ResourceVersion:"53755667", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration
I0112 20:42:26.584376 7 status.go:300] "updating Ingress status" namespace="balhoff" ingress="phenoscape-services-1-0-kb-services-service-ingress" currentValue=[] newValue=[{IP:152.54.15.132 Hostname: Ports:[]}]
I0112 20:42:26.584376 7 status.go:300] "updating Ingress status" namespace="balhoff" ingress="cm-acme-http-solver-vv5ls" currentValue=[] newValue=[{IP:152.54.15.132 Hostname: Ports:[]}]
I0112 20:42:26.584376 7 status.go:300] "updating Ingress status" namespace="wstephens" ingress="helx-nginx" currentValue=[] newValue=[{IP:152.54.15.132 Hostname: Ports:[]}]
I0112 20:42:26.590511 7 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"wstephens", Name:"helx-nginx", UID:"31a7eb2f-9b21-49d8-b5d2-157d1abf9f00", APIVersion:"networking.k8s.io/v1", ResourceVersion:"53882009", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
I0112 20:42:26.590855 7 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"balhoff", Name:"phenoscape-services-1-0-kb-services-service-ingress", UID:"f5c40b1b-f8cb-4f8a-9992-0a9539469711", APIVersion:"networking.k8s.io/v1", ResourceVersion:"53882010", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
W0112 20:42:26.590884 7 backend_ssl.go:46] Error obtaining X.509 certificate: no object matching key "balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls" in local store
I0112 20:42:26.590968 7 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"balhoff", Name:"cm-acme-http-solver-vv5ls", UID:"9c9b7c23-4524-4c10-9044-f6076e3d8dbe", APIVersion:"networking.k8s.io/v1", ResourceVersion:"53882011", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
W0112 20:42:27.816720 7 controller.go:1047] Service "wstephens/helx-nginx" does not have any active Endpoint.
W0112 20:42:27.816858 7 controller.go:1270] Error getting SSL certificate "balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls": local SSL certificate balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls was not found. Using default certificate
W0112 20:42:31.149938 7 controller.go:1047] Service "wstephens/helx-nginx" does not have any active Endpoint.
W0112 20:42:31.150084 7 controller.go:1270] Error getting SSL certificate "balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls": local SSL certificate balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls was not found. Using default certificate
W0112 20:42:34.483420 7 controller.go:1047] Service "wstephens/helx-nginx" does not have any active Endpoint.
W0112 20:42:34.483606 7 controller.go:1270] Error getting SSL certificate "balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls": local SSL certificate balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls was not found. Using default certificate
W0112 20:42:37.817216 7 controller.go:1047] Service "wstephens/helx-nginx" does not have any active Endpoint.
W0112 20:42:37.817393 7 controller.go:1270] Error getting SSL certificate "balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls": local SSL certificate balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls was not found. Using default certificate
W0112 20:42:41.673417 7 controller.go:1047] Service "wstephens/helx-nginx" does not have any active Endpoint.
W0112 20:42:41.673603 7 controller.go:1270] Error getting SSL certificate "balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls": local SSL certificate balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls was not found. Using default certificate
W0112 20:42:45.007543 7 controller.go:1047] Service "wstephens/helx-nginx" does not have any active Endpoint.
W0112 20:42:45.007683 7 controller.go:1270] Error getting SSL certificate "balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls": local SSL certificate balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls was not found. Using default certificate
W0112 20:42:57.584197 7 controller.go:1047] Service "wstephens/helx-nginx" does not have any active Endpoint.
W0112 20:42:57.584365 7 controller.go:1270] Error getting SSL certificate "balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls": local SSL certificate balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls was not found. Using default certificate
W0112 20:43:04.317801 7 controller.go:1270] Error getting SSL certificate "balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls": local SSL certificate balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls was not found. Using default certificate
W0112 20:43:07.649716 7 controller.go:1270] Error getting SSL certificate "balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls": local SSL certificate balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls was not found. Using default certificate
W0112 20:43:10.983235 7 controller.go:1270] Error getting SSL certificate "balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls": local SSL certificate balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls was not found. Using default certificate
W0112 20:43:33.456137 7 controller.go:1270] Error getting SSL certificate "balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls": local SSL certificate balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls was not found. Using default certificate
W0112 20:43:36.790048 7 controller.go:1270] Error getting SSL certificate "balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls": local SSL certificate balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls was not found. Using default certificate
W0112 20:43:49.508637 7 controller.go:952] Error obtaining Endpoints for Service "balhoff/cm-acme-http-solver-g6qrv": no object matching key "balhoff/cm-acme-http-solver-g6qrv" in local store
W0112 20:43:49.508778 7 controller.go:1270] Error getting SSL certificate "balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls": local SSL certificate balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls was not found. Using default certificate
I0112 20:43:49.509008 7 controller.go:152] "Configuration changes detected, backend reload required"
I0112 20:43:49.876006 7 controller.go:169] "Backend successfully reloaded"
I0112 20:43:49.876133 7 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"ingress-nginx", Name:"ingress-nginx-controller-jsprp", UID:"d32bee4d-7f81-41d4-98ca-b88caec6c7db", APIVersion:"v1", ResourceVersion:"53755667", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration
I0112 20:43:50.778858 7 store.go:509] "Secret was added and it is used in ingress annotations. Parsing" secret="balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls"
I0112 20:43:50.779265 7 backend_ssl.go:66] "Adding secret to local store" name="balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls"
I0112 20:43:52.842949 7 controller.go:152] "Configuration changes detected, backend reload required"
I0112 20:43:53.208385 7 controller.go:169] "Backend successfully reloaded"
I0112 20:43:53.208809 7 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"ingress-nginx", Name:"ingress-nginx-controller-jsprp", UID:"d32bee4d-7f81-41d4-98ca-b88caec6c7db", APIVersion:"v1", ResourceVersion:"53755667", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration
2022/01/12 20:43:53 [error] 13549#13549: *277346 [client 127.0.0.1] ModSecurity: Access denied with code 403 (phase 2). Matched "Operator `Ge' with parameter `5' against variable `TX:ANOMALY_SCORE' (Value: `8' ) [file "/etc/nginx/owasp-modsecurity-crs/rules/REQUEST-949-BLOCKING-EVALUATION.conf"] [line "80"] [id "949110"] [rev ""] [msg "Inbound Anomaly Score Exceeded (Total Score: 8)"] [data ""] [severity "2"] [ver "OWASP_CRS/3.3.2"] [maturity "0"] [accuracy "0"] [tag "application-multi"] [tag "language-multi"] [tag "platform-multi"] [tag "attack-generic"] [hostname "127.0.0.1"] [uri "/configuration/servers"] [unique_id "1642020233"] [ref ""], client: 127.0.0.1, server: , request: "POST /configuration/servers HTTP/1.1", host: "127.0.0.1:10246"
W0112 20:43:54.000587 7 controller.go:198] Dynamic reconfiguration failed: unexpected error code: 403
E0112 20:43:54.000606 7 controller.go:202] Unexpected failure reconfiguring NGINX:
unexpected error code: 403
E0112 20:43:54.000618 7 queue.go:130] "requeuing" err="unexpected error code: 403" key="balhoff/cm-acme-http-solver-vv5ls"
I0112 20:43:56.176351 7 controller.go:152] "Configuration changes detected, backend reload required"
I0112 20:43:56.537687 7 controller.go:169] "Backend successfully reloaded"
I0112 20:43:56.537940 7 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"ingress-nginx", Name:"ingress-nginx-controller-jsprp", UID:"d32bee4d-7f81-41d4-98ca-b88caec6c7db", APIVersion:"v1", ResourceVersion:"53755667", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration
2022/01/12 20:43:57 [error] 13837#13837: *277405 [client 127.0.0.1] ModSecurity: Access denied with code 403 (phase 2). Matched "Operator `Ge' with parameter `5' against variable `TX:ANOMALY_SCORE' (Value: `8' ) [file "/etc/nginx/owasp-modsecurity-crs/rules/REQUEST-949-BLOCKING-EVALUATION.conf"] [line "80"] [id "949110"] [rev ""] [msg "Inbound Anomaly Score Exceeded (Total Score: 8)"] [data ""] [severity "2"] [ver "OWASP_CRS/3.3.2"] [maturity "0"] [accuracy "0"] [tag "application-multi"] [tag "language-multi"] [tag "platform-multi"] [tag "attack-generic"] [hostname "127.0.0.1"] [uri "/configuration/servers"] [unique_id "1642020237"] [ref ""], client: 127.0.0.1, server: , request: "POST /configuration/servers HTTP/1.1", host: "127.0.0.1:10246"
W0112 20:43:57.362644 7 controller.go:198] Dynamic reconfiguration failed: unexpected error code: 403
E0112 20:43:57.362662 7 controller.go:202] Unexpected failure reconfiguring NGINX:
unexpected error code: 403
@mac-chaffee: This issue is currently awaiting triage.
If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and providing further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I ended up killing the ingress-nginx pods since once they get to ~99% RAM usage, new attempts to reload the configuration cause OOMKills. Interestingly, the newly-created ingress-nginx pods also attempted to perform dynamic reloads. I would expect the initial boot to perform a full reload, then skip the dynamic reload:
W0112 20:59:49.756514 7 client_config.go:615] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0112 20:59:49.756713 7 main.go:221] "Creating API client" host="https://10.20.0.1:443"
I0112 20:59:49.768158 7 main.go:265] "Running in Kubernetes cluster" major="1" minor="21" git="v1.21.5" state="clean" commit="aea7bbadd2fc0cd689de94a54e5b7b758869d691" platform="linux/amd64"
I0112 20:59:49.937149 7 main.go:104] "SSL fake certificate created" file="/etc/ingress-controller/ssl/default-fake-certificate.pem"
I0112 20:59:49.968052 7 ssl.go:531] "loading tls certificate" path="/usr/local/certificates/cert" key="/usr/local/certificates/key"
I0112 20:59:49.988839 7 nginx.go:253] "Starting NGINX Ingress controller"
I0112 20:59:50.114235 7 event.go:282] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"ingress-nginx", Name:"ingress-nginx-controller", UID:"e32530ea-840d-458c-99cc-6b8ce9d83fb9", APIVersion:"v1", ResourceVersion:"53456346", FieldPath:""}): type: 'Normal' reason: 'CREATE' ConfigMap ingress-nginx/ingress-nginx-controller
I0112 20:59:51.198126 7 store.go:371] "Found valid IngressClass" ingress="translator/answercoalesce-dev-answer-coalesce-ingress" ingressclass="nginx"
...
...
...
I0112 20:59:51.293055 7 controller.go:152] "Configuration changes detected, backend reload required"
I0112 20:59:51.571986 7 status.go:84] "New leader elected" identity="ingress-nginx-controller-jsprp"
I0112 20:59:51.807786 7 controller.go:169] "Backend successfully reloaded"
I0112 20:59:51.807911 7 controller.go:180] "Initial sync, sleeping for 1 second"
I0112 20:59:51.807997 7 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"ingress-nginx", Name:"ingress-nginx-controller-tshmg", UID:"0e7abe10-9663-4456-aab4-0a990912f1b9", APIVersion:"v1", ResourceVersion:"53888198", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration
2022/01/12 20:59:53 [error] 214#214: *24 [client 127.0.0.1] ModSecurity: Access denied with code 403 (phase 2). Matched "Operator `Ge' with parameter `5' against variable `TX:ANOMALY_SCORE' (Value: `8' ) [file "/etc/nginx/owasp-modsecurity-crs/rules/REQUEST-949-BLOCKING-EVALUATION.conf"] [line "80"] [id "949110"] [rev ""] [msg "Inbound Anomaly Score Exceeded (Total Score: 8)"] [data ""] [severity "2"] [ver "OWASP_CRS/3.3.2"] [maturity "0"] [accuracy "0"] [tag "application-multi"] [tag "language-multi"] [tag "platform-multi"] [tag "attack-generic"] [hostname "127.0.0.1"] [uri "/configuration/servers"] [unique_id "1642021193"] [ref ""], client: 127.0.0.1, server: , request: "POST /configuration/servers HTTP/1.1", host: "127.0.0.1:10246"
W0112 20:59:53.581633 7 controller.go:198] Dynamic reconfiguration failed: unexpected error code: 403
E0112 20:59:53.581677 7 controller.go:202] Unexpected failure reconfiguring NGINX:
unexpected error code: 403
E0112 20:59:53.581705 7 queue.go:130] "requeuing" err="unexpected error code: 403" key="initial-sync"
I0112 20:59:54.627471 7 controller.go:152] "Configuration changes detected, backend reload required"
I0112 20:59:55.176189 7 controller.go:169] "Backend successfully reloaded"
I0112 20:59:55.176283 7 controller.go:180] "Initial sync, sleeping for 1 second"
I0112 20:59:55.176310 7 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"ingress-nginx", Name:"ingress-nginx-controller-tshmg", UID:"0e7abe10-9663-4456-aab4-0a990912f1b9", APIVersion:"v1", ResourceVersion:"53888198", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration
2022/01/12 20:59:56 [error] 470#470: *84 [client 127.0.0.1] ModSecurity: Access denied with code 403 (phase 2). Matched "Operator `Ge' with parameter `5' against variable `TX:ANOMALY_SCORE' (Value: `8' ) [file "/etc/nginx/owasp-modsecurity-crs/rules/REQUEST-949-BLOCKING-EVALUATION.conf"] [line "80"] [id "949110"] [rev ""] [msg "Inbound Anomaly Score Exceeded (Total Score: 8)"] [data ""] [severity "2"] [ver "OWASP_CRS/3.3.2"] [maturity "0"] [accuracy "0"] [tag "application-multi"] [tag "language-multi"] [tag "platform-multi"] [tag "attack-generic"] [hostname "127.0.0.1"] [uri "/configuration/servers"] [unique_id "1642021196"] [ref ""], client: 127.0.0.1, server: , request: "POST /configuration/servers HTTP/1.1", host: "127.0.0.1:10246"
W0112 20:59:56.970134 7 controller.go:198] Dynamic reconfiguration failed: unexpected error code: 403
E0112 20:59:56.970174 7 controller.go:202] Unexpected failure reconfiguring NGINX:
unexpected error code: 403
@theunrealgeek, any comment on this?
I experienced this issue a long time ago (more than 30 days), also fixed it by whitelisting 127.0.0.1:10246.
Can the issue be closed then?
Thanks, Long Wu Yuan
Can the issue be closed then?
I would say that enabling modsecurity in "enforcing" mode seems to be an officially supported configuration (albeit not the default config): https://kubernetes.github.io/ingress-nginx/user-guide/third-party-addons/modsecurity/
If this configuration is supported, it has a serious bug that can cause production outages that won't always appear during testing (since dynamic reloads appear to be rare). Whitelisting 127.0.0.1:10246 is a workaround, but I'd like to help future users of modsecurity avoid production outages by either:
- making ingress-nginx whitelist 127.0.0.1:10246 by default,
- documenting this edge case on the modsecurity page, or
- figuring out why dynamic reloads appear to be so rare. It seems like there may also be a bug in the code that detects when a dynamic reload is possible.
There is work in progress related to this.
We could mark this as important-longterm.
Any changes/PRs you have in mind are good to discuss anytime, but implementation is better timed for the future, after the related work-in-progress is completed. There are thoughts, roughly described, about isolating plugins and other components from the core of the controller. The idea is to use sidecars for components that can be singled out, so that the controller becomes more modular.
Thanks, Long Wu Yuan
@mac-chaffee could you share here the ModSecurity rule that you used to whitelist this behavior? Just want to make sure that if we need to whitelist, it isn't an overly generous rule, since some attackers use headers and other tricks to make requests appear to originate from 127.0.0.1.
I wrote this rule in a panic during the outage, so it could be improved:
SecRule REQUEST_HEADERS:Host "@streq 127.0.0.1:10246" \
    "id:21029,phase:1,t:none,nolog,pass,ctl:ruleEngine=Off"
Since ingress-nginx uses the host header to route requests, applications are still mostly protected, but nginx itself or the default backend would be left without a firewall if an attacker set that exact host header on their requests.
Unfortunately, I'm unsure how to trigger "dynamic reloads" in my test environment so I haven't been able to test out a more targeted rule. Any insights into those dynamic reloads would be appreciated.
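In case it helps, here is an untested sketch of a more targeted variant: it chains the Host check with the connection's source address, so a spoofed Host header from an external client would no longer disable the engine (rule id 21030 is arbitrary):

SecRule REQUEST_HEADERS:Host "@streq 127.0.0.1:10246" \
    "id:21030,phase:1,t:none,nolog,pass,chain"
    SecRule REMOTE_ADDR "@ipMatch 127.0.0.1" "t:none,ctl:ruleEngine=Off"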
If I am able to trigger dynamic reloads, I'd be happy to submit a PR
@mac-chaffee try adding a configmap as extraVolumes and mounting it. I think it may trigger a dynamic reload.
@besha100 If you add a new configmap to the pod, the whole pod will restart. Do you mean making an edit to a configmap mounted in extraVolumes? I tried that, but nginx doesn't seem to reload its config at all for changes to configmaps other than the main fields. I added an extra field to the ingress-nginx-controller configmap and mounted that extra field as extraVolumes, but when you edit that, I just see:
I0115 18:44:29.064373 8 event.go:282] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"ingress-nginx", Name:"ingress-nginx-controller", UID:"2e41ae29-ef92-4f69-aa67-d1aac8e5a119", APIVersion:"v1", ResourceVersion:"23930384", FieldPath:""}): type: 'Normal' reason: 'UPDATE' ConfigMap ingress-nginx/ingress-nginx-controller
If I make an edit to any other field like modsecurity-snippet, it triggers a full reload:
I0115 18:46:28.607629 7 event.go:282] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"ingress-nginx", Name:"ingress-nginx-controller", UID:"2e41ae29-ef92-4f69-aa67-d1aac8e5a119", APIVersion:"v1", ResourceVersion:"23930815", FieldPath:""}): type: 'Normal' reason: 'UPDATE' ConfigMap ingress-nginx/ingress-nginx-controller
I0115 18:46:28.615198 7 controller.go:152] "Configuration changes detected, backend reload required"
152.54.15.250 - - [15/Jan/2022:18:46:29 +0000] "GET /.well-known/openid-configuration HTTP/2.0" 403 146 "-" "Go-http-client/2.0" 6 0.000 [dex-dex-5556] [] - - - - 4d3246d7ddbac2d1f234cdc986cd66c9
I0115 18:46:29.832535 7 controller.go:169] "Backend successfully reloaded"
I0115 18:46:29.833500 7 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"ingress-nginx", Name:"ingress-nginx-controller-ghfcq", UID:"b62166b5-d140-498c-ad07-632a38fa9e38", APIVersion:"v1", ResourceVersion:"23924913", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration
@mac-chaffee I meant creating a configmap as extraVolumes and then mounting it as a file in the pod, and referencing that file from the nginx config. For example, include the mounted file in the modsecurity-snippet as a file that contains some security rules.
That's what I tried. I actually use that technique to get around the 4096-character limit for modsecurity-snippet:
controller:
  extraVolumeMounts:
    - name: renci-modsecurity-rules
      mountPath: /etc/nginx/owasp-modsecurity-crs/custom/
  extraVolumes:
    - name: renci-modsecurity-rules
      configMap:
        name: ingress-nginx-controller
        items:
          - key: long-modsecurity-snippet
            path: renci-modsecurity-rules.conf
  config:
    enable-modsecurity: "true"
    enable-owasp-modsecurity-crs: "true"
    modsecurity-snippet: |
      SecRuleEngine On
      Include /etc/nginx/owasp-modsecurity-crs/custom/renci-modsecurity-rules.conf
    long-modsecurity-snippet: |
      ...put custom rules here...
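For illustration only (the actual rules are elided above), the mounted renci-modsecurity-rules.conf could contain something like the whitelist rule from earlier in this thread:

# renci-modsecurity-rules.conf (illustrative contents)
SecRule REQUEST_HEADERS:Host "@streq 127.0.0.1:10246" \
    "id:21029,phase:1,t:none,nolog,pass,ctl:ruleEngine=Off"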
When I edit long-modsecurity-snippet, nginx does not reload its configuration at all. ingress-nginx-controller does notice the change and enqueues a task to see if it should reload config, but it skips the reload because my custom config long-modsecurity-snippet isn't in the Configuration struct.
Sounds like I need to edit backends, tcp/udp endpoints, or servers. But I'm not 100% sure what those are or how to change them. Will look into it more.
Hi @mac-chaffee, I'm running into a similar issue. I used the same solution as yours, but added a comment above the Include with a version number that I increase whenever I modify the long-modsecurity-snippet. However, this trick doesn't seem to work all the time. Did you look into it more? How did you solve it?
That's what I do, and it works every time for me, although I'm not running the latest version of ingress-nginx. A newer version changes how the custom rules are loaded, which may break this method.
I'd recommend looking exactly at the generated nginx.conf file and the way that modsecurity handles config overrides, because it's not straightforward unfortunately.
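For what it's worth, the directives to grep for in the generated nginx.conf are those of the ModSecurity-nginx connector; a hypothetical excerpt (the exact layout varies by controller version) might look like:

modsecurity on;
modsecurity_rules_file /etc/nginx/modsecurity/modsecurity.conf;
# the modsecurity-snippet from the ConfigMap is rendered inline; the Included
# file is only re-read when nginx itself reloads its configuration:
modsecurity_rules '
SecRuleEngine On
Include /etc/nginx/owasp-modsecurity-crs/custom/renci-modsecurity-rules.conf
';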
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue or PR with /reopen
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
@k8s-triage-robot: Closing this issue.
In response to this:
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.