ingress-nginx
Dynamic reconfiguration failed, blocked by ModSecurity CRS
NGINX Ingress controller version: 4.0.1
Kubernetes version: 1.21
Environment: Baremetal, helm, with the following relevant values:
enable-modsecurity: "true"
enable-owasp-modsecurity-crs: "true"
modsecurity-snippet: |
  SecRuleEngine On
Also using cert-manager for automatic TLS certificate creation.
What happened:
Ingress-nginx can perform "dynamic reloads" by sending a POST /configuration/backends request to 127.0.0.1:10246, which is handled by Lua code.
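For context, the blocked internal request in the logs below looks roughly like this (a sketch; the exact headers and body are assumptions, but the bare IP:port in the Host header is what the CRS scores as anomalous):

POST /configuration/servers HTTP/1.1
Host: 127.0.0.1:10246
Content-Type: application/json

{ ...JSON payload describing the servers... }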
For as far back as I have logs (30 days), ingress-nginx has apparently never performed "dynamic reconfiguration". It has exclusively performed the full backend reload from this line of code.
But today, ingress-nginx happened to perform a dynamic reconfiguration, which was blocked by ModSecurity CRS (since the request puts an IP address in the Host header). This caused ingress-nginx to be stuck in a loop, constantly reloading and failing, which used up all the RAM and caused cascading failures. The failures only stopped after I added a ModSecurity rule exception that disabled ModSecurity for those internal requests to 127.0.0.1:10246.
What you expected to happen:
Ingress-nginx should not block requests to itself, either by disabling ModSecurity for the internal requests or by shipping some default rule exceptions.
How to reproduce it:
Good question. How can you reliably trigger a dynamic reload as opposed to a full reload? The comments in the code indicate that if you change a certificate or an L4 IP, it will skip the full reload and just do a dynamic one. But deleting a certificate secret, changing Endpoints, and deleting endpoints all trigger the full reload. And after a full reload, the dynamic reload is skipped.
The only way I can think of to trigger a dynamic reload is to hope the configuration changes after the full reload but before the dynamic reload. Maybe this is due to a race condition?
Here are the logs around the event:
I0112 20:42:17.957691 7 store.go:371] "Found valid IngressClass" ingress="balhoff/cm-acme-http-solver-vv5ls" ingressclass="nginx"
I0112 20:42:17.957805 7 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"balhoff", Name:"cm-acme-http-solver-vv5ls", UID:"9c9b7c23-4524-4c10-9044-f6076e3d8dbe", APIVersion:"networking.k8s.io/v1", ResourceVersion:"53881641", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
W0112 20:42:21.150264 7 controller.go:1047] Service "balhoff/cm-acme-http-solver-g6qrv" does not have any active Endpoint.
W0112 20:42:21.150416 7 controller.go:1270] Error getting SSL certificate "balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls": local SSL certificate balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls was not found. Using default certificate
I0112 20:42:21.150637 7 controller.go:152] "Configuration changes detected, backend reload required"
I0112 20:42:21.523706 7 controller.go:169] "Backend successfully reloaded"
I0112 20:42:21.523851 7 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"ingress-nginx", Name:"ingress-nginx-controller-jsprp", UID:"d32bee4d-7f81-41d4-98ca-b88caec6c7db", APIVersion:"v1", ResourceVersion:"53755667", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration
I0112 20:42:21.792188 7 store.go:371] "Found valid IngressClass" ingress="wstephens/helx-nginx" ingressclass="nginx"
I0112 20:42:21.792311 7 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"wstephens", Name:"helx-nginx", UID:"31a7eb2f-9b21-49d8-b5d2-157d1abf9f00", APIVersion:"networking.k8s.io/v1", ResourceVersion:"53881900", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
W0112 20:42:24.484049 7 controller.go:1047] Service "wstephens/helx-nginx" does not have any active Endpoint.
W0112 20:42:24.484202 7 controller.go:1270] Error getting SSL certificate "balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls": local SSL certificate balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls was not found. Using default certificate
I0112 20:42:24.484401 7 controller.go:152] "Configuration changes detected, backend reload required"
I0112 20:42:24.850243 7 controller.go:169] "Backend successfully reloaded"
I0112 20:42:24.850386 7 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"ingress-nginx", Name:"ingress-nginx-controller-jsprp", UID:"d32bee4d-7f81-41d4-98ca-b88caec6c7db", APIVersion:"v1", ResourceVersion:"53755667", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration
I0112 20:42:26.584376 7 status.go:300] "updating Ingress status" namespace="balhoff" ingress="phenoscape-services-1-0-kb-services-service-ingress" currentValue=[] newValue=[{IP:152.54.15.132 Hostname: Ports:[]}]
I0112 20:42:26.584376 7 status.go:300] "updating Ingress status" namespace="balhoff" ingress="cm-acme-http-solver-vv5ls" currentValue=[] newValue=[{IP:152.54.15.132 Hostname: Ports:[]}]
I0112 20:42:26.584376 7 status.go:300] "updating Ingress status" namespace="wstephens" ingress="helx-nginx" currentValue=[] newValue=[{IP:152.54.15.132 Hostname: Ports:[]}]
I0112 20:42:26.590511 7 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"wstephens", Name:"helx-nginx", UID:"31a7eb2f-9b21-49d8-b5d2-157d1abf9f00", APIVersion:"networking.k8s.io/v1", ResourceVersion:"53882009", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
I0112 20:42:26.590855 7 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"balhoff", Name:"phenoscape-services-1-0-kb-services-service-ingress", UID:"f5c40b1b-f8cb-4f8a-9992-0a9539469711", APIVersion:"networking.k8s.io/v1", ResourceVersion:"53882010", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
W0112 20:42:26.590884 7 backend_ssl.go:46] Error obtaining X.509 certificate: no object matching key "balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls" in local store
I0112 20:42:26.590968 7 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"balhoff", Name:"cm-acme-http-solver-vv5ls", UID:"9c9b7c23-4524-4c10-9044-f6076e3d8dbe", APIVersion:"networking.k8s.io/v1", ResourceVersion:"53882011", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
W0112 20:42:27.816720 7 controller.go:1047] Service "wstephens/helx-nginx" does not have any active Endpoint.
W0112 20:42:27.816858 7 controller.go:1270] Error getting SSL certificate "balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls": local SSL certificate balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls was not found. Using default certificate
W0112 20:42:31.149938 7 controller.go:1047] Service "wstephens/helx-nginx" does not have any active Endpoint.
W0112 20:42:31.150084 7 controller.go:1270] Error getting SSL certificate "balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls": local SSL certificate balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls was not found. Using default certificate
W0112 20:42:34.483420 7 controller.go:1047] Service "wstephens/helx-nginx" does not have any active Endpoint.
W0112 20:42:34.483606 7 controller.go:1270] Error getting SSL certificate "balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls": local SSL certificate balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls was not found. Using default certificate
W0112 20:42:37.817216 7 controller.go:1047] Service "wstephens/helx-nginx" does not have any active Endpoint.
W0112 20:42:37.817393 7 controller.go:1270] Error getting SSL certificate "balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls": local SSL certificate balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls was not found. Using default certificate
W0112 20:42:41.673417 7 controller.go:1047] Service "wstephens/helx-nginx" does not have any active Endpoint.
W0112 20:42:41.673603 7 controller.go:1270] Error getting SSL certificate "balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls": local SSL certificate balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls was not found. Using default certificate
W0112 20:42:45.007543 7 controller.go:1047] Service "wstephens/helx-nginx" does not have any active Endpoint.
W0112 20:42:45.007683 7 controller.go:1270] Error getting SSL certificate "balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls": local SSL certificate balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls was not found. Using default certificate
W0112 20:42:57.584197 7 controller.go:1047] Service "wstephens/helx-nginx" does not have any active Endpoint.
W0112 20:42:57.584365 7 controller.go:1270] Error getting SSL certificate "balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls": local SSL certificate balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls was not found. Using default certificate
W0112 20:43:04.317801 7 controller.go:1270] Error getting SSL certificate "balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls": local SSL certificate balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls was not found. Using default certificate
W0112 20:43:07.649716 7 controller.go:1270] Error getting SSL certificate "balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls": local SSL certificate balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls was not found. Using default certificate
W0112 20:43:10.983235 7 controller.go:1270] Error getting SSL certificate "balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls": local SSL certificate balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls was not found. Using default certificate
W0112 20:43:33.456137 7 controller.go:1270] Error getting SSL certificate "balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls": local SSL certificate balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls was not found. Using default certificate
W0112 20:43:36.790048 7 controller.go:1270] Error getting SSL certificate "balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls": local SSL certificate balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls was not found. Using default certificate
W0112 20:43:49.508637 7 controller.go:952] Error obtaining Endpoints for Service "balhoff/cm-acme-http-solver-g6qrv": no object matching key "balhoff/cm-acme-http-solver-g6qrv" in local store
W0112 20:43:49.508778 7 controller.go:1270] Error getting SSL certificate "balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls": local SSL certificate balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls was not found. Using default certificate
I0112 20:43:49.509008 7 controller.go:152] "Configuration changes detected, backend reload required"
I0112 20:43:49.876006 7 controller.go:169] "Backend successfully reloaded"
I0112 20:43:49.876133 7 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"ingress-nginx", Name:"ingress-nginx-controller-jsprp", UID:"d32bee4d-7f81-41d4-98ca-b88caec6c7db", APIVersion:"v1", ResourceVersion:"53755667", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration
I0112 20:43:50.778858 7 store.go:509] "Secret was added and it is used in ingress annotations. Parsing" secret="balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls"
I0112 20:43:50.779265 7 backend_ssl.go:66] "Adding secret to local store" name="balhoff/phenoscape-kb-services-1-0.apps.renci.org-tls"
I0112 20:43:52.842949 7 controller.go:152] "Configuration changes detected, backend reload required"
I0112 20:43:53.208385 7 controller.go:169] "Backend successfully reloaded"
I0112 20:43:53.208809 7 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"ingress-nginx", Name:"ingress-nginx-controller-jsprp", UID:"d32bee4d-7f81-41d4-98ca-b88caec6c7db", APIVersion:"v1", ResourceVersion:"53755667", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration
2022/01/12 20:43:53 [error] 13549#13549: *277346 [client 127.0.0.1] ModSecurity: Access denied with code 403 (phase 2). Matched "Operator `Ge' with parameter `5' against variable `TX:ANOMALY_SCORE' (Value: `8' ) [file "/etc/nginx/owasp-modsecurity-crs/rules/REQUEST-949-BLOCKING-EVALUATION.conf"] [line "80"] [id "949110"] [rev ""] [msg "Inbound Anomaly Score Exceeded (Total Score: 8)"] [data ""] [severity "2"] [ver "OWASP_CRS/3.3.2"] [maturity "0"] [accuracy "0"] [tag "application-multi"] [tag "language-multi"] [tag "platform-multi"] [tag "attack-generic"] [hostname "127.0.0.1"] [uri "/configuration/servers"] [unique_id "1642020233"] [ref ""], client: 127.0.0.1, server: , request: "POST /configuration/servers HTTP/1.1", host: "127.0.0.1:10246"
W0112 20:43:54.000587 7 controller.go:198] Dynamic reconfiguration failed: unexpected error code: 403
E0112 20:43:54.000606 7 controller.go:202] Unexpected failure reconfiguring NGINX:
unexpected error code: 403
E0112 20:43:54.000618 7 queue.go:130] "requeuing" err="unexpected error code: 403" key="balhoff/cm-acme-http-solver-vv5ls"
I0112 20:43:56.176351 7 controller.go:152] "Configuration changes detected, backend reload required"
I0112 20:43:56.537687 7 controller.go:169] "Backend successfully reloaded"
I0112 20:43:56.537940 7 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"ingress-nginx", Name:"ingress-nginx-controller-jsprp", UID:"d32bee4d-7f81-41d4-98ca-b88caec6c7db", APIVersion:"v1", ResourceVersion:"53755667", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration
2022/01/12 20:43:57 [error] 13837#13837: *277405 [client 127.0.0.1] ModSecurity: Access denied with code 403 (phase 2). Matched "Operator `Ge' with parameter `5' against variable `TX:ANOMALY_SCORE' (Value: `8' ) [file "/etc/nginx/owasp-modsecurity-crs/rules/REQUEST-949-BLOCKING-EVALUATION.conf"] [line "80"] [id "949110"] [rev ""] [msg "Inbound Anomaly Score Exceeded (Total Score: 8)"] [data ""] [severity "2"] [ver "OWASP_CRS/3.3.2"] [maturity "0"] [accuracy "0"] [tag "application-multi"] [tag "language-multi"] [tag "platform-multi"] [tag "attack-generic"] [hostname "127.0.0.1"] [uri "/configuration/servers"] [unique_id "1642020237"] [ref ""], client: 127.0.0.1, server: , request: "POST /configuration/servers HTTP/1.1", host: "127.0.0.1:10246"
W0112 20:43:57.362644 7 controller.go:198] Dynamic reconfiguration failed: unexpected error code: 403
E0112 20:43:57.362662 7 controller.go:202] Unexpected failure reconfiguring NGINX:
unexpected error code: 403
@mac-chaffee: This issue is currently awaiting triage.
If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and providing further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I ended up killing the ingress-nginx pods since once they get to ~99% RAM usage, new attempts to reload the configuration cause OOMKills. Interestingly, the newly-created ingress-nginx pods also attempted to perform dynamic reloads. I would expect the initial boot to perform a full reload, then skip the dynamic reload:
W0112 20:59:49.756514 7 client_config.go:615] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0112 20:59:49.756713 7 main.go:221] "Creating API client" host="https://10.20.0.1:443"
I0112 20:59:49.768158 7 main.go:265] "Running in Kubernetes cluster" major="1" minor="21" git="v1.21.5" state="clean" commit="aea7bbadd2fc0cd689de94a54e5b7b758869d691" platform="linux/amd64"
I0112 20:59:49.937149 7 main.go:104] "SSL fake certificate created" file="/etc/ingress-controller/ssl/default-fake-certificate.pem"
I0112 20:59:49.968052 7 ssl.go:531] "loading tls certificate" path="/usr/local/certificates/cert" key="/usr/local/certificates/key"
I0112 20:59:49.988839 7 nginx.go:253] "Starting NGINX Ingress controller"
I0112 20:59:50.114235 7 event.go:282] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"ingress-nginx", Name:"ingress-nginx-controller", UID:"e32530ea-840d-458c-99cc-6b8ce9d83fb9", APIVersion:"v1", ResourceVersion:"53456346", FieldPath:""}): type: 'Normal' reason: 'CREATE' ConfigMap ingress-nginx/ingress-nginx-controller
I0112 20:59:51.198126 7 store.go:371] "Found valid IngressClass" ingress="translator/answercoalesce-dev-answer-coalesce-ingress" ingressclass="nginx"
...
...
...
I0112 20:59:51.293055 7 controller.go:152] "Configuration changes detected, backend reload required"
I0112 20:59:51.571986 7 status.go:84] "New leader elected" identity="ingress-nginx-controller-jsprp"
I0112 20:59:51.807786 7 controller.go:169] "Backend successfully reloaded"
I0112 20:59:51.807911 7 controller.go:180] "Initial sync, sleeping for 1 second"
I0112 20:59:51.807997 7 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"ingress-nginx", Name:"ingress-nginx-controller-tshmg", UID:"0e7abe10-9663-4456-aab4-0a990912f1b9", APIVersion:"v1", ResourceVersion:"53888198", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration
2022/01/12 20:59:53 [error] 214#214: *24 [client 127.0.0.1] ModSecurity: Access denied with code 403 (phase 2). Matched "Operator `Ge' with parameter `5' against variable `TX:ANOMALY_SCORE' (Value: `8' ) [file "/etc/nginx/owasp-modsecurity-crs/rules/REQUEST-949-BLOCKING-EVALUATION.conf"] [line "80"] [id "949110"] [rev ""] [msg "Inbound Anomaly Score Exceeded (Total Score: 8)"] [data ""] [severity "2"] [ver "OWASP_CRS/3.3.2"] [maturity "0"] [accuracy "0"] [tag "application-multi"] [tag "language-multi"] [tag "platform-multi"] [tag "attack-generic"] [hostname "127.0.0.1"] [uri "/configuration/servers"] [unique_id "1642021193"] [ref ""], client: 127.0.0.1, server: , request: "POST /configuration/servers HTTP/1.1", host: "127.0.0.1:10246"
W0112 20:59:53.581633 7 controller.go:198] Dynamic reconfiguration failed: unexpected error code: 403
E0112 20:59:53.581677 7 controller.go:202] Unexpected failure reconfiguring NGINX:
unexpected error code: 403
E0112 20:59:53.581705 7 queue.go:130] "requeuing" err="unexpected error code: 403" key="initial-sync"
I0112 20:59:54.627471 7 controller.go:152] "Configuration changes detected, backend reload required"
I0112 20:59:55.176189 7 controller.go:169] "Backend successfully reloaded"
I0112 20:59:55.176283 7 controller.go:180] "Initial sync, sleeping for 1 second"
I0112 20:59:55.176310 7 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"ingress-nginx", Name:"ingress-nginx-controller-tshmg", UID:"0e7abe10-9663-4456-aab4-0a990912f1b9", APIVersion:"v1", ResourceVersion:"53888198", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration
2022/01/12 20:59:56 [error] 470#470: *84 [client 127.0.0.1] ModSecurity: Access denied with code 403 (phase 2). Matched "Operator `Ge' with parameter `5' against variable `TX:ANOMALY_SCORE' (Value: `8' ) [file "/etc/nginx/owasp-modsecurity-crs/rules/REQUEST-949-BLOCKING-EVALUATION.conf"] [line "80"] [id "949110"] [rev ""] [msg "Inbound Anomaly Score Exceeded (Total Score: 8)"] [data ""] [severity "2"] [ver "OWASP_CRS/3.3.2"] [maturity "0"] [accuracy "0"] [tag "application-multi"] [tag "language-multi"] [tag "platform-multi"] [tag "attack-generic"] [hostname "127.0.0.1"] [uri "/configuration/servers"] [unique_id "1642021196"] [ref ""], client: 127.0.0.1, server: , request: "POST /configuration/servers HTTP/1.1", host: "127.0.0.1:10246"
W0112 20:59:56.970134 7 controller.go:198] Dynamic reconfiguration failed: unexpected error code: 403
E0112 20:59:56.970174 7 controller.go:202] Unexpected failure reconfiguring NGINX:
unexpected error code: 403
@theunrealgeek, any comment on this?
I experienced this issue a long time ago (more than 30 days), also fixed it by whitelisting 127.0.0.1:10246.
Can the issue be closed then?
Thanks, Long Wu Yuan
Can the issue be closed then?
I would say that enabling modsecurity in "enforcing" mode seems to be an officially supported configuration (albeit not the default config): https://kubernetes.github.io/ingress-nginx/user-guide/third-party-addons/modsecurity/
If this configuration is supported, it has a serious bug that can cause production outages that won't always appear during testing (since dynamic reloads appear to be rare). Whitelisting 127.0.0.1:10246 is a workaround, but I'd like to help future users of modsecurity avoid production outages by either:
- making ingress-nginx whitelist 127.0.0.1:10246 by default,
- documenting this edge case on the modsecurity page, or
- figuring out why dynamic reloads appear to be so rare. It seems like there may also be a bug in the code that detects when a dynamic reload is possible.
There is work in progress related to this.
We could mark this as important-longterm.
Any changes/PRs you have in mind are good to discuss anytime, but implementation is better timed for the future, after the related work-in-progress is completed. There are thoughts, roughly described, about isolating plugins and other components from the core of the controller. The idea is to use sidecars for components that can be singled out, so that the controller becomes more modular.
Thanks, Long Wu Yuan
@mac-chaffee could you share here the ModSecurity rule that you used to whitelist this behavior? Just want to make sure that if we need to whitelist, it isn't an overly generous rule, since some attackers use headers and other tricks to make requests appear to originate from 127.0.0.1.
I wrote this rule in a panic during the outage, so it could be improved:
SecRule REQUEST_HEADERS:Host "@streq 127.0.0.1:10246" \
    "id:21029,phase:1,t:none,nolog,pass,ctl:ruleEngine=Off"
Since ingress-nginx uses the host header to route requests, applications are still mostly protected, but nginx itself or the default backend would be left without a firewall if an attacker set that exact host header on their requests.
Unfortunately, I'm unsure how to trigger "dynamic reloads" in my test environment so I haven't been able to test out a more targeted rule. Any insights into those dynamic reloads would be appreciated.
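In case it helps, here is an untested sketch of a more targeted variant: it chains the Host check with the connection's source address, so a spoofed Host header from an external client would no longer disable the engine (rule id 21030 is arbitrary):

SecRule REQUEST_HEADERS:Host "@streq 127.0.0.1:10246" \
    "id:21030,phase:1,t:none,nolog,pass,chain"
    SecRule REMOTE_ADDR "@ipMatch 127.0.0.1" "t:none,ctl:ruleEngine=Off"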
If I am able to trigger dynamic reloads, I'd be happy to submit a PR
@mac-chaffee try adding a configmap as extraVolumes and mounting it. I think it may trigger a dynamic reload.
@besha100 If you add a new configmap to the pod, the whole pod will restart. Do you mean making an edit to a configmap mounted in extraVolumes? I tried that, but nginx doesn't seem to reload its config at all for changes to configmaps other than the main fields. I added an extra field to the ingress-nginx-controller configmap and mounted that extra field as extraVolumes, but when you edit that, I just see:
I0115 18:44:29.064373 8 event.go:282] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"ingress-nginx", Name:"ingress-nginx-controller", UID:"2e41ae29-ef92-4f69-aa67-d1aac8e5a119", APIVersion:"v1", ResourceVersion:"23930384", FieldPath:""}): type: 'Normal' reason: 'UPDATE' ConfigMap ingress-nginx/ingress-nginx-controller
If I make an edit to any other field like modsecurity-snippet, it triggers a full reload:
I0115 18:46:28.607629 7 event.go:282] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"ingress-nginx", Name:"ingress-nginx-controller", UID:"2e41ae29-ef92-4f69-aa67-d1aac8e5a119", APIVersion:"v1", ResourceVersion:"23930815", FieldPath:""}): type: 'Normal' reason: 'UPDATE' ConfigMap ingress-nginx/ingress-nginx-controller
I0115 18:46:28.615198 7 controller.go:152] "Configuration changes detected, backend reload required"
152.54.15.250 - - [15/Jan/2022:18:46:29 +0000] "GET /.well-known/openid-configuration HTTP/2.0" 403 146 "-" "Go-http-client/2.0" 6 0.000 [dex-dex-5556] [] - - - - 4d3246d7ddbac2d1f234cdc986cd66c9
I0115 18:46:29.832535 7 controller.go:169] "Backend successfully reloaded"
I0115 18:46:29.833500 7 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"ingress-nginx", Name:"ingress-nginx-controller-ghfcq", UID:"b62166b5-d140-498c-ad07-632a38fa9e38", APIVersion:"v1", ResourceVersion:"23924913", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration
@mac-chaffee I meant creating a configmap as extraVolumes and then mounting it as a file in the pod, and referencing that file from the nginx config. For example, include the mounted file in the modsecurity-snippet as a file that contains some security rules.
That's what I tried. I actually use that technique to get around the 4096-character limit for modsecurity-snippet:
controller:
  extraVolumeMounts:
    - name: renci-modsecurity-rules
      mountPath: /etc/nginx/owasp-modsecurity-crs/custom/
  extraVolumes:
    - name: renci-modsecurity-rules
      configMap:
        name: ingress-nginx-controller
        items:
          - key: long-modsecurity-snippet
            path: renci-modsecurity-rules.conf
  config:
    enable-modsecurity: "true"
    enable-owasp-modsecurity-crs: "true"
    modsecurity-snippet: |
      SecRuleEngine On
      Include /etc/nginx/owasp-modsecurity-crs/custom/renci-modsecurity-rules.conf
    long-modsecurity-snippet: |
      ...put custom rules here...
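For illustration only (the actual rules are elided above), the mounted renci-modsecurity-rules.conf could contain something like the whitelist rule from earlier in this thread:

# renci-modsecurity-rules.conf (illustrative contents)
SecRule REQUEST_HEADERS:Host "@streq 127.0.0.1:10246" \
    "id:21029,phase:1,t:none,nolog,pass,ctl:ruleEngine=Off"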
When I edit long-modsecurity-snippet, nginx does not reload its configuration at all. ingress-nginx-controller does notice the change and enqueues a task to see if it should reload config, but it skips the reload because my custom config long-modsecurity-snippet isn't in the Configuration struct.
Sounds like I need to edit backends, tcp/udp endpoints, or servers. But I'm not 100% sure what those are or how to change them. Will look into it more.
Hi @mac-chaffee, I'm running into a similar issue. I used the same solution as yours, but added a comment above the Include with a version number that I increase whenever I modify the long-modsecurity-snippet. However, this trick doesn't seem to work all the time. Did you look into it more? How did you solve it?
That's what I do, and it works every time for me, although I'm not running the latest version of ingress-nginx. A newer version changes how the custom rules are loaded, which may break this method.
I'd recommend looking exactly at the generated nginx.conf file and the way that modsecurity handles config overrides, because it's not straightforward unfortunately.
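For what it's worth, the directives to grep for in the generated nginx.conf are those of the ModSecurity-nginx connector; a hypothetical excerpt (the exact layout varies by controller version) might look like:

modsecurity on;
modsecurity_rules_file /etc/nginx/modsecurity/modsecurity.conf;
# the modsecurity-snippet from the ConfigMap is rendered inline; the Included
# file is only re-read when nginx itself reloads its configuration:
modsecurity_rules '
SecRuleEngine On
Include /etc/nginx/owasp-modsecurity-crs/custom/renci-modsecurity-rules.conf
';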
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue or PR with /reopen
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
@k8s-triage-robot: Closing this issue.
In response to this:
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.