kubernetes-pfsense-controller

pfsense getting constant updates

Open hansaya opened this issue 3 years ago • 14 comments

HAProxy on pfSense keeps getting reloaded, which leaves HAProxy unable to hold a connection.

As you can see from the logs below, every second or so it goes and updates pfSense. What could I be doing wrong?

2022-02-23T22:40:28+00:00 plugin (haproxy-declarative): successfully reloaded HAProxy service
2022-02-23T22:40:28+00:00 plugin (pfsense-dns-services): /v1/namespaces/kube-system/Service/traefik MODIFIED - 8071070
2022-02-23T22:40:28+00:00 plugin (pfsense-dns-services): /v1/namespaces/kube-system/Service/traefik MODIFIED - 8071071
2022-02-23T22:40:28+00:00 plugin (pfsense-dns-services): /v1/namespaces/kube-system/Service/traefik MODIFIED - 8071096
2022-02-23T22:40:28+00:00 plugin (pfsense-dns-haproxy-ingress-proxy): /networking.k8s.io/v1/namespaces/test/Ingress/mysite-ingress MODIFIED - 8070722
2022-02-23T22:40:28+00:00 plugin (pfsense-dns-haproxy-ingress-proxy): /networking.k8s.io/v1/namespaces/cattle-system/Ingress/rancher MODIFIED - 8070723
2022-02-23T22:40:28+00:00 plugin (pfsense-dns-haproxy-ingress-proxy): /networking.k8s.io/v1/namespaces/test/Ingress/mysite-ingress MODIFIED - 8070726
2022-02-23T22:40:31+00:00 plugin (haproxy-declarative): successfully reloaded HAProxy service
2022-02-23T22:40:31+00:00 plugin (pfsense-dns-services): /v1/namespaces/kube-system/Service/traefik MODIFIED - 8071097
2022-02-23T22:40:31+00:00 plugin (pfsense-dns-services): /v1/namespaces/kube-system/Service/traefik MODIFIED - 8071129
2022-02-23T22:40:31+00:00 plugin (pfsense-dns-services): /v1/namespaces/kube-system/Service/traefik MODIFIED - 8071130
2022-02-23T22:40:31+00:00 plugin (pfsense-dns-haproxy-ingress-proxy): /networking.k8s.io/v1/namespaces/cattle-system/Ingress/rancher MODIFIED - 8070728
2022-02-23T22:40:31+00:00 plugin (pfsense-dns-haproxy-ingress-proxy): /networking.k8s.io/v1/namespaces/cattle-system/Ingress/rancher MODIFIED - 8070731
2022-02-23T22:40:31+00:00 plugin (pfsense-dns-haproxy-ingress-proxy): /networking.k8s.io/v1/namespaces/cattle-system/Ingress/rancher MODIFIED - 8070755
2022-02-23T22:40:31+00:00 plugin (pfsense-dns-haproxy-ingress-proxy): /networking.k8s.io/v1/namespaces/test/Ingress/mysite-ingress MODIFIED - 8070756
2022-02-23T22:40:33+00:00 plugin (haproxy-declarative): successfully reloaded HAProxy service
2022-02-23T22:40:33+00:00 plugin (haproxy-ingress-proxy): successfully reloaded HAProxy service
2022-02-23T22:40:33+00:00 plugin (pfsense-dns-services): /v1/namespaces/kube-system/Service/traefik MODIFIED - 8071153
2022-02-23T22:40:33+00:00 plugin (pfsense-dns-services): /v1/namespaces/kube-system/Service/traefik MODIFIED - 8071155
2022-02-23T22:40:33+00:00 plugin (pfsense-dns-services): /v1/namespaces/kube-system/Service/traefik MODIFIED - 8071192
2022-02-23T22:40:33+00:00 plugin (pfsense-dns-haproxy-ingress-proxy): /networking.k8s.io/v1/namespaces/cattle-system/Ingress/rancher MODIFIED - 8070757
2022-02-23T22:40:33+00:00 plugin (pfsense-dns-haproxy-ingress-proxy): /networking.k8s.io/v1/namespaces/test/Ingress/mysite-ingress MODIFIED - 8070758
2022-02-23T22:40:33+00:00 plugin (pfsense-dns-haproxy-ingress-proxy): /networking.k8s.io/v1/namespaces/cattle-system/Ingress/rancher MODIFIED - 8070760

I have a "simple" setup currently, with a rancher service and one test service. I'm running k3s v1.22.3+k3s1 with 3 servers and 3 agents in an HA config. For HA I'm using kube-vip, then metallb for service load balancing, and finally traefik for ingress. Currently haproxy on pfsense does certificate management and SSL offloading. This issue seems to be caused by k3s thinking there is a change and then triggering this project to go update pfsense.

Let me know if you need more details about my setup.

hansaya avatar Feb 23 '22 22:02 hansaya

Welcome! Something is constantly triggering updates on the ingress and service, which is pretty abnormal. I'm not familiar with kube-vip, but based on a quick look my guess is that kube-vip and metallb are 'fighting' each other over who 'owns' the LoadBalancer IP address.

I would suggest running something like kubectl get svc -A | grep LoadBalancer under watch (or something like kubectl get svc -A --watch) to see whether the IP is flapping constantly.
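
As a rough illustration of that check, here is a minimal Python sketch that polls the cluster and reports LoadBalancer services whose resourceVersion or assigned IPs changed between polls. It assumes kubectl is on the PATH and already pointed at the cluster; the function names and the 2-second interval are made up for this example and are not part of this project.

```python
import json
import subprocess
import time


def loadbalancer_state(svc_list: dict) -> dict:
    """Map 'namespace/name' -> (resourceVersion, ingress IPs) for LoadBalancer services."""
    state = {}
    for item in svc_list.get("items", []):
        if item.get("spec", {}).get("type") != "LoadBalancer":
            continue
        meta = item["metadata"]
        ingress = item.get("status", {}).get("loadBalancer", {}).get("ingress") or []
        ips = tuple(sorted(entry.get("ip", "") for entry in ingress))
        state[f"{meta['namespace']}/{meta['name']}"] = (meta.get("resourceVersion"), ips)
    return state


def changed_services(previous: dict, current: dict) -> list:
    """Return the services whose resourceVersion or IPs differ between two polls."""
    return sorted(key for key, value in current.items() if previous.get(key) != value)


def poll(interval_seconds: float = 2.0) -> None:
    """Poll via kubectl forever and print each service that changed since the last poll.

    The first iteration prints every LoadBalancer service once as a baseline.
    """
    previous = {}
    while True:
        raw = subprocess.run(
            ["kubectl", "get", "svc", "-A", "-o", "json"],
            check=True, capture_output=True, text=True,
        ).stdout
        current = loadbalancer_state(json.loads(raw))
        for key in changed_services(previous, current):
            print(time.strftime("%H:%M:%S"), key, current[key])
        previous = current
        time.sleep(interval_seconds)
```

Calling poll() starts the loop. A service that keeps showing up after the baseline line is being rewritten by something, even if its IP stays the same.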

travisghansen avatar Feb 23 '22 23:02 travisghansen

You are correct, I misread the documentation: https://kube-vip.chipzoller.dev/docs/installation/daemonset/ After removing metallb, no more DDoSing pfsense. However, I might have configured something wrong: k3s is not picking up the VIP, and this project uses one of the control nodes as the entry point. This defeats the use of a VIP for redundancy. I might mess around with turning off the service option for kube-vip and try metallb again on top of kube-vip.

hansaya avatar Feb 24 '22 06:02 hansaya

Sounds good. Let me know how it goes. We’ll leave this open until we know everything is good.

travisghansen avatar Feb 24 '22 06:02 travisghansen

Update: I played with this a lot more and removed kube-vip from the equation altogether. It looks like traefik is causing metallb to hand out an IP constantly. I haven't figured out why yet.

{"caller":"level.go:63","event":"ipAllocated","ip":"172.16.2.30","level":"info","msg":"IP address assigned by controller","service":"kube-system/traefik","ts":"2022-02-25T16:05:23.060373128Z"}
{"caller":"level.go:63","event":"serviceUpdated","level":"info","msg":"updated service object","service":"kube-system/traefik","ts":"2022-02-25T16:05:23.066429678Z"}
{"caller":"level.go:63","event":"ipAllocated","ip":"172.16.2.30","level":"info","msg":"IP address assigned by controller","service":"kube-system/traefik","ts":"2022-02-25T16:05:23.06652935Z"}
{"caller":"level.go:63","error":"Operation cannot be fulfilled on services \"traefik\": the object has been modified; please apply your changes to the latest version and try again","level":"error","msg":"failed to update service status","op":"updateServiceStatus","service":"kube-system/traefik","ts":"2022-02-25T16:05:23.071830824Z"}
{"caller":"level.go:63","event":"ipAllocated","ip":"172.16.2.30","level":"info","msg":"IP address assigned by controller","service":"kube-system/traefik","ts":"2022-02-25T16:05:25.038477461Z"}
{"caller":"level.go:63","event":"serviceUpdated","level":"info","msg":"updated service object","service":"kube-system/traefik","ts":"2022-02-25T16:05:25.050305976Z"}
{"caller":"level.go:63","event":"ipAllocated","ip":"172.16.2.30","level":"info","msg":"IP address assigned by controller","service":"kube-system/traefik","ts":"2022-02-25T16:05:27.038608675Z"}
{"caller":"level.go:63","event":"serviceUpdated","level":"info","msg":"updated service object","service":"kube-system/traefik","ts":"2022-02-25T16:05:27.045857899Z"}
{"caller":"level.go:63","event":"ipAllocated","ip":"172.16.2.30","level":"info","msg":"IP address assigned by controller","service":"kube-system/traefik","ts":"2022-02-25T16:05:29.039585392Z"}
{"caller":"level.go:63","event":"serviceUpdated","level":"info","msg":"updated service object","service":"kube-system/traefik","ts":"2022-02-25T16:05:29.05151219Z"}
{"caller":"level.go:63","event":"ipAllocated","ip":"172.16.2.30","level":"info","msg":"IP address assigned by controller","service":"kube-system/traefik","ts":"2022-02-25T16:05:31.036852784Z"}
{"caller":"level.go:63","event":"serviceUpdated","level":"info","msg":"updated service object","service":"kube-system/traefik","ts":"2022-02-25T16:05:31.048244625Z"}
{"caller":"level.go:63","event":"ipAllocated","ip":"172.16.2.30","level":"info","msg":"IP address assigned by controller","service":"kube-system/traefik","ts":"2022-02-25T16:05:33.045624429Z"}
{"caller":"level.go:63","event":"serviceUpdated","level":"info","msg":"updated service object","service":"kube-system/traefik","ts":"2022-02-25T16:05:33.057657115Z"} 
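
To put numbers on a log like the one above, a small Python sketch along these lines could count events per service and measure the gap between consecutive ipAllocated entries. It only assumes the JSON fields visible in the excerpt (event, op, service, ts); the function names are hypothetical.

```python
import json
from collections import Counter
from datetime import datetime


def parse_ts(ts: str) -> datetime:
    """Parse a timestamp like 2022-02-25T16:05:23.060373128Z, truncating to microseconds."""
    head, _, frac = ts.rstrip("Z").partition(".")
    frac = (frac + "000000")[:6]
    return datetime.strptime(f"{head}.{frac}", "%Y-%m-%dT%H:%M:%S.%f")


def summarize(lines):
    """Count (service, event) pairs and collect ipAllocated timestamps per service."""
    counts = Counter()
    allocations = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Error entries in the excerpt carry "op" instead of "event".
        event = record.get("event") or record.get("op") or "unknown"
        counts[(record.get("service"), event)] += 1
        if event == "ipAllocated":
            allocations.setdefault(record["service"], []).append(parse_ts(record["ts"]))
    return counts, allocations


def allocation_gaps(timestamps):
    """Seconds between consecutive ipAllocated events."""
    return [(b - a).total_seconds() for a, b in zip(timestamps, timestamps[1:])]
```

Feeding it the lines of a saved log file and printing allocation_gaps(...) for kube-system/traefik makes the repeating allocation interval easy to see.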

What about using kube-vip for service load balancing as well? I managed to get this working, but for whatever reason kube-vip is not updating k3s with the IP address it handed out, so I have to manually read the logs and update pfsense.

hansaya avatar Feb 25 '22 16:02 hansaya

A kube-vip plugin likely would not be very difficult to add. If you get it up and going and working as expected I'm happy to take a look.

travisghansen avatar Feb 25 '22 16:02 travisghansen

Found this: https://rancher.com/docs/k3s/latest/en/networking/#disabling-the-service-lb So that was my issue; after I disabled servicelb, everything worked as expected. Now it's time to dig into kube-vip.

hansaya avatar Feb 26 '22 08:02 hansaya

Ah! So you had 3 things fighting over the ip…that’s a disaster for sure. Are you going to reinstall metallb now or stick with just kube-vip?

travisghansen avatar Feb 26 '22 13:02 travisghansen

Thanks for the help. I did mess around with kube-vip more, but I cannot get it to play well. I'm going to leave it at that and use metallb. One last question: I have not seen any configuration option to use two or more shared frontends. I have two shared frontends configured in haproxy, for the WAN and LAN sides. This lets me do proper SSL without needing to expose all of the services to the public. Any suggestions for this without running two instances of this project?

hansaya avatar Feb 27 '22 07:02 hansaya

Give me a bit more detail on the setup and desired outcome if you don’t mind and I’ll see if it’s possible.

travisghansen avatar Feb 27 '22 14:02 travisghansen

Sure, I have two frontends bound to two interfaces, WAN and LAN. This helps with separating rules for internal and public services.

Similar to this:

frontend shared-https-merged
	bind			xxx.xxx.xxx.131:443 name xxx.xxx.xxx.131:443   ssl crt-list /var/etc/haproxy/shared-https.crt_list  
	mode			http
	log			global
	option			httpclose
	timeout client		30000
	rspidel ^Server:.*$
	acl			aclcrt_shared-https	var(txn.txnhost) -m reg -i ^([^\.]*)\.example\.com(:([0-9]){1,5})?$
	acl			ACL1	var(txn.txnhost) -m str -i nextcloud.example.com
	acl			ACL11	var(txn.txnhost) -m str -i pass.example.com
	acl			ACL20	var(txn.txnhost) -m str -i ha.example.com
	use_backend nextcloud.example.com_ipvANY  if  ACL1 
	use_backend pass.example.com_ipvANY  if  ACL11 
	use_backend ha.example.com_ipvANY  if  ACL20 
	use_backend bad_backend_ipvANY  if   aclcrt_shared-https

frontend shared-https-local-merged
	bind			172.16.1.1:443 name 172.16.1.1:443   ssl crt-list /var/etc/haproxy/shared-https-local.crt_list  
	mode			http
	log			global
	option			httpclose
	option			forwardfor
	acl https ssl_fc
	http-request set-header		X-Forwarded-Proto http if !https
	http-request set-header		X-Forwarded-Proto https if https
	timeout client		30000
	rspidel ^Server:.*$
	acl			aclcrt_shared-https-local	var(txn.txnhost) -m reg -i ^([^\.]*)\.example\.com(:([0-9]){1,5})?$
	acl			ACL4	var(txn.txnhost) -m str -i home.example.com
	acl			ACL5	var(txn.txnhost) -m str -i unifi.example.com
	acl			ACL2	var(txn.txnhost) -m beg -i nextcloud.example.com
	acl			ACL14	var(txn.txnhost) -m str -i bi.example.com
	acl			ACL15	var(txn.txnhost) -m str -i primary.example.com
	acl			ACL16	var(txn.txnhost) -m str -i secondary.example.com
	acl			ACL17	var(txn.txnhost) -m str -i plex.example.com
	acl			ACL17	var(txn.txnhost) -m str -i plex.direct
	acl			ACL18	var(txn.txnhost) -m str -i syno.example.com
	acl			ACL19	var(txn.txnhost) -m str -i ha.example.com
	acl			ACLGRAFANA	var(txn.txnhost) -m str -i grafana.example.com
	acl			ACLESPHOME	var(txn.txnhost) -m str -i esphome.example.com
	acl			PASS_LOCAL_ACL	var(txn.txnhost) -m str -i pass.example.com
	use_backend home.example.com_ipvANY  if  ACL4 
	use_backend unifi.example.com_ipvANY  if  ACL5 
	use_backend nextcloud.example.com_ipvANY  if  ACL2 
	use_backend bi.example.com_ipvANY  if  ACL14 
	use_backend primary.example.com_ipvANY  if  ACL15 
	use_backend secondary.example.com_ipvANY  if  ACL16 
	use_backend plex.example.com_ipvANY  if  ACL17 
	use_backend synology.example.com_ipvANY  if  ACL18 
	use_backend ha.example.com_ipvANY  if  ACL19 
	use_backend grafana.example.com_ipvANY  if  ACLGRAFANA 
	use_backend esphome.example.com_ipvANY  if  ACLESPHOME 
	use_backend pass.example.com_ipvANY  if  PASS_LOCAL_ACL 
	use_backend bad_backend_ipvANY  if   aclcrt_shared-https-local

I do not want everything to be exposed to the public, especially projects I'm currently working on. However, it would be nice to have them work with proper SSL certificates. So I just need that particular project to use shared-https-local.

hansaya avatar Feb 27 '22 15:02 hansaya

I'm guessing you're referring to the haproxy-ingress-proxy feature. If so this bit from the README is probably what you're after:

Optionally, on the ingress resources you can set the following annotations: haproxy-ingress-proxy.pfsense.org/frontend and haproxy-ingress-proxy.pfsense.org/backend to respectively set the frontend and backend to override the defaults.

travisghansen avatar Feb 27 '22 15:02 travisghansen

That's not going to solve the issue; it only replaces the default. As you can see, my haproxy has duplicate entries for local and public. Why? This allows me to apply different rules for local vs. public. On top of that, my public DNS points to Cloudflare, which imposes limits on bandwidth, single-file size, etc. Having local traffic go directly to the proxy entry point bypasses all of that.

hansaya avatar Feb 27 '22 18:02 hansaya

You want the same ingress to be added to 2 frontends?

travisghansen avatar Feb 27 '22 19:02 travisghansen

Yes

hansaya avatar Feb 27 '22 21:02 hansaya

I have this implemented. Anything else needed?

travisghansen avatar Feb 04 '23 01:02 travisghansen

How did you go about doing it? By setting a default and setting an annotation (haproxy-ingress-proxy.pfsense.org/frontend) at the same time?

hansaya avatar Feb 04 '23 03:02 hansaya

I haven't committed it yet, but it just supports a comma-separated list instead of a single entry.
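
For illustration only, a Python sketch of an Ingress manifest carrying such a comma-separated annotation might look like the following. The frontend names shared-https and shared-https-local, the host, and the backend service name and port are assumptions made up for this example; check the README for the exact names the controller expects.

```python
import json


def ingress_manifest(name, namespace, host, service, port, frontends):
    """Build a networking.k8s.io/v1 Ingress dict with a comma-separated frontend annotation."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {
                # Several frontends joined into one comma-separated value.
                "haproxy-ingress-proxy.pfsense.org/frontend": ",".join(frontends),
            },
        },
        "spec": {
            "rules": [{
                "host": host,
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {"service": {"name": service, "port": {"number": port}}},
                }]},
            }],
        },
    }


# Hypothetical values: only the Ingress name and namespace appear in the logs above.
manifest = ingress_manifest(
    name="mysite-ingress",
    namespace="test",
    host="mysite.example.com",
    service="mysite",
    port=80,
    frontends=["shared-https", "shared-https-local"],
)
print(json.dumps(manifest, indent=2))
```

Since kubectl accepts JSON manifests, the printed output could be piped to kubectl apply -f - to try it out.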

travisghansen avatar Feb 04 '23 04:02 travisghansen

Thank you so much. Let me know if you want me to test it.

hansaya avatar Feb 04 '23 05:02 hansaya

Released in v0.5.12.

travisghansen avatar Feb 04 '23 15:02 travisghansen

Any luck testing this out?

travisghansen avatar Feb 09 '23 13:02 travisghansen

I just tested it, works as expected. Thank you so much!

hansaya avatar Feb 09 '23 21:02 hansaya

Hey, just peeping this convo. I was wondering, were you able to get this to work with kube-vip without metallb? Should it just work with this controller, or is some additional integration required?

ashtonian avatar Feb 17 '23 22:02 ashtonian

Let’s open another issue for kube-vip support. Currently I think the only real dependency on metallb is it checks for the configmap (which is arbitrary, it doesn’t use any data from it). Some minor adjustments can be made and it should work fine with kube-vip as well.

travisghansen avatar Feb 19 '23 14:02 travisghansen

v0.5.13 has removed any need for metallb at all. The plugin is still named metallb (for now), but it simply manages BGP peers by pushing cluster nodes to pfSense.

travisghansen avatar Feb 23 '23 21:02 travisghansen