cloud-proxy-server never has an external IP assigned
Describe the bug
I'm deploying Pixie locally to a Colima cluster for testing and PoC purposes
Running ./dev_dns_updater seems to get stuck, so I checked the LoadBalancer services and found a strange situation:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
cloud-proxy-service LoadBalancer 10.43.209.160 <pending> 443:30758/TCP,4444:30058/TCP,5555:32671/TCP 5m16s
❯ kubectl get service vzconn-service -n plc
vzconn-service LoadBalancer 10.43.53.124 192.168.5.1 51600:31468/TCP 17d
As you can see, vzconn-service worked fine and has an external IP assigned, but for some reason cloud-proxy-service doesn't, which I think might be the root cause of the issue with dev_dns_updater.
If neither had an IP, I would assume that something is wrong with the load balancer assignment, but if it worked for one, why didn't it work for the other?
I checked the events for both the service and the pod, but I don't see anything wrong there:
Service
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal EnsuringLoadBalancer 7m21s service-controller Ensuring load balancer
Normal AppliedDaemonSet 7m21s service-controller Applied LoadBalancer DaemonSet kube-system/svclb-cloud-proxy-service-80a58f80
Pod
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 16m default-scheduler Successfully assigned plc/cloud-proxy-7897b497cb-sx82r to colima
Normal Pulled 16m kubelet Container image "gcr.io/pixie-oss/pixie-prod/cloud/proxy_server_image:0.1.7" already present on machine
Normal Created 16m kubelet Created container cloud-proxy-server
Normal Started 16m kubelet Started container cloud-proxy-server
Normal Pulled 16m kubelet Container image "envoyproxy/envoy:v1.12.2@sha256:b36ee021fc4d285de7861dbaee01e7437ce1d63814ead6ae3e4dfcad4a951b2e" already present on machine
Normal Created 16m kubelet Created container envoy
Normal Started 16m kubelet Started container envoy
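One further check that might help here, sketched under assumptions: the service events mention a DaemonSet named kube-system/svclb-cloud-proxy-service-80a58f80, and the svclb- prefix suggests the k3s built-in ServiceLB is in use. If that is the case, its pods claim host ports, and a pod that cannot be scheduled would leave the service at <pending>. The helper name svclb_not_running below is hypothetical; it only filters `kubectl get pods` output for svclb pods whose STATUS is not Running.

```shell
# Hypothetical helper (not part of Pixie or k3s): reads `kubectl get pods`
# output on stdin and prints "<pod> <status>" for every svclb-* pod whose
# STATUS column is not "Running".
svclb_not_running() {
  awk 'NR > 1 && $1 ~ /^svclb-/ && $3 != "Running" { print $1, $3 }'
}

# Usage, assuming kubectl points at the Colima cluster:
#   kubectl get pods -n kube-system | svclb_not_running
# Any pod listed can then be inspected with:
#   kubectl describe pod <pod-name> -n kube-system
```

If a pod shows up as Pending, its events may say why it could not be scheduled, for example a host port that is already taken.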
The only thing I see are some warnings on the cloud-proxy-server container, but I don't think they are an issue:
2024/04/02 16:13:23 [warn] 8#8: could not build optimal variables_hash, you should increase either variables_hash_max_size: 1024 or variables_hash_bucket_size: 64; ignoring variables_hash_bucket_size
nginx: [warn] could not build optimal variables_hash, you should increase either variables_hash_max_size: 1024 or variables_hash_bucket_size: 64; ignoring variables_hash_bucket_size
Stream closed EOF for plc/cloud-proxy-7897b497cb-sx82r (cloud-proxy-server)
Any idea what could be preventing the service from getting an external IP?
To Reproduce
Steps to reproduce the behavior:
- Install Pixie on Colima running locally
- See the cloud-proxy-service service never getting an external IP
Expected behavior
An external IP is assigned to cloud-proxy-service
App information (please complete the following information):
- Pixie version: 0.1.7
- K8s cluster version: v1.27.1+k3s1
- Node Kernel version
- Browser version
I removed the tcp-https port from the cloud-proxy-service service, leaving only tcp-grpc and tcp-http2, and now it gets an IP assigned (not sure if this is the right thing to do, but vzconn-service doesn't have that port either):
❯ kubectl get service cloud-proxy-service -n plc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
cloud-proxy-service LoadBalancer 10.43.209.160 192.168.5.1 4444:30058/TCP,5555:32671/TCP 43h
❯ kubectl get service vzconn-service -n plc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
vzconn-service LoadBalancer 10.43.53.124 192.168.5.1 51600:31468/TCP 19d
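Since removing the 443 port let the assignment succeed, one possible explanation (an assumption, not confirmed in this thread) is that another LoadBalancer service in the cluster already holds host port 443. A minimal sketch for checking this follows; services_using_port is a hypothetical helper that filters `kubectl get service -A` output for LoadBalancer services exposing a given port.

```shell
# Hypothetical helper: reads `kubectl get service -A` output on stdin and
# prints "<namespace>/<name>" for each LoadBalancer service that exposes
# the port given as the first argument.
services_using_port() {
  awk -v port="$1" '
    NR > 1 && $3 == "LoadBalancer" {
      # PORT(S) looks like "443:30758/TCP,4444:30058/TCP"
      n = split($6, entries, ",")
      for (i = 1; i <= n; i++) {
        split(entries[i], parts, "[:/]")
        if (parts[1] == port) { print $1 "/" $2; break }
      }
    }'
}

# Usage, assuming kubectl points at the Colima cluster:
#   kubectl get service -A | services_using_port 443
```

If a service other than cloud-proxy-service is listed for port 443, the two would be competing for the same host port on the single Colima node.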
Still, ./dev_dns_updater gets stuck, although it now logs a bit more:
INFO[0000] DNS Entries entries="dev.withpixie.dev, work.dev.withpixie.dev" service=cloud-proxy-service
INFO[0003] Update addr=192.168.5.1 service=cloud-proxy-service
I manually added the host to the hosts file:
192.168.5.1 dev.withpixie.dev work.dev.withpixie.dev
This at least resolves the address, but the connection to the server fails with a timeout:
❯ curl -vv dev.withpixie.dev:5555
* Trying 192.168.5.1:5555...
* connect to 192.168.5.1 port 5555 failed: Operation timed out
* Failed to connect to dev.withpixie.dev port 5555 after 75002 ms: Couldn't connect to server
* Closing connection
curl: (28) Failed to connect to dev.withpixie.dev port 5555 after 75002 ms: Couldn't connect to server
Anyone?
Any suggestion on how I can solve this?
Do you need more details?
anyone?
@JamesMBartlett
Hi @dcfranca.
It's hard for us to debug issues in environments we don't officially support.
I'm not too familiar with Colima. However, it seems like they have an option to enable exposing an external IP: https://github.com/abiosoft/colima/blob/main/docs/FAQ.md#the-virtual-machines-ip-is-not-reachable
Have you tried running colima with that flag?