
[occm] Cannot access loadbalancer from non kubernetes nodes

Open sykim-etri opened this issue 2 years ago • 9 comments

Is this a BUG REPORT or FEATURE REQUEST?:

What happened: First, I set up OpenStack (Xena) with Octavia on the openstack-server machine and verified that a load balancer works.

Then I installed a Kubernetes cluster (v1.22.8) with Kubespray on 2 VMs created on that OpenStack.

NAME            STATUS   ROLES                  AGE   VERSION
octavia-k8s-1   Ready    control-plane,master   18h   v1.22.8
octavia-k8s-2   Ready    <none>                 18h   v1.22.8

I configured Kubernetes for OCCM (v1.22.1) and applied several YAML files from ~/cloud-provider-openstack/manifests/controller-manager (cloud-controller-manager-roles.yaml, openstack-cloud-controller-manager-ds.yaml, cloud-controller-manager-role-bindings.yaml, kubeadm.conf, openstack-cloud-controller-manager-pod.yaml).
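Roughly, the setup looked like this (a sketch, not the exact commands I ran; the cloud-config secret name and the cloud.conf key are what the example manifests expect, so treat them as assumptions):

# create the cloud.conf secret that OCCM mounts (name/key assumed from the manifests)
kubectl -n kube-system create secret generic cloud-config --from-file=cloud.conf
# apply the controller-manager manifests from the repo
kubectl apply -f cloud-controller-manager-roles.yaml
kubectl apply -f cloud-controller-manager-role-bindings.yaml
kubectl apply -f openstack-cloud-controller-manager-ds.yaml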

Finally, I applied ~/cloud-provider-openstack/examples/loadbalancers/external-http-nginx.yaml, and external-http-nginx-service got an external IP successfully.

# kubectl get svc
NAME                          TYPE           CLUSTER-IP    EXTERNAL-IP       PORT(S)        AGE
external-http-nginx-service   LoadBalancer   10.233.45.4   1.2.1.242         80:31206/TCP   55s
kubernetes                    ClusterIP      10.233.0.1    <none>            443/TCP        19h

From the Kubernetes nodes, I can access external-http-nginx-service at 1.2.1.242 without any problem.

root@octavia-k8s-1:~# curl 1.2.1.242
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
...

BUT from other nodes (VMs) in the same subnet, or from the physical machine (openstack-server), I cannot reach 1.2.1.242.

ubuntu@openstack-server:~$ curl 1.2.1.242
curl: (52) Empty reply from server

I guess maybe I'm missing some configuration.
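If it helps, the load balancer can also be inspected from the OpenStack side; a rough sketch (the <lb-id> placeholder is whatever ID openstack loadbalancer list returns for this Service):

# list the load balancer Octavia created for the Service
openstack loadbalancer list
# note vip_address and vip_port_id
openstack loadbalancer show <lb-id>
# operating status of the listener, pool and members
openstack loadbalancer status show <lb-id>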

What you expected to happen: To be able to reach the load balancer IP 1.2.1.242 from other nodes (VMs) in the same subnet and from openstack-server.

How to reproduce it:

Anything else we need to know?: I installed OpenStack (Xena) with kolla-ansible.

The Kubernetes master's ip a output is below:

root@octavia-k8s-1:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc fq_codel state UP group default qlen 1000
    link/ether fa:16:3e:92:a7:e6 brd ff:ff:ff:ff:ff:ff
    inet 10.64.0.7/24 brd 10.64.0.255 scope global dynamic ens3
       valid_lft 15733sec preferred_lft 15733sec
    inet6 fe80::f816:3eff:fe92:a7e6/64 scope link
       valid_lft forever preferred_lft forever
3: kube-ipvs0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
    link/ether d6:c8:83:d0:eb:d7 brd ff:ff:ff:ff:ff:ff
    inet 10.233.0.1/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.233.0.3/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.233.45.4/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 1.2.1.242/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet6 fe80::d4c8:83ff:fed0:ebd7/64 scope link
       valid_lft forever preferred_lft forever
6: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1430 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
    inet 10.233.79.0/32 scope global tunl0
       valid_lft forever preferred_lft forever
7: calida9c852d892@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1430 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
8: cali15fc62ac7b1@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1430 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
9: nodelocaldns: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
    link/ether be:93:fd:e0:a3:a8 brd ff:ff:ff:ff:ff:ff
    inet 169.254.25.10/32 scope global nodelocaldns
       valid_lft forever preferred_lft forever
11: cali7b3550237a5@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1430 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 2
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever

openstack-server's ip route output is below:

ubuntu@openstack-server:~$ ip route
default via 1.2.1.1 dev eno3 onlink
10.64.0.0/24 via 1.2.1.244 dev eno3
10.254.0.0/24 via 1.2.1.244 dev eno3
1.2.1.0/24 dev eno3 proto kernel scope link src 1.2.1.235
1.2.1.240/28 via 1.2.1.244 dev eno3

Environment:

  • openstack-cloud-controller-manager(or other related binary) version: v1.22.1
  • OpenStack version: xena
  • Others: ubuntu, kolla-ansible

sykim-etri avatar May 20 '22 07:05 sykim-etri

This is weird.. the LB is a VM with a pre-installed haproxy by default, and the IP 1.2.1.242 should be the IP of the LB. I know we had a sec group fix recently (https://github.com/kubernetes/cloud-provider-openstack/issues/1830), but I'm not sure it's related. Since you can curl from one machine but not the other, it seems related to a firewall..

Are you able to check the OCCM logs and see anything suspicious?
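If it does turn out to be a firewall / security group issue, something along these lines should show whether TCP/80 is open on the VIP port (a sketch; the <...> values are placeholders, not real IDs):

# find the LB and its VIP port
openstack loadbalancer show <lb-id>
# look at the security_group_ids attached to the VIP port
openstack port show <vip-port-id>
# expect an ingress rule allowing tcp/80
openstack security group rule list <sg-id>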

jichenjc avatar May 20 '22 10:05 jichenjc

@jichenjc Thanks for your comment.

This is the full OCCM log. I'm using the latest OCCM version with log level 4.

occm-latest.log

In this log, I think this warning is suspicious, but I don't know what it means:

W0520 20:30:23.777141       1 openstack.go:325] Failed to create an OpenStack Secret client: unable to initialize keymanager client for region RegionOne: No suitable endpoint could be found in the service catalog.
root@octavia-k8s-1:~/cloud-provider-openstack/examples/loadbalancers# k get svc
NAME                          TYPE           CLUSTER-IP      EXTERNAL-IP       PORT(S)        AGE
external-http-nginx-service   LoadBalancer   10.233.10.134   1.2.1.249         80:31217/TCP   7m28s
kubernetes                    ClusterIP      10.233.0.1      <none>            443/TCP        32h

sykim-etri avatar May 20 '22 20:05 sykim-etri

@jichenjc This is more detailed log(level 7).

occm-log.txt

sykim-etri avatar May 22 '22 13:05 sykim-etri

Failed to create an OpenStack Secret client: unable to initialize keymanager client for region RegionOne: No suitable endpoint could be found in the service catalog.

This is OK; it only tells us that it can't find the barbican service in your catalog, which is optional.
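If you want to double-check, something like this should show whether a key-manager endpoint is registered at all (purely illustrative; needs the openstack CLI with your credentials sourced):

openstack catalog list | grep -i key-manager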

jichenjc avatar May 23 '22 12:05 jichenjc

The log provided seems truncated, and I see nothing special up to I0522 13:46:34.568918, which is the last entry I can see..

Have you run tcpdump on the VM to see if anything is wrong there? This is beyond my expertise now ... not sure if someone else has the background?
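For example, something like this (a sketch; the interface names and NodePort are taken from the outputs above, adjust to your setup):

# on openstack-server: watch traffic towards the LB VIP
sudo tcpdump -ni eno3 host 1.2.1.242 and tcp port 80
# on octavia-k8s-1: watch for the amphora forwarding to the NodePort (31206 in the earlier kubectl output)
sudo tcpdump -ni ens3 tcp port 31206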

jichenjc avatar May 23 '22 12:05 jichenjc

@jichenjc Thanks for your comment.

Do you know of any step-by-step OpenStack (with Octavia) installation documents? I'll try a clean install.

I guess my network configuration may be wrong.
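For my own notes, the first things I'll re-check from openstack-server (commands only, a sketch; 1.2.1.242 and 1.2.1.244 come from the outputs above):

# the more specific /28 route should win, so the VIP should be reached via 1.2.1.244
ip route get 1.2.1.242
# basic reachability of that gateway (ICMP to the VIP itself may be blocked by its security group)
ping -c 3 1.2.1.244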

sykim-etri avatar May 23 '22 12:05 sykim-etri

I think https://docs.openstack.org/devstack/latest/guides/devstack-with-lbaas-v2.html might be the easiest way to go..
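If you go the devstack route, a minimal local.conf along these lines usually brings up Octavia (a sketch only, not verified against Xena; the service list may differ between releases):

cat >> local.conf <<'EOF'
[[local|localrc]]
enable_plugin octavia https://opendev.org/openstack/octavia
ENABLED_SERVICES+=,octavia,o-api,o-cw,o-hm,o-hk
EOF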

jichenjc avatar May 23 '22 13:05 jichenjc

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Aug 21 '22 13:08 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Sep 20 '22 13:09 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Oct 20 '22 14:10 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Oct 20 '22 14:10 k8s-ci-robot