
OVN/OVS not working with Docker on RHEL 7.3


I am using docker (version 1.10.3) with OVS (version 2.5.90) on RHEL 7.2.

I followed all the instructions on https://github.com/shettyg/ovn-docker/blob/master/docs/INSTALL.Docker.md

I have 3 nodes (one with IP $CENTRAL_IP and 2 others where I will spawn containers with IP $LOCAL_IP). I started docker daemon on all 3 nodes with consul as the distributed key-value store.
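
For reference, the daemon invocation on each node follows the pattern from those instructions, roughly like this (the interface name here is just a placeholder for whatever each node actually advertises on):

docker daemon --cluster-store=consul://127.0.0.1:8500 --cluster-advertise=eth0:0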

I first compiled openvswitch as per the instructions given here (I am on RHEL 7.2): https://github.com/openvswitch/ovs/blob/master/INSTALL.Fedora.md

I started OVS on all nodes (3 in total):
/usr/share/openvswitch/scripts/ovs-ctl start

On the central node (with IP $CENTRAL_IP), I executed the following two commands:

ovs-appctl -t ovsdb-server ovsdb-server/add-remote ptcp:6640

/usr/share/openvswitch/scripts/ovn-ctl start_northd
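
A quick way to confirm the passive connection is actually open is something like this (list-remotes may not be available on every build):

ovs-appctl -t ovsdb-server ovsdb-server/list-remotes
netstat -lnt | grep 6640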

On the compute nodes (where I spawn containers) with IP $LOCAL_IP, I executed the following command:

ovs-vsctl set Open_vSwitch . external_ids:ovn-remote="tcp:$CENTRAL_IP:6641" external_ids:ovn-encap-ip=$LOCAL_IP external_ids:ovn-encap-type="geneve"

(Note that I had to use port 6641 and not 6640 as in your instructions: with 6640, I was getting an error while executing the docker network create command.)
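
To confirm the settings took effect, they can be dumped back with something like:

ovs-vsctl get Open_vSwitch . external_ids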

After that, I started the controller and the overlay driver on the compute nodes:

/usr/share/openvswitch/scripts/ovn-ctl start_controller

ovn-docker-overlay-driver --detach
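
At this point, each compute node should register itself as a chassis in the southbound database, which can be checked on the central node with something like:

ovn-sbctl show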

All commands worked fine. I use the OVS kernel module. I could see the openvswitch kernel module loaded (when I ran lsmod), and all the other daemons started properly.

After that, I could create a logical network GREEN as follows:

docker network create -d openvswitch --subnet=192.168.1.0/24 GREEN
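
The network can be sanity-checked from the docker side, and the corresponding logical switch should also show up in the northbound database on the central node, roughly:

docker network ls
docker network inspect GREEN
ovn-nbctl show        # on the central node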

I had two containers (container1 on compute node1, and container2 on compute node2), and I tried to connect them to this new logical network GREEN:

docker network connect GREEN container1 (on compute node1)

docker network connect GREEN container2 (on compute node2)
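
To verify that each container actually got an interface on the GREEN subnet, something like this can be used (assuming ip is available inside the containers):

docker exec container1 ip addr show     # on compute node1
docker exec container2 ip addr show     # on compute node2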

Everything works fine till this point. However, when I try to ping container2 from container1, I get a "Destination Host Unreachable". It looks like the southbound database is not getting populated properly (the output of the ovn-sbctl show command is empty), while the northbound database shows all the logical switches and ports properly.

If I create another network (let's call it RED) using overlay as the driver, and connect the same containers (after disconnecting them from network GREEN) to RED, everything works fine. Am I missing something? (I've spent about a week trying to debug this, without much luck.)
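
In case it helps narrow this down, these are the kinds of checks that seem relevant on the compute nodes (the log path is assumed from the ovn-ctl defaults):

ovs-vsctl get Open_vSwitch . external_ids               # ovn-remote / encap settings
ovs-vsctl show                                          # br-int should get geneve tunnel ports once both chassis register
tail -n 50 /var/log/openvswitch/ovn-controller.log      # connection errors to $CENTRAL_IP show up here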

vijaymann avatar Apr 06 '16 15:04 vijaymann

Are you using OVS master? If so, can you please use OVS 2.5?

shettyg avatar Apr 06 '16 16:04 shettyg

I see that you are using OVS 2.5.90. That is the master. Please use branch 2.5

shettyg avatar Apr 06 '16 16:04 shettyg

I tried using branch 2.5 (it created openvswitch-2.5.1-1.el7.x86_64.rpm and other rpms). I still can't get ping to work. One of my compute nodes has kernel 3.10.0-327.13.1.el7.x86_64, and this one shows a different output for lsmod:

lsmod | grep openvswitch
openvswitch           236670  3
nf_defrag_ipv6         34768  1 openvswitch
nf_defrag_ipv4         12729  2 openvswitch,nf_conntrack_ipv4
nf_conntrack          105737  6 openvswitch,nf_nat,nf_nat_ipv4,xt_conntrack,nf_nat_masquerade_ipv4,nf_conntrack_ipv4
gre                    13796  1 openvswitch
libcrc32c              12644  3 xfs,dm_persistent_data,openvswitch

The other node has kernel version 3.10.0-327.13.1.el7.x86_64, and the output for lsmod is as follows:

lsmod | grep openvswitch
openvswitch            84535  0
libcrc32c              12644  3 xfs,dm_persistent_data,openvswitch

I do see a single entry in the southbound database (ovn-sbctl show). However, I still can't get ping to work. Any other pointers on what could be going wrong?

vijaymann avatar Apr 06 '16 21:04 vijaymann

A couple of concerns:

  1. Since 'lsmod | grep openvswitch' does not show "geneve" or "stt", I wonder whether you are using the kernel module from the upstream Linux kernel instead of the one from the openvswitch repo. Geneve made it into the upstream Linux kernel only in Linux 3.18; if you use the upstream kernel module, pings will not work (see the check sketched after this list).
  2. 'ovn-sbctl show' showing a single entry is a point of concern. What did it show? If ovn-controller is running on both nodes, with a valid ovn-remote set, it should ideally show 2 chassis entries.
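
A quick check for which module is actually loaded, as a sketch (the filename and version fields depend on how the rpm was built, but a module built from the openvswitch repo should report a 2.5.x version):

modinfo openvswitch | grep -E '^(filename|version)'
lsmod | grep -E 'geneve|stt'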

One way for you to try this out from scratch is the following (on a Mac):

git clone https://github.com/shettyg/ovn-docker
cd ovn-docker/vagrant_overlay

Edit the file https://github.com/shettyg/ovn-docker/blob/master/vagrant_overlay/install-ovn.sh#L10 to add the following additional line:

git checkout -b branch-2.5_local origin/branch-2.5

And then add the following additional line here: https://github.com/shettyg/ovn-docker/blob/master/vagrant_overlay/install-ovn.sh#L19

modprobe gre
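
Putting the two edits together, the touched parts of install-ovn.sh would look roughly like this (surrounding lines omitted; exact placement per the links above):

# right after the ovs repository is cloned:
git checkout -b branch-2.5_local origin/branch-2.5
...
# alongside the existing modprobe calls:
modprobe gre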

vagrant up node1
vagrant up node2

If you get an error (sometimes vagrant bails when ubuntu updates don't work), you should do 'vagrant destroy' and redo the 'vagrant up' commands. To make sure that your vagrant has been set up correctly, you should follow the instructions in: https://github.com/shettyg/ovn-docker/blob/master/vagrant_overlay/Readme.md

You can also simply skip vagrant and follow the instructions in https://github.com/shettyg/ovn-docker/blob/master/vagrant_overlay/Vagrantfile, i.e., run consul-server.sh, install-docker.sh and install-ovn.sh on node1, and consul-client.sh, install-docker.sh and install-ovn.sh on node2.
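
In that manual setup, the rough order is the following (the scripts may expect arguments, such as node IPs, that the Vagrantfile normally passes, so check it before running them by hand):

# on node1
./consul-server.sh
./install-docker.sh
./install-ovn.sh

# on node2
./consul-client.sh
./install-docker.sh
./install-ovn.sh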

shettyg avatar Apr 07 '16 00:04 shettyg

I'm using ovs_version: "2.5.1" and I have the same problem. The ping command works for containers on the same host, but it doesn't work for containers running on two different physical hosts. @vijaymann, on which host did you set 192.168.1.1 as the gateway address of the GREEN network?

I followed the instructions reported in http://openvswitch.org/support/dist-docs-2.5/INSTALL.Docker.md.html. I would like to understand whether the docker daemon command must be run on the central node too, or just on the compute hosts where I spawn containers. And why do the instructions use 127.0.0.1:8500? If the consul server is running on another host, do I have to replace 127.0.0.1 with its IP address?
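
My guess is that, with a remote consul server, the daemon would be started along these lines, but I'm not sure this is right:

docker daemon --cluster-store=consul://<consul-server-ip>:8500 --cluster-advertise=eth0:0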

uultimoo avatar Apr 07 '16 07:04 uultimoo

@vijaymann I found the problem. I resolved it by reinstalling OVS branch 2.5 following this and reloading the geneve kernel module. Now I can successfully ping containers on different hosts too. Let me know if you have other problems.
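
If anyone needs to redo the module reload, ovs-ctl also has a helper that swaps the loaded kernel module for the newly installed one:

/usr/share/openvswitch/scripts/ovs-ctl force-reload-kmod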

uultimoo avatar Apr 07 '16 14:04 uultimoo

The OVS master should work now too. The documentation has been updated here: https://github.com/openvswitch/ovs/blob/master/INSTALL.Docker.md

The vagrant should also work now with just a "vagrant up" from the vagrant_overlay directory.
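
That is, from a fresh clone:

git clone https://github.com/shettyg/ovn-docker
cd ovn-docker/vagrant_overlay
vagrant up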

shettyg avatar Apr 07 '16 17:04 shettyg

@shettyg master doesn't work for me even now. I think the vsctl command you have in your documentation is still not correct. I'll try the other option you gave (but I am slowly getting tired of trying to get this thing to work!)

vijaymann avatar Apr 08 '16 03:04 vijaymann