ubridge
docker move_to_ns issue (GNS3 2.2.19 - Gentoo Linux)
Hi guys,
I am having problems bringing up a Docker container within GNS3. After running gns3server with '-d', I found the following output:
2021-03-24 19:01:48 ERROR route.py:242 Uncaught exception detected: <class 'KeyError'>
Traceback (most recent call last):
  File "/usr/lib/python3.8/site-packages/gns3server/compute/base_node.py", line 631, in _ubridge_send
    await self._ubridge_hypervisor.send(command)
  File "/usr/lib/python3.8/site-packages/gns3server/utils/asyncio/__init__.py", line 163, in wrapper
    return await f(oself, *args, **kwargs)
  File "/usr/lib/python3.8/site-packages/gns3server/ubridge/ubridge_hypervisor.py", line 259, in send
    raise UbridgeError(data[-1][4:])
gns3server.ubridge.ubridge_error.UbridgeError: could not complete netlink transaction
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/python3.8/site-packages/gns3server/compute/docker/docker_vm.py", line 914, in _add_ubridge_connection
    await self._ubridge_send('docker move_to_ns {ifc} {ns} eth{adapter}'.format(ifc=adapter.host_ifc,
  File "/usr/lib/python3.8/site-packages/gns3server/compute/base_node.py", line 633, in _ubridge_send
    raise UbridgeError("Error while sending command '{}': {}: {}".format(command, e, self._ubridge_hypervisor.read_stdout()))
gns3server.ubridge.ubridge_error.UbridgeError: Error while sending command 'docker move_to_ns tap-gns3-e0 27347 eth0': could not complete netlink transaction: uBridge version 0.9.18 running with libpcap version 1.10.0 (with TPACKET_V3)
Hypervisor TCP control server started (IP 0.0.0.0 port 36283).
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/python3.8/site-packages/gns3server/compute/docker/docker_vm.py", line 478, in start
    await self._add_ubridge_connection(nio, adapter_number)
  File "/usr/lib/python3.8/site-packages/gns3server/compute/docker/docker_vm.py", line 918, in _add_ubridge_connection
    raise UbridgeNamespaceError(e)
gns3server.ubridge.ubridge_error.UbridgeNamespaceError: Error while sending command 'docker move_to_ns tap-gns3-e0 27347 eth0': could not complete netlink transaction: uBridge version 0.9.18 running with libpcap version 1.10.0 (with TPACKET_V3)
Hypervisor TCP control server started (IP 0.0.0.0 port 36283).
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/python3.8/site-packages/gns3server/web/route.py", line 198, in control_schema
    await func(request, response)
  File "/usr/lib/python3.8/site-packages/gns3server/handlers/api/compute/docker_handler.py", line 89, in start
    await container.start()
  File "/usr/lib/python3.8/site-packages/gns3server/compute/docker/docker_vm.py", line 484, in start
    logdata = await self._get_log()
  File "/usr/lib/python3.8/site-packages/gns3server/compute/docker/docker_vm.py", line 1141, in _get_log
    result = await self.manager.query("GET", "containers/{}/logs".format(self._cid), params={"stderr": 1, "stdout": 1})
  File "/usr/lib/python3.8/site-packages/gns3server/compute/docker/__init__.py", line 114, in query
    if response.headers['CONTENT-TYPE'] == 'application/json':
KeyError: 'CONTENT-TYPE'
I ran ubridge in hypervisor mode and connected to it via telnet. The docker move_to_ns command does not work in my environment. Could someone please help me fix this?
Thanks in advance. Regards
Have you checked that ubridge has the correct capabilities/rights?
getcap /usr/local/bin/ubridge
/usr/local/bin/ubridge = cap_net_admin,cap_net_raw+ep
If not, set using this command:
setcap cap_net_admin,cap_net_raw=ep /usr/local/bin/ubridge
Hello @grossmj,
Thanks for answering.
The output seems a bit different:
gentoolinux /home/gab # getcap /usr/bin/ubridge
/usr/bin/ubridge cap_net_admin,cap_net_raw=ep
The difference seems to be only in the output format. Is that OK?
It looks fine.
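To make the comparison concrete, here is a minimal Python sketch that parses a line of getcap output in either of the two formats seen in this thread and compares the results. The function name parse_getcap is made up for illustration, and the regular expression only covers these two layouts; it is an assumption that other getcap versions do not print something different again.

```python
import re

def parse_getcap(line):
    """Parse one line of getcap output into (path, capabilities, flags).

    Handles the two layouts seen in this thread:
      /path = cap_a,cap_b+ep
      /path cap_a,cap_b=ep
    """
    match = re.match(r"^(\S+)\s+(?:=\s+)?([a-z0-9_,]+)[+=]([eip]+)$", line.strip())
    if match is None:
        raise ValueError("unrecognized getcap output: {!r}".format(line))
    path, caps, flags = match.groups()
    return path, frozenset(caps.split(",")), frozenset(flags)

# Both lines describe the same capability set with the same flags.
_, caps_a, flags_a = parse_getcap("/usr/local/bin/ubridge = cap_net_admin,cap_net_raw+ep")
_, caps_b, flags_b = parse_getcap("/usr/bin/ubridge cap_net_admin,cap_net_raw=ep")
same = (caps_a, flags_a) == (caps_b, flags_b)
```

With the two lines from this thread, same is True: only the path and the punctuation differ.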
The error gns3server.ubridge.ubridge_error.UbridgeError: could not complete netlink transaction indicates that uBridge cannot use netlink, which is strange.
I am not a Gentoo expert but maybe the Kernel wasn't compiled with Netlink support or something similar? https://packages.gentoo.org/useflags/netlink
The link you provided is about USE flags. A USE flag acts as a toggle for a particular package feature, so Gentoo users can choose which features are enabled or disabled when a package is compiled. In Gentoo, the ubridge package has only the "filecaps" USE flag, and I keep it enabled: https://packages.gentoo.org/packages/net-misc/ubridge
Regarding kernel support, I filtered my kernel config and found the output below:
gab@gentoolinux ~ $ cat /usr/src/linux/.config | grep -i netlink
CONFIG_NETFILTER_NETLINK=y
# CONFIG_NETFILTER_NETLINK_ACCT is not set
# CONFIG_NETFILTER_NETLINK_QUEUE is not set
CONFIG_NETFILTER_NETLINK_LOG=y
# CONFIG_NETFILTER_NETLINK_OSF is not set
CONFIG_NF_CT_NETLINK=y
# CONFIG_NETFILTER_NETLINK_GLUE_CT is not set
# CONFIG_NETLINK_DIAG is not set
CONFIG_QUOTA_NETLINK_INTERFACE=y
Please let me know if I should enable any option that is currently disabled. For the sake of completeness: I am able to bring up Alpine Linux as a container within GNS3 without any problems. The docker move_to_ns command seems to work properly when configuring Alpine =/
Hello @grossmj
I maintain the GNS3 packages for Gentoo and have been looking into this problem for some time. I don't really know where the problem comes from. However, I was experimenting with other distros as well and could reproduce the problem on openSUSE Tumbleweed too. Ubuntu, on the other hand, doesn't suffer from it.
Checking the packages from these distros, I saw that the Qt version differs. Gentoo and openSUSE are already on 5.15 while Ubuntu still uses 5.14, so I was wondering whether the Qt version could be the issue here?
I doubt Qt has anything to do with it. The move_to_ns command basically moves an interface to a Linux namespace: https://github.com/GNS3/ubridge#docker-module-docker
I still suspect that something isn't enabled or that some other kind of restriction applies. Checking this may help: https://wiki.gentoo.org/wiki/Docker#Kernel
Also, please try to manually create a network namespace and add a veth pair like this:
ip netns add test
ip netns list
ip link add veth0 type veth peer name veth1
ip link set veth1 netns test
ip netns exec test ip link list
There is a problem with netlink if you get any RTNETLINK errors.
This would help to isolate the issue. Thanks :+1:
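The same check can be scripted. The sketch below runs the commands listed above and collects any whose stderr mentions RTNETLINK. It is only an illustration: the function name and the injectable run parameter are made up here, the commands need root, and the namespace and veth names are the ones from the example above.

```python
import subprocess

# The same sequence of commands as suggested above (must be run as root).
COMMANDS = [
    ["ip", "netns", "add", "test"],
    ["ip", "netns", "list"],
    ["ip", "link", "add", "veth0", "type", "veth", "peer", "name", "veth1"],
    ["ip", "link", "set", "veth1", "netns", "test"],
    ["ip", "netns", "exec", "test", "ip", "link", "list"],
]

def find_rtnetlink_errors(run=subprocess.run):
    """Run each command and return (command, stderr) pairs that mention RTNETLINK."""
    failures = []
    for cmd in COMMANDS:
        result = run(cmd, capture_output=True, text=True)
        if "RTNETLINK" in result.stderr:
            failures.append((" ".join(cmd), result.stderr.strip()))
    return failures
```

Calling find_rtnetlink_errors() as root returns an empty list when none of the commands reported an RTNETLINK error.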
Hi @grossmj,
No issues while running the commands you provided:
gentoolinux /home/gab # ip netns add test
gentoolinux /home/gab # ip netns list
test
gentoolinux /home/gab # ip link add veth0 type veth peer name veth1
gentoolinux /home/gab # ip link set veth1 netns test
gentoolinux /home/gab # ip netns exec test ip link list
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ip6_vti0@NONE: <NOARP> mtu 1364 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/tunnel6 :: brd :: permaddr da35:2ea2:4c2f::
3: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/sit 0.0.0.0 brd 0.0.0.0
4: ip6tnl0@NONE: <NOARP> mtu 1452 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/tunnel6 :: brd :: permaddr 6a0f:abdf:9b04::
7: veth1@if8: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 46:da:0f:a9:72:13 brd ff:ff:ff:ff:ff:ff link-netnsid 0
Thanks, this must mean there is nothing wrong with netlink itself.
Let's try to use uBridge to manually add an interface to a Docker container.
1 - Start a Docker container
docker run -it --rm alpine /bin/ash
2 - Find the Pid of the container
Now that the container is running, we need its ID.
$ docker container list
CONTAINER ID   IMAGE    COMMAND      CREATED         STATUS         PORTS   NAMES
00ae1b2479b7   alpine   "/bin/ash"   7 minutes ago   Up 6 minutes           interesting_lehmann
Then we can use the container ID to find the Pid:
$ docker container inspect 00ae1b2479b7 | grep Pid
"Pid": 308036,
"PidMode": "",
"PidsLimit": null,
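Since grep also matches PidMode and PidsLimit, it can be cleaner to read the Pid from the JSON directly. A minimal Python sketch, assuming the usual docker container inspect output (a JSON array with one object per container, with the process ID under State.Pid); the function name is made up for illustration:

```python
import json

def pid_from_inspect(inspect_output):
    """Return State.Pid of the first container in `docker container inspect` output."""
    containers = json.loads(inspect_output)
    return containers[0]["State"]["Pid"]
```

The input would typically come from subprocess.run(["docker", "container", "inspect", container_id], capture_output=True, text=True, check=True).stdout.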
3 - Run uBridge in hypervisor mode
Start uBridge to listen on port 4242 (with the same user you would use to run the GNS3 server).
$ ubridge -H 4242
uBridge version 0.9.19 running with libpcap version 1.9.1 (with TPACKET_V3)
Hypervisor TCP control server started (port 4242).
4 - Create a TAP interface and move it to Docker container
Then use telnet to connect to port 4242 and issue commands (replace the container Pid 308036 with the one from your container):
$ telnet localhost 4242
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
bridge create test
100-bridge 'test' created
bridge add_nio_tap test tap4242
100-NIO TAP added to bridge 'test'
docker move_to_ns tap4242 308036 eth42
100-tap4242 moved to namespace 308036
I expect you will get the error right after you enter the docker move_to_ns command. If so, please try again while running uBridge as root to see whether this is a permission issue. Thanks for your help!
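For repeated testing, the telnet session can be replaced by a small script. The sketch below is a rough illustration only: it assumes that each command gets a single reply line terminated by a newline and that a reply starting with "100-" means success, as in the transcript above. Neither assumption is taken from the uBridge documentation, and both function names are made up.

```python
import socket

def is_success(reply):
    """Assumption: replies starting with '100-' indicate success (as in the transcript)."""
    return reply.startswith("100-")

def send_command(command, host="localhost", port=4242, timeout=5.0):
    """Send one command to a uBridge hypervisor and return the first reply line."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.strip().encode() + b"\n")
        # Assumption: the reply is a single line terminated by a newline.
        with sock.makefile("r", encoding="utf-8", newline="\n") as reader:
            return reader.readline().strip()

# Example usage (requires a running `ubridge -H 4242`; each call opens a new connection):
#   reply = send_command("docker move_to_ns tap4242 308036 eth42")
#   print(reply, is_success(reply))
```

Because each call opens its own connection, a bridge created in one call may not be visible in the next; for a multi-command session like the one above, the socket would need to be kept open.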
Hi,
I've just tried the steps on my system to see what the problem is. Unfortunately, I didn't get any error back when running the commands on the CLI.
For your information, this problem seems to happen only with certain Docker images. While alpine, for example, works without problems, the Docker image ehlers/ostinato suffers from this issue.
I've now tried both Docker images with the commands you provided, but neither of them produced any errors:
alpine:
ai@x1 ~ $ telnet localhost 4242
Trying ::1...
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
bridge create test
100-bridge 'test' created
bridge add_nio_tap test tap4242
100-NIO TAP added to bridge 'test'
docker move_to_ns tap4242 25229 eth42
100-tap4242 moved to namespace 25229
ehlers/ostinato:
Trying ::1...
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
bridge create test
100-bridge 'test' created
bridge add_nio_tap test tap4242
100-NIO TAP added to bridge 'test'
docker move_to_ns tap4242 26169 eth42
100-tap4242 moved to namespace 26169
I even checked inside the container whether the interface was really there:
root@03e52a8fac78:/# ip ad
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
20: eth0@if21: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.17.0.2/16 brd 172.17.255.255 scope global eth0
       valid_lft forever preferred_lft forever
22: eth42: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 16:cf:27:36:47:71 brd ff:ff:ff:ff:ff:ff
root@03e52a8fac78:/#
That seems fine. Trying again with GNS3, however, the problem is still there.
The docker move_to_ns error message is also shown when the Docker container dies during the setup of the bridge. So your issue might have nothing to do with ubridge.
I suggest having a look at the logs of the Docker container. Use docker ps -a to find the container ID, then use docker logs to view the log.
behlers@iMac:~$ docker ps -a
CONTAINER ID   IMAGE              COMMAND                  CREATED          STATUS                      PORTS   NAMES
cb62deaff675   alpine-be:latest   "/gns3/init.sh /etc/…"   22 seconds ago   Exited (1) 16 seconds ago           gifted_carson
behlers@iMac:~$ docker logs -t cb62deaff675
2021-10-29T15:02:57.755529052Z standard_init_linux.go:219: exec user process caused: exec format error
2021-10-29T15:02:59.383432634Z standard_init_linux.go:219: exec user process caused: exec format error
behlers@iMac:~$
In my example the container dies early with "exec format error". Even though this error has nothing to do with ubridge, I get the same log messages as you, with all that ubridge stuff.
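To spot this situation quickly, the STATUS column of docker ps -a can be checked for an exit code. A minimal Python sketch, assuming the status text has the form shown above ("Exited (1) 16 seconds ago"); the function name is made up for illustration:

```python
import re

def exit_code_from_status(status):
    """Return the exit code from a docker ps STATUS string, or None if not exited."""
    match = re.match(r"Exited \((\d+)\)", status)
    return int(match.group(1)) if match else None
```

A non-zero result for the container GNS3 just started suggests the container died on its own, and docker logs is the next place to look.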
I'm currently investigating the same behaviour on Debian 11.5 & podman (instead of Docker, but exposing the same interface through the appropriate socket).
The GNS3 server erroneously sends back the container's stdout as the error message to the GNS3 client, but further investigation of the logs shows that the containers do come up and that ubridge fails during docker move_to_ns. GNS3 then kills the containers.
The capabilities are set correctly on the ubridge binary, and I've confirmed that it makes no difference whether the ubridge hypervisor runs under a normal user or root.
Interestingly I've encountered the kernel message "A link change request failed with some changes committed already." during troubleshooting, which may help to pinpoint what exactly the cause is here, but I don't yet have a reliable repro for that.
Edit: I should highlight that in my case, I do indeed get an error immediately after move_to_ns when talking to a ubridge hypervisor manually.
Update: Upon further investigation, it seems that GNS3 attempts the move_to_ns while renaming the interface to eth0, but this interface already exists in the container. I'm not sure whether this is specific to podman.
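One way to confirm such a name collision is to list the interfaces inside the container's namespace before the move and check whether the target name is already in use. The Python sketch below parses ip link or ip addr style output like the listing earlier in this thread; the function names are made up, and it only illustrates the check, not what GNS3 or ubridge actually do.

```python
import re

def interface_names(ip_output):
    """Extract interface names from `ip link` / `ip addr` output."""
    # Interface lines look like "20: eth0@if21: <...>"; keep the part before '@' or ':'.
    return re.findall(r"^\d+:\s+([^:@\s]+)", ip_output, flags=re.MULTILINE)

def name_is_taken(ip_output, name):
    """Return True if an interface called `name` already exists in the output."""
    return name in interface_names(ip_output)
```

If name_is_taken(output, "eth0") is True for the container's namespace, a move_to_ns that renames the interface to eth0 would be expected to collide with the existing one.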