extremely slow vm networking?
Has anyone seen VM networking being very slow and flaky? This is with ignite 0.8.0.
Speed test (Python speedtest-cli package) outside the VM is 1 Gbit/s+:
luke@phat:~$ /home/ubuntu/.local/bin/speedtest
Retrieving speedtest.net configuration...
Testing from OVH SAS (51.210.209.124)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by OVH Cloud (Gravelines) [5879.73 km]: 9.144 ms
Testing download speed................................................................................
Download: 1123.13 Mbit/s
Testing upload speed......................................................................................................
Upload: 615.87 Mbit/s
Speed test inside the VM is only about 14 Mbit/s:
root@baeaca45b27e2576:~# speedtest
Retrieving speedtest.net configuration...
Testing from OVH SAS (51.210.209.124)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by Eurafibre (Lille) [5956.08 km]: 34.197 ms
Testing download speed................................................................................
Download: 14.26 Mbit/s
Testing upload speed......................................................................................................
Upload: 17.47 Mbit/s
Possibly related: there are 100k+ files in /var/lib/cni. But I'm seeing networking flakiness and slowness even when I clean out /var/lib/cni. Starting VMs does speed up again when /var/lib/cni is cleared out, though.
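(For concreteness, "clean out /var/lib/cni" here means something like the sketch below. This is an assumption about the exact commands, keyed to the host-local IPAM directory that shows up in the find output further down; only run it with no ignite VMs up, since it wipes the IPAM allocations.)
# remove leftover host-local IPAM state (one file per leased IP, plus bookkeeping files)
sudo find /var/lib/cni/networks/ignite-cni-bridge -maxdepth 1 -type f -delete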
That's really peculiar. When the VMs "speed up again", how is the performance?
With just a small number of VMs on my WSL2 ignite host, I'm seeing effectively native bandwidth on my gigabit uplink using the CNI bridge.
Maybe @bboreham or a CNI bridge-plugin maintainer would know how heavy usage could cause the host kernel to slow down with respect to bridge networking?
Came back to this because I saw the issue again. There are a great many files in /var/lib/cni again, iptables is using a lot of CPU as the system adds new VMs, and iptables --list is slow and returns a lot of results.
There are only 15 VMs running on this system. VMs are being terminated normally with ignite rm, yet their IP addresses are left over in the /var/lib/cni directory structure. Why isn't their networking being cleaned up?
This issue seems to be intermittently stopping simple things like git clone https://github.com/... from working inside the VMs!
root@ns1003380:/var/lib/cni# iptables --list|wc -l
30295
root@ns1003380:/var/lib/cni# find . |wc -l
26516
root@ns1003380:/var/lib/cni# find . |head -n 10
.
./networks
./networks/ignite-cni-bridge
./networks/ignite-cni-bridge/10.61.32.102
./networks/ignite-cni-bridge/10.61.1.253
./networks/ignite-cni-bridge/10.61.22.246
./networks/ignite-cni-bridge/10.61.13.29
./networks/ignite-cni-bridge/10.61.23.196
./networks/ignite-cni-bridge/10.61.26.100
./networks/ignite-cni-bridge/10.61.9.211
root@ns1003380:/var/lib/cni# iptables --list|head -n 100
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy DROP)
target prot opt source destination
CNI-FORWARD all -- anywhere anywhere /* CNI firewall plugin rules */
DOCKER-USER all -- anywhere anywhere
DOCKER-ISOLATION-STAGE-1 all -- anywhere anywhere
ACCEPT all -- anywhere anywhere ctstate RELATED,ESTABLISHED
DOCKER all -- anywhere anywhere
ACCEPT all -- anywhere anywhere
ACCEPT all -- anywhere anywhere
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
Chain CNI-ADMIN (1 references)
target prot opt source destination
Chain CNI-FORWARD (1 references)
target prot opt source destination
CNI-ADMIN all -- anywhere anywhere /* CNI firewall plugin rules */
ACCEPT all -- anywhere 10.61.225.110 ctstate RELATED,ESTABLISHED
ACCEPT all -- 10.61.225.110 anywhere
ACCEPT all -- anywhere 10.61.225.111 ctstate RELATED,ESTABLISHED
ACCEPT all -- 10.61.225.111 anywhere
ACCEPT all -- anywhere 10.61.225.112 ctstate RELATED,ESTABLISHED
ACCEPT all -- 10.61.225.112 anywhere
ACCEPT all -- anywhere 10.61.225.113 ctstate RELATED,ESTABLISHED
ACCEPT all -- 10.61.225.113 anywhere
ACCEPT all -- anywhere 10.61.225.114 ctstate RELATED,ESTABLISHED
ACCEPT all -- 10.61.225.114 anywhere
ACCEPT all -- anywhere 10.61.225.115 ctstate RELATED,ESTABLISHED
[lots more like this]
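(Aside: a quicker way to see which chain all of these rules are piling up in. This is a sketch, not a command from the session above; iptables -S also avoids the reverse-DNS lookups that can make a plain iptables --list this slow.)
# count rules per chain in the filter table
sudo iptables -S | awk '$1 == "-A" {print $2}' | sort | uniq -c | sort -rn | head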
root@ns1003380:/var/lib/cni# ignite version
Ignite version: version.Info{Major:"0", Minor:"8", GitVersion:"v0.8.0", GitCommit:"77f6859fa4f059f7338738e14cf66f5b9ec9b21c", GitTreeState:"clean", BuildDate:"2020-11-09T20:50:50Z", GoVersion:"go1.14.2", Compiler:"gc", Platform:"linux/amd64", SandboxImage:version.Image{Name:"weaveworks/ignite", Tag:"v0.8.0", Delimeter:":"}, KernelImage:version.Image{Name:"weaveworks/ignite-kernel", Tag:"4.19.125", Delimeter:":"}}
Firecracker version: v0.21.1
Inside VM:
root@7a899265f32c2013:~# speedtest
Retrieving speedtest.net configuration...
Testing from OVH SAS (51.81.244.112)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by Sherwood Broadband (Sherwood, OR) [22.55 km]: 7.635 ms
Testing download speed................................................................................
Download: 101.72 Mbit/s
Testing upload speed......................................................................................................
Upload: 60.04 Mbit/s
Outside VM:
root@ns1003380:/var/lib/cni# /usr/local/bin/speedtest
Retrieving speedtest.net configuration...
Testing from OVH SAS (51.81.244.112)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by Sherwood Broadband (Sherwood, OR) [22.55 km]: 2.823 ms
Testing download speed................................................................................
Download: 2531.19 Mbit/s
Testing upload speed......................................................................................................
Upload: 2350.14 Mbit/s
Any ideas @bboreham? Hi btw :-) :wave:
https://github.com/weaveworks/ignite/pull/442#issuecomment-533439114 indicates that iptables rules were once cleaned up, but I'm seeing them not being cleaned up on stop or rm:
root@ns1003380:~# ignite ps
VM ID IMAGE KERNEL SIZE CPUS MEMORY CREATED STATUS IPS PORTS NAME
8e4540b9a832a296 testfaster-image:b6d693c8c85646fd0b1e45583c4a2637e1e1fb2f-final quay.io/testfaster/ignite-kernel:latest 50.0 GB 4 16.0 GB 16s ago Up 16s 10.61.36.194 tfastpool-908423616798f6b97fac539e92ae239bf3ddf818b45128ed36d1d62a0dc97037-vm-c28f0ba2t2jv25vc3e90
ad091cdb9e5522b7 testfaster-image:b6d693c8c85646fd0b1e45583c4a2637e1e1fb2f-final quay.io/testfaster/ignite-kernel:latest 50.0 GB 4 16.0 GB 5s ago Up 5s 10.61.36.195 tfastpool-908423616798f6b97fac539e92ae239bf3ddf818b45128ed36d1d62a0dc97037-vm-c28f0bi2t2jv25vc3eb0
root@ns1003380:~# iptables --list |grep 10.61.36.194
Another app is currently holding the xtables lock. Perhaps you want to use the -w option?
root@ns1003380:~# iptables --list |grep 10.61.36.194
ACCEPT all -- anywhere 10.61.36.194 ctstate RELATED,ESTABLISHED
ACCEPT all -- 10.61.36.194 anywhere
root@ns1003380:~# iptables --list |grep 10.61.36.195
ACCEPT all -- anywhere 10.61.36.195 ctstate RELATED,ESTABLISHED
ACCEPT all -- 10.61.36.195 anywhere
root@ns1003380:~# iptables --list |grep "10.61.36.194\|10.61.36.195"
ACCEPT all -- anywhere 10.61.36.194 ctstate RELATED,ESTABLISHED
ACCEPT all -- 10.61.36.194 anywhere
ACCEPT all -- anywhere 10.61.36.195 ctstate RELATED,ESTABLISHED
ACCEPT all -- 10.61.36.195 anywhere
root@ns1003380:~# ignite stop 8e4540b9a832a296
INFO[0000] Removing the container with ID "ignite-8e4540b9a832a296" from the "cni" network
INFO[0012] Stopped VM with name "tfastpool-908423616798f6b97fac539e92ae239bf3ddf818b45128ed36d1d62a0dc97037-vm-c28f0ba2t2jv25vc3e90" and ID "8e4540b9a832a296"
root@ns1003380:~# iptables --list |grep "10.61.36.194\|10.61.36.195"
ACCEPT all -- anywhere 10.61.36.194 ctstate RELATED,ESTABLISHED
ACCEPT all -- 10.61.36.194 anywhere
ACCEPT all -- anywhere 10.61.36.195 ctstate RELATED,ESTABLISHED
ACCEPT all -- 10.61.36.195 anywhere
root@ns1003380:~# ignite rm -f ad091cdb9e5522b7
INFO[0000] Removing the container with ID "ignite-ad091cdb9e5522b7" from the "cni" network
INFO[0002] Removed VM with name "tfastpool-908423616798f6b97fac539e92ae239bf3ddf818b45128ed36d1d62a0dc97037-vm-c28f0bi2t2jv25vc3eb0" and ID "ad091cdb9e5522b7"
root@ns1003380:~# iptables --list |grep "10.61.36.194\|10.61.36.195"
ACCEPT all -- anywhere 10.61.36.194 ctstate RELATED,ESTABLISHED
ACCEPT all -- 10.61.36.194 anywhere
ACCEPT all -- anywhere 10.61.36.195 ctstate RELATED,ESTABLISHED
ACCEPT all -- 10.61.36.195 anywhere
root@ns1003380:~# iptables-save |grep "10.61.36.194\|10.61.36.195"
-A CNI-FORWARD -d 10.61.36.194/32 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A CNI-FORWARD -s 10.61.36.194/32 -j ACCEPT
-A CNI-FORWARD -d 10.61.36.195/32 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A CNI-FORWARD -s 10.61.36.195/32 -j ACCEPT
-A POSTROUTING -s 10.61.36.194/32 -m comment --comment "name: \"ignite-cni-bridge\" id: \"c6d0f77a0e01c76e8f194590ed1e435bec584d494b2c4f8cffb2e724d786537e\"" -j CNI-2cb101d210755961201c5e71
-A POSTROUTING -s 10.61.36.195/32 -m comment --comment "name: \"ignite-cni-bridge\" id: \"1a75bd1dccb3a3b8e2a81c82ebab99fa7936819ee50b56164e11fc30d04d267f\"" -j CNI-1cfc4e25d76275bf9b32e5b5
root@ns1003380:~# /opt/cni/bin/bridge
CNI bridge plugin v0.8.5
Same behaviour with newer CNI plugins as well:
root@ns1003380:~# iptables-save |grep "10.61.36.231"
-A CNI-FORWARD -d 10.61.36.231/32 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A CNI-FORWARD -s 10.61.36.231/32 -j ACCEPT
-A POSTROUTING -s 10.61.36.231/32 -m comment --comment "name: \"ignite-cni-bridge\" id: \"72b2e7aaca38c4347f98bcaf9b1afbceec54d43d9c33dc21dc60030a000b132e\"" -j CNI-9933bc049210f7e454e72191
root@ns1003380:~# sudo ignite rm -f tfastpool-c7a784be464ec4544aa5501862310cca977ca1171769d535f3f364ed1fc99ead-vm-c28f19q2t2jplfecqnjg
INFO[0000] Removing the container with ID "ignite-0b65a0382f2dc7be" from the "cni" network
INFO[0001] Removed VM with name "tfastpool-c7a784be464ec4544aa5501862310cca977ca1171769d535f3f364ed1fc99ead-vm-c28f19q2t2jplfecqnjg" and ID "0b65a0382f2dc7be"
root@ns1003380:~# iptables-save |grep "10.61.36.231"
-A CNI-FORWARD -d 10.61.36.231/32 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A CNI-FORWARD -s 10.61.36.231/32 -j ACCEPT
-A POSTROUTING -s 10.61.36.231/32 -m comment --comment "name: \"ignite-cni-bridge\" id: \"72b2e7aaca38c4347f98bcaf9b1afbceec54d43d9c33dc21dc60030a000b132e\"" -j CNI-9933bc049210f7e454e72191
root@ns1003380:~# /opt/cni/bin/bridge
CNI bridge plugin v0.9.1
To see what was going on, I moved /opt/cni/bin/bridge to /opt/cni/bin/bridge.real and dropped this debug script in as /opt/cni/bin/bridge:
ubuntu@ns1003380:/opt/cni/bin$ cat bridge
#!/bin/bash
# log each invocation (args, CNI_* env, stdin config, exit code, response),
# then pass everything through to the real plugin
myvar=`cat`
(echo "Run with $@:"
env |grep CNI
echo "$myvar"
) >> /tmp/log
ret=$(echo "$myvar" | /opt/cni/bin/bridge.real "$@" 2>&1)
exitcode=$?
(echo "exit $exitcode"
echo "response: $ret"
echo
) >> /tmp/log
echo "$ret"
exit $exitcode
I am seeing both ADD and DEL commands:
Run with :
CNI_CONTAINERID=325a0d5d03717212114df55719b4518961051c4faf05001ccd54c9ce1e2d7dfd
CNI_IFNAME=eth0
CNI_NETNS=/proc/601842/ns/net
CNI_COMMAND=ADD
CNI_PATH=/opt/cni/bin
CNI_ARGS=
{"bridge":"ignite0","cniVersion":"0.4.0","ipMasq":true,"ipam":{"subnet":"10.61.0.0/16","type":"host-local"},"isDefaultGateway":true,"isGateway":true,"name":"ignite-cni-bridge","promiscMode":true,"type":"bridge"}
exit 0
response: {
"cniVersion": "0.4.0",
"interfaces": [
{
"name": "ignite0",
"mac": "ca:14:6d:b0:5d:1c"
},
{
"name": "veth75a84de0",
"mac": "e2:cb:04:21:a5:f4"
},
{
"name": "eth0",
"mac": "0e:ad:e8:bb:2e:77",
"sandbox": "/proc/601842/ns/net"
}
],
"ips": [
{
"version": "4",
"interface": 2,
"address": "10.61.1.22/16",
"gateway": "10.61.0.1"
}
],
"routes": [
{
"dst": "0.0.0.0/0",
"gw": "10.61.0.1"
}
],
"dns": {}
}
Run with :
CNI_CONTAINERID=ignite-22dd96eb9003a4b1
CNI_IFNAME=eth0
CNI_NETNS=/proc/595238/ns/net
CNI_COMMAND=DEL
CNI_PATH=/opt/cni/bin
CNI_ARGS=
{"bridge":"ignite0","cniVersion":"0.4.0","ipMasq":true,"ipam":{"subnet":"10.61.0.0/16","type":"host-local"},"isDefaultGateway":true,"isGateway":true,"name":"ignite-cni-bridge","promiscMode":true,"type":"bridge"}
exit 0
response:
This seems to be operating correctly, so my assumption is now that ignite itself is doing something with iptables rules that it's failing to clean up. I'm not sure, though. I'm also not sure why the CNI bridge plugin doesn't release the IP addresses; the very large number of files left in /var/lib/cni still strikes me as suspicious.
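(One way to check why host-local isn't releasing them, as far as I understand its on-disk format: each file under the IPAM directory is named after a leased IP and contains the CNI_CONTAINERID, and on newer plugin versions also the interface name, that reserved it, so a DEL invoked with a different container ID has nothing to release. The address below is just one of the leftovers from the output above.)
# file name = leased IP; file contents = who leased it
cat /var/lib/cni/networks/ignite-cni-bridge/10.61.36.194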
Possibly related: I am running ignite run --runtime docker, i.e. using the legacy Docker runtime (so that I can use Docker images that are built locally by docker).
I guess we are actually using CNI's firewall plugin to create the iptables rules that aren't being cleared up, and the host-local plugin for IPAM?
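(For context, a plugin chain that would produce the per-plugin configs logged above and below looks roughly like this conflist. It is reconstructed from those configs, not copied from ignite's source, and ignite's real default chain may contain additional plugins such as portmap.)
{
  "cniVersion": "0.4.0",
  "name": "ignite-cni-bridge",
  "plugins": [
    {
      "type": "bridge",
      "bridge": "ignite0",
      "isGateway": true,
      "isDefaultGateway": true,
      "promiscMode": true,
      "ipMasq": true,
      "ipam": { "type": "host-local", "subnet": "10.61.0.0/16" }
    },
    { "type": "firewall" }
  ]
}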
Adding instrumentation to firewall and host-local as well: they all think they are succeeding, so why are we leaking IPs and iptables rules?
firewall:
CNI_CONTAINERID=ignite-4b79b9c398095e46
CNI_IFNAME=eth0
CNI_NETNS=/proc/743882/ns/net
CNI_COMMAND=DEL
CNI_PATH=/opt/cni/bin
CNI_ARGS=
{"cniVersion":"0.4.0","name":"ignite-cni-bridge","type":"firewall"}
exit 0
response:
bridge:
CNI_CONTAINERID=ignite-4b79b9c398095e46
CNI_IFNAME=eth0
CNI_NETNS=/proc/743882/ns/net
CNI_COMMAND=DEL
CNI_PATH=/opt/cni/bin
CNI_ARGS=
{"bridge":"ignite0","cniVersion":"0.4.0","ipMasq":true,"ipam":{"subnet":"10.61.0.0/16","type":"host-local"},"isDefaultGateway":true,"isGateway":true,"name":"ignite-cni-bridge","promiscMode":true,"type":"bridge"}
exit 0
response:
host-local:
CNI_CONTAINERID=ignite-abf40d722d788f33
CNI_IFNAME=eth0
CNI_NETNS=/proc/742360/ns/net
CNI_COMMAND=DEL
CNI_PATH=/opt/cni/bin
CNI_ARGS=
{"bridge":"ignite0","cniVersion":"0.4.0","ipMasq":true,"ipam":{"subnet":"10.61.0.0/16","type":"host-local"},"isDefaultGateway":true,"isGateway":true,"name":"ignite-cni-bridge","promiscMode":true,"type":"bridge"}
exit 0
response:
For reference:
ubuntu@ns1003380:/opt/cni/bin$ cat bridge
#!/bin/bash
# generic version of the wrapper: firewall and host-local are symlinked to this
# script, so it dispatches to the matching .real binary based on its own name
myvar=`cat`
me=`basename "$0"`
(echo "$me:"
env |grep CNI
echo "$myvar"
) >> /tmp/log
ret=$(echo "$myvar" | /opt/cni/bin/$me.real "$@" 2>&1)
exitcode=$?
(echo "exit $exitcode"
echo "response: $ret"
echo
) >> /tmp/log
echo "$ret"
exit $exitcode
ubuntu@ns1003380:/opt/cni/bin$ ls -alh
total 71M
drwxrwxr-x 2 root root 4.0K May 4 08:14 .
drwxr-xr-x 3 root root 4.0K Dec 9 08:00 ..
-rwxr-xr-x 1 root root 4.0M Feb 5 15:42 bandwidth
-rwxr-xr-x 1 ubuntu root 258 May 4 08:13 bridge
-rwxr-xr-x 1 root root 4.4M May 4 07:36 bridge.real
-rwxr-xr-x 1 root root 9.8M Feb 5 15:42 dhcp
lrwxrwxrwx 1 root root 6 May 4 08:14 firewall -> bridge
-rwxr-xr-x 1 root root 4.6M May 4 08:14 firewall.real
-rwxr-xr-x 1 root root 3.3M Feb 5 15:42 flannel
-rwxr-xr-x 1 root root 4.0M Feb 5 15:42 host-device
lrwxrwxrwx 1 root root 6 May 4 08:14 host-local -> bridge
-rwxr-xr-x 1 root root 3.5M May 4 08:13 host-local.real
-rwxr-xr-x 1 root root 4.1M Feb 5 15:42 ipvlan
-rwxr-xr-x 1 root root 3.4M Feb 5 15:42 loopback
-rwxr-xr-x 1 root root 4.2M Feb 5 15:42 macvlan
-rwxr-xr-x 1 root root 3.8M Feb 5 15:42 portmap
-rwxr-xr-x 1 root root 4.3M Feb 5 15:42 ptp
-rwxr-xr-x 1 root root 3.6M Feb 5 15:42 sbr
-rwxr-xr-x 1 root root 3.1M Feb 5 15:42 static
-rwxr-xr-x 1 root root 3.5M Feb 5 15:42 tuning
-rwxr-xr-x 1 root root 4.1M Feb 5 15:42 vlan
-rwxr-xr-x 1 root root 3.6M Feb 5 15:42 vrf
I've worked around this for now by writing my own code which interacts with iptables and /var/lib/cni to do the cleanup that ignite + docker + CNI fails to do.
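(For anyone hitting the same thing, here is a minimal sketch of that kind of cleanup. It is not the actual workaround code: it assumes this host's 10.61.0.0/16 subnet and the rule shapes from the iptables-save output above, and it does not touch the per-container CNI-* POSTROUTING chains, which also leak.)
#!/bin/bash
# release host-local IPAM files and CNI-FORWARD rules for IPs that no
# longer belong to a running ignite VM
set -euo pipefail
dir=/var/lib/cni/networks/ignite-cni-bridge
running_ips=$(sudo ignite ps | grep -oE '10\.61\.[0-9]+\.[0-9]+' || true)
for f in "$dir"/*; do
  ip=$(basename "$f")
  # skip host-local bookkeeping files (lock, last_reserved_ip.*)
  [[ "$ip" =~ ^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$ ]] || continue
  if ! grep -qxF "$ip" <<<"$running_ips"; then
    # rule specs mirror the iptables-save output above; -w waits for the xtables lock
    sudo iptables -w -D CNI-FORWARD -d "$ip/32" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT || true
    sudo iptables -w -D CNI-FORWARD -s "$ip/32" -j ACCEPT || true
    sudo rm -f "$f"
  fi
done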
I'm not sure if ignite forces the docker runtime to use CNI (it's not trivial), but wouldn't it make sense to use --runtime docker together with --network-plugin docker-bridge? In that case I don't see any stale entries in my iptables.
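(For reference, the suggested combination would look something like this; the image and VM names are placeholders, not commands from this thread.)
sudo ignite run my-local-image:latest --name test-vm --runtime docker --network-plugin docker-bridge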
--runtime docker together with --network-plugin docker-bridge worked when I ran it manually, but mysteriously failed (I couldn't ping the VMs) when run from the code that wraps ignite in our project.
Not sure why you'd have a problem doing this via the API; it should work the same way.
But as for the IPT leaking with the docker runtime, I think I've found the issue: the IPT rules are set up by the CNI plugin using the proper Docker container ID as the "id" in the IPT rule comments; however, when they are removed, the call to RemoveContainerNetwork is made with vm.PrefixedID(), which is the Docker container name, not its ID. So you can try patching pkg/operations/remove.go with the change below to see if it helps:
- if err = removeNetworking(vm.PrefixedID(), vm.Spec.Network.Ports...); err != nil {
+ if err = removeNetworking(vm.Status.Runtime.ID, vm.Spec.Network.Ports...); err != nil {
Thanks @networkop, good spot. Any chance we could get this fix into a release, please?
I guess this made it into https://github.com/weaveworks/ignite/releases/tag/v0.10.0?
@lukemarsden yes, it did https://github.com/weaveworks/ignite/commit/2f840ad44c39dcb31c53ce3865d28d15b162c90c .