autoscaling
Debugging tools more easily available in compute pods and VMs
Problem description / Motivation
Debugging tools are missing in compute VMs and cannot easily be installed when the VM is at its memory limit. And because the pod uses Alpine and the VM uses Debian, we can't simply copy tools and libraries from the pod into the VM.
Feature idea(s) / DoD
During INC-415 it would have helped a lot to have network debugging tools readily available in at least the pod and possibly in the VM.
Implementation ideas
I understand we don't want to expose too many things in the VM by default, so it would be good if one could easily install debug tooling ad hoc. This could be a tarball in the pod that is unpacked inside the VM when a script in the pod is run.
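A minimal sketch of the VM-side half of that idea, assuming the tarball was built from Debian (glibc) packages matching the VM's release, laid out as they would be under / (usr/bin, usr/lib, ...), and already copied into the VM by some means such as ssh from the pod. The function name install_debug_tools, the default prefix /opt/debug-tools and the x86_64-linux-gnu library directories are hypothetical. Unpacking with tar into a private prefix avoids running apt/dpkg in the VM and installs nothing system-wide:

```shell
#!/bin/sh
# Hypothetical helper: unpack a pre-built debug-tools tarball into a private
# prefix instead of installing packages with apt. Assumes the tarball holds
# Debian-built binaries laid out as under / (usr/bin, usr/lib, ...).
install_debug_tools() {
    tarball=$1
    prefix=${2:-/opt/debug-tools}   # assumed location, not an existing convention

    [ -r "$tarball" ] || { echo "cannot read $tarball" >&2; return 1; }
    mkdir -p "$prefix" || return 1
    tar -xzf "$tarball" -C "$prefix" || return 1

    # Print shell code that puts the unpacked tools and libraries first in the
    # search paths; the caller applies it with eval. Library dirs assume amd64.
    printf 'export PATH=%s/usr/sbin:%s/usr/bin:%s/sbin:%s/bin:$PATH\n' \
        "$prefix" "$prefix" "$prefix" "$prefix"
    printf 'export LD_LIBRARY_PATH=%s/usr/lib/x86_64-linux-gnu:%s/lib/x86_64-linux-gnu${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}\n' \
        "$prefix" "$prefix"
}
```

Usage inside the VM would then be something like eval "$(install_debug_tools /tmp/debug-tools.tar.gz)" followed by tcpdump -i any -n. Since only tar and gzip run, this needs far less memory than an apt transaction.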
Ideally, we would have a pre-built image used with kubectl debug containing everything we need.
From @kelvich: fixing #1304 might provide another route for this issue
This was helpful:
apt update
apt install -y tcpdump screen iproute2 dnsutils iputils-ping lsof strace
Right, I should have mentioned that sometimes it's not possible to install tools when the VM is too loaded:
$ apt install iproute2 tcpdump
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
dbus libapparmor1 libatm1 libbpf0 libbsd0 libcap2 libcap2-bin libdbus-1-3 libmd0 libmnl0 libpam-cap libpcap0.8
libxtables12
Suggested packages:
default-dbus-session-bus | dbus-session-bus iproute2-doc apparmor
The following NEW packages will be installed:
dbus iproute2 libapparmor1 libatm1 libbpf0 libbsd0 libcap2 libcap2-bin libdbus-1-3 libmd0 libmnl0 libpam-cap libpcap0.8
libxtables12 tcpdump
0 upgraded, 15 newly installed, 0 to remove and 10 not upgraded.
Need to get 0 B/2556 kB of archives.
After this operation, 7033 kB of additional disk space will be used.
Do you want to continue? [Y/n]
FATAL -> Failed to fork.
By the way, do we know what exactly causes "Failed to fork"? I am looking at the VM and it doesn't look "too loaded":
root@compute-billowing-waterfall-w2v36sgm-nhzfg:~# uptime
10:41:58 up 2:44, 0 users, load average: 0.48, 0.57, 0.54
root@compute-billowing-waterfall-w2v36sgm-nhzfg:~# free -h
               total        used        free      shared  buff/cache   available
Mem:           914Mi       467Mi       154Mi        32Mi       292Mi       400Mi
Swap:          1.0Gi          0B       1.0Gi
root@compute-billowing-waterfall-w2v36sgm-nhzfg:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        20G  1.4G   18G   8% /
devtmpfs        456M     0  456M   0% /dev
shm-tmpfs        40G  1.1M   40G   1% /dev/shm
/dev/vdb         50K   50K     0 100% /neonvm/runtime
/dev/vdc         40K   40K     0 100% /mnt/ssh
/dev/vde         35G  6.7M   33G   1% /neonvm/cache
/dev/vdf        196G   19M  186G   1% /var/db/postgres/compute
root@compute-billowing-waterfall-w2v36sgm-nhzfg:~# apt install dnsutils
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
bind9-dnsutils bind9-host bind9-libs libbsd0 libedit2 libfstrm0 liblmdb0 libmaxminddb0 libmd0 libuv1
Suggested packages:
mmdb-bin
FATAL -> Failed to fork.
Sorry, by "too loaded" I meant high memory usage. I didn't check exactly how high it was at the time. I don't have prod access right now; maybe check the cgroup limits too.
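Following up on the cgroup suggestion, a sketch for dumping the memory limits and counters of the cgroup a shell runs in, assuming the VM uses cgroup v2 mounted at /sys/fs/cgroup. The function name cgroup_mem_report is hypothetical; memory.max, memory.current, memory.swap.max, memory.swap.current and memory.events are the standard cgroup v2 interface files. Which cgroup the compute processes actually live in is not known here, so a directory can be passed explicitly:

```shell
#!/bin/sh
# Hypothetical helper: print cgroup v2 memory limits and counters.
# Defaults to the cgroup of the calling shell (the "0::<path>" line in
# /proc/self/cgroup); pass a directory to inspect a different cgroup.
cgroup_mem_report() {
    cg=${1:-/sys/fs/cgroup$(sed -n 's/^0:://p' /proc/self/cgroup)}

    for f in memory.max memory.current memory.swap.max memory.swap.current; do
        if [ -r "$cg/$f" ]; then
            printf '%s=%s\n' "$f" "$(cat "$cg/$f")"
        fi
    done

    # memory.events: "max" counts how often usage was about to exceed
    # memory.max, "oom" how often an allocation was about to fail at the
    # limit, and "oom_kill" how many processes the OOM killer killed.
    if [ -r "$cg/memory.events" ]; then
        grep -E '^(max|oom|oom_kill) ' "$cg/memory.events"
    fi
    return 0
}
```

A non-zero oom or oom_kill count would point at the cgroup limit; if those stay at zero, the kernel's overcommit accounting is the next thing to look at.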
[65164.881136] __vm_enough_memory: pid: 32509, comm: apt-get, not enough memory for the allocation
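The kernel message comes from __vm_enough_memory, the kernel's overcommit accounting check. This suggests the allocation for apt-get was refused for lack of commit space rather than because of CPU load or disk space. With vm.overcommit_memory set to 2 (strict accounting), new commitments are refused once Committed_AS would exceed CommitLimit; in the other modes CommitLimit is only reported, not enforced. Whether this VM runs in mode 2 is an assumption to verify. The sketch below (function name commit_report is hypothetical) prints the relevant values:

```shell
#!/bin/sh
# Hypothetical helper: report the overcommit mode and commit headroom.
# Arguments default to the live /proc files; pass copies to inspect a snapshot.
commit_report() {
    meminfo=${1:-/proc/meminfo}
    overcommit=${2:-/proc/sys/vm/overcommit_memory}

    mode=$(cat "$overcommit") || return 1
    # CommitLimit and Committed_AS are reported in kB in /proc/meminfo.
    # CommitLimit is only enforced when mode is 2 (strict accounting).
    awk -v mode="$mode" '
        $1 == "CommitLimit:"  { limit = $2 }
        $1 == "Committed_AS:" { committed = $2 }
        END {
            printf "overcommit_memory=%s CommitLimit=%d kB Committed_AS=%d kB headroom=%d kB\n",
                mode, limit, committed, limit - committed
        }' "$meminfo"
}
```

In mode 2, CommitLimit is (RAM minus huge pages) × vm.overcommit_ratio / 100 + swap, unless vm.overcommit_kbytes is set. With the free -h numbers above and the default overcommit_ratio of 50, that would be roughly 914Mi × 0.5 + 1.0Gi ≈ 1.45Gi. This limit does not depend on the 400Mi shown as available, so fork can fail even when free looks healthy.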
This issue was moved to Jira: LKB-1095