lima
lima copied to clipboard
Guest agent sometimes becomes unavailable from the host
Description
I've found this abiosoft/colima/issues/994 and it looks exactly like mine.
After sometime the created VM started to loose connection to guestagent due to missing ga.sock.
journalctl -e -u lima-guestagent.service -f
-- Boot 3c14c00ca1b045588ca9d3a840437668 --
Mar 02 15:34:31 lima-cri-tools systemd[1]: Started lima-guestagent.
Mar 02 15:34:33 lima-cri-tools lima-guestagent[1418]: time="2024-03-02T15:34:33+03:00" level=info msg="event tick: 3s"
Mar 02 15:34:33 lima-cri-tools lima-guestagent[1418]: time="2024-03-02T15:34:33+03:00" level=info msg="serving the guest agent on \"/run/lima-guestagent.sock\""
Mar 02 15:34:36 lima-cri-tools lima-guestagent[1418]: time="2024-03-02T15:34:36+03:00" level=debug msg="fixSystemTimeSkew: rtc=2024-03-02T12:34:36Z systime=2024-03-02T15:34:36+03:00 delta=-150.01032ms"
Mar 02 15:34:46 lima-cri-tools lima-guestagent[1418]: time="2024-03-02T15:34:46+03:00" level=debug msg="fixSystemTimeSkew: rtc=2024-03-02T12:34:46Z systime=2024-03-02T15:34:46+03:00 delta=-152.18532ms"
Mar 02 15:34:56 lima-cri-tools lima-guestagent[1418]: time="2024-03-02T15:34:56+03:00" level=debug msg="fixSystemTimeSkew: rtc=2024-03-02T12:34:56Z systime=2024-03-02T15:34:56+03:00 delta=-155.54332ms"
Mar 02 15:35:06 lima-cri-tools lima-guestagent[1418]: time="2024-03-02T15:35:06+03:00" level=debug msg="fixSystemTimeSkew: rtc=2024-03-02T12:35:06Z systime=2024-03-02T15:35:06+03:00 delta=-151.786742ms"
Mar 02 15:35:16 lima-cri-tools lima-guestagent[1418]: time="2024-03-02T15:35:16+03:00" level=debug msg="fixSystemTimeSkew: rtc=2024-03-02T12:35:16Z systime=2024-03-02T15:35:16+03:00 delta=-146.093647ms"
Mar 02 15:35:21 lima-cri-tools lima-guestagent[1418]: time="2024-03-02T15:35:21+03:00" level=debug msg="setWorthCheckingIPTablesRoutine(): setting to true"
Mar 02 15:35:21 lima-cri-tools lima-guestagent[1418]: time="2024-03-02T15:35:21+03:00" level=debug msg="setWorthCheckingIPTablesRoutine(): setting to true"
Mar 02 15:35:21 lima-cri-tools lima-guestagent[1418]: time="2024-03-02T15:35:21+03:00" level=debug msg="setWorthCheckingIPTablesRoutine(): setting to true"
Mar 02 15:35:21 lima-cri-tools lima-guestagent[1418]: time="2024-03-02T15:35:21+03:00" level=debug msg="setWorthCheckingIPTablesRoutine(): setting to true"
Mar 02 15:35:21 lima-cri-tools lima-guestagent[1418]: time="2024-03-02T15:35:21+03:00" level=debug msg="setWorthCheckingIPTablesRoutine(): setting to true"
Mar 02 15:35:26 lima-cri-tools lima-guestagent[1418]: time="2024-03-02T15:35:26+03:00" level=debug msg="fixSystemTimeSkew: rtc=2024-03-02T12:35:26Z systime=2024-03-02T15:35:26+03:00 delta=-149.117589ms"
Mar 02 15:35:27 lima-cri-tools systemd[1]: Stopping lima-guestagent...
Mar 02 15:35:27 lima-cri-tools systemd[1]: lima-guestagent.service: Deactivated successfully.
Mar 02 15:35:27 lima-cri-tools systemd[1]: Stopped lima-guestagent.
Mar 02 15:35:27 lima-cri-tools systemd[1]: Started lima-guestagent.
Mar 02 15:35:28 lima-cri-tools lima-guestagent[2425]: time="2024-03-02T15:35:28+03:00" level=info msg="event tick: 3s"
Mar 02 15:35:28 lima-cri-tools lima-guestagent[2425]: time="2024-03-02T15:35:28+03:00" level=info msg="serving the guest agent on \"/run/lima-guestagent.sock\""
Mar 02 15:35:34 lima-cri-tools lima-guestagent[2425]: time="2024-03-02T15:35:34+03:00" level=debug msg="LocalPorts(): worthCheckingIPTables=true"
Mar 02 15:35:34 lima-cri-tools lima-guestagent[2425]: time="2024-03-02T15:35:34+03:00" level=debug msg="Get info %+v{[{0.0.0.0 22} {127.0.0.1 35967} {127.0.0.53 53} {:: 22}]}"
The logs above shows that the lima-guestagent.service has been restarted (🤔 not sure the reason). This leads to the missing connection from the host:
WARN[0110] [hostagent] connection to the guest agent was closed unexpectedly
Missing connection leads to ga.sock deleting.
I tried to comment ga.sock deleting, and it appears to be a fix.
diff
diff --git a/pkg/hostagent/hostagent.go b/pkg/hostagent/hostagent.go
index 92c5073..ee4c9a3 100644
--- a/pkg/hostagent/hostagent.go
+++ b/pkg/hostagent/hostagent.go
@@ -605,7 +605,7 @@ func (a *HostAgent) watchGuestAgentEvents(ctx context.Context) {
}
}
} else {
- logrus.WithError(err).Warn("connection to the guest agent was closed unexpectedly", err)
+ logrus.WithError(err).Warn("connection to the guest agent was closed unexpectedly\n", err)
}
select {
case <-ctx.Done():
@@ -728,13 +728,14 @@ func forwardSSH(ctx context.Context, sshConfig *ssh.SSHConfig, port int, local,
}
} else {
logrus.Infof("Forwarding %q (guest) to %q (host)", remote, local)
- if err := os.RemoveAll(local); err != nil {
- logrus.WithError(err).Warnf("Failed to clean up %q (host) before setting up forwarding", local)
- }
- }
- if err := os.MkdirAll(filepath.Dir(local), 0o750); err != nil {
- return fmt.Errorf("can't create directory for local socket %q: %w", local, err)
+ logrus.Infof("Delete files %q (local)", local)
+ // if err := os.RemoveAll(local); err != nil {
+ // logrus.WithError(err).Warnf("Failed to clean up %q (host) before setting up forwarding", local)
+ // }
}
hostagent reconnect and everything is working again.
limactl start crio-tools
.....
INFO[0106] [hostagent] Waiting for the guest agent to be running
INFO[0106] [hostagent] Forwarding "/run/lima-guestagent.sock" (guest) to "/Users/user/.lima/cri-tools/ga.sock" (host)
INFO[0106] [hostagent] Delete files "/Users/user/.lima/cri-tools/ga.sock" (local)
WARN[0106] [hostagent] Failed to set up forward from "/run/lima-guestagent.sock" (guest) to "/Users/user/.lima/cri-tools/ga.sock" (host)
WARN[0106] [hostagent] connection to the guest agent was closed unexpectedly
dial unix /Users/user/.lima/cri-tools/ga.sock: connect: no such file or directory
INFO[0116] [hostagent] Forwarding "/run/lima-guestagent.sock" (guest) to "/Users/user/.lima/cri-tools/ga.sock" (host)
INFO[0116] [hostagent] Delete files "/Users/user/.lima/cri-tools/ga.sock" (local)
INFO[0116] [hostagent] Guest agent is running
INFO[0116] [hostagent] Waiting for the final requirement 1 of 1: "boot scripts must have finished"
INFO[0116] [hostagent] Not forwarding TCP [::]:22
INFO[0116] [hostagent] Not forwarding TCP 127.0.0.53:53
INFO[0116] [hostagent] Forwarding TCP from 127.0.0.1:44569 to 127.0.0.1:44569
INFO[0116] [hostagent] Not forwarding TCP 0.0.0.0:22
INFO[0116] [hostagent] The final requirement 1 of 1 is satisfied
INFO[0116] READY. Run `limactl shell cri-tools` to open the shell.
I see two options here (the second one is better, I guess, but I can't find the reason):
- In case of missing connection to the
guestagenttrying to reconnect and don't deletega.sock - Find a reason why
lima-guestagent.serviceis failing.
lima.yml
vmType: null
os: null
arch: x86_64
images:
- location: "https://cloud-images.ubuntu.com/releases/22.04/release-20240125/ubuntu-22.04-server-cloudimg-amd64.img"
arch: "x86_64"
digest: "sha256:f23e5b91af7f52bdd4680d8e9d091093bfc6c4a46b0da7baa3cb450ee917b66b"
cpus: 6
memory: 8GiB
disk: null
mounts:
- location: "~/Documents/dev/opensource/cri-tools"
mountPoint: null
writable: true
sshfs:
cache: null
followSymlinks: null
sftpDriver: null
9p:
securityModel: null
protocolVersion: null
msize: null
cache: "mmap"
- location: "/tmp/lima"
writable: true
mountType: 9p
additionalDisks:
ssh:
localPort: 0
loadDotSSHPubKeys: null
forwardAgent: null
forwardX11: null
forwardX11Trusted: null
caCerts:
removeDefaults: null
files:
certs:
upgradePackages: true
containerd:
system: false
user: false
cpuType:
aarch64: null
armv7l: null
x86_64: null
rosetta:
enabled: null
binfmt: null
timezone: null
firmware:
legacyBIOS: null
audio:
device: null
video:
display: null
vnc:
display: null
networks:
propagateProxyEnv: null
hostResolver:
enabled: null
ipv6: null
hosts:
guestInstallPrefix: null
plain: null
> limactl --version
limactl version 0.20.1
> qemu-system-x86_64 --version
QEMU emulator version 8.2.1
Copyright (c) 2003-2023 Fabrice Bellard and the QEMU Project developers
Is this issue specific to colima?
Do you see the issue with a plain Lima instance?
Is this issue specific to colima?
Do you see the issue with a plain Lima instance?
This behaviour is specific to lima.
Another related issue https://github.com/abiosoft/colima/issues/999#issuecomment-1987873799
ref https://github.com/lima-vm/lima/issues/2064