talos-vmtoolsd
ARM64 for Vmware Fusion
Getting the IP of a Talos guest in VMware Fusion does not work with talos-vmtoolsd. The host is macOS on an M1 (ARM) Mac.
This is a tricky one, as the Go library we use to interface with ESXi does not seem to have code for arm/aarch64. Bit of background: the backdoor uses specific x86/x86_64 instructions (I/O ports) for communication, which work differently (or not at all) on non-x86.
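For readers unfamiliar with the backdoor, here is a minimal Go sketch of the two constants the x86 variant revolves around. The values are the ones published in the open-vm-tools headers (they are not quoted anywhere in this thread), and the names `backdoorMagic`, `backdoorPort` and `magicString` are made up for illustration. The actual port access needs an x86 `IN` instruction in assembly, which is exactly the part that has no direct equivalent on ARM64; this sketch does not attempt it.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Values as published in the open-vm-tools backdoor definitions.
// The identifier names are illustrative, not taken from any library.
const (
	backdoorMagic uint32 = 0x564D5868 // ASCII "VMXh", placed in EAX on x86
	backdoorPort  uint16 = 0x5658     // ASCII "VX", the I/O port placed in DX
)

// magicString renders a 32-bit magic number as the 4-byte ASCII tag it encodes.
func magicString(magic uint32) string {
	var b [4]byte
	binary.BigEndian.PutUint32(b[:], magic)
	return string(b[:])
}

func main() {
	// On x86 the guest executes an IN instruction on backdoorPort with
	// backdoorMagic in EAX; the hypervisor intercepts it. That step needs
	// assembly and is deliberately left out here.
	fmt.Printf("magic=%#x (%q) port=%#x\n",
		backdoorMagic, magicString(backdoorMagic), backdoorPort)
}
```

Because the whole mechanism hangs on an x86 port-I/O instruction, a library that only ships x86 assembly has nothing to fall back on for other architectures.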
To double check: if you run VMWare Fusion with a standard linux guest (debian, ubuntu, alpine) and open-vm-tools, do you see the IP in Fusion?
Yes, Ubuntu (with open-vm-tools) reports its IP.
So the good news is that open-vm-tools has an RPC/backdoor for ARM64. The bad news is that neither vmw-guestinfo (unfortunately also abandoned/archived) nor avo (the library used by vmw-guestinfo) supports ARM at the moment, so there is no usable library for RPC/backdoor on ARM, nor a place to file an issue about this.
If we were to support ARM, we'd probably have to fork/split/rewrite the backdoor assembly code into a new go library. It is doable, but requires quite a bit of effort, and a test env (none of the people involved in our company have VMWare on ARM).
Let's say we are willing to put in the effort, would you be available for testing?
If we do that, we should also update Talos VMWare platform for arm64 (same issue).
Yes I can test it.
We've scheduled the work internally, somebody is going to work on it in the coming few weeks.
Small update: I have extended the project with in-tree code that replaces vmw-guestinfo for i386 and x86_64, and it seems to work. I also have arm64 code derived from open-vm-tools that at least compiles. I have access to an arm64 Mac with VMware Fusion and will start testing/fixing very soon.
The plot thickens: although I've replaced the dependency on github.com/vmware/vmw-guestinfo with the new pkg/hypercall (which actually works on ARM64, yay), github.com/vmware/vmw-guestinfo still gets imported by both github.com/siderolabs/talos and github.com/vmware/govmomi.
I don't fully grasp the Go module compilation process, but my understanding is that the code as a whole of every module and its deps needs to compile, which is a problem, because github.com/vmware/vmw-guestinfo does not compile on ARM64. I was hoping only the code that "gets hit" needed to compile (as govc also depends on govmomi, which depends on vmw-guestinfo, and they do distribute arm64 binaries of that). I am probably missing something, and could use a hint.
EDIT: the dependency on govmomi's toolbox is the culprit.
We need to wait until govmomi/toolbox is on board with this.
So, the govmomi stuff is taking too long, so I copied the structs to another package in the repo. It compiles for linux/arm64, and I am going to test this in the coming days.
It seems to work at least partially: my VM shuts down gracefully through Talos API when I click "shutdown". I have not checked the IP reporting (need to enable REST in Fusion first), but suspect that works as well. Still need to take care of a few things:
- The build system produces arm64 OCI images with an amd64 binary in some cases. I am fighting a bit with my colima/buildx setup; maybe it's just a matter of make-ing it on a fully set up buildx environment, but maybe I need to dance with kres.yaml a bit more.
- the ext logs a lot of errors ("no data to receive") that are not errors at all. I'll fix that in the next few days.
- we need to figure out if this change warrants a major version bump. On the one hand, the new backdoor is functionally equivalent. On the other, I can imagine that obscure VMWare platforms (maybe people running 5.0) don't play ball with this new tool. @smira, what do you think?
- Talos does not know if the ext service is healthy. If this is easy to implement, I'll do it, but it does not look like a show stopper for now
$ curl -k -u 'XXX:YYY' https://localhost:8697/api/vms/<ID of Talos VM running vmtoolsd>/ip
{
"ip": "172.16.94.132"
}
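The same check can be scripted. Below is a minimal Go sketch of the curl call above, assuming the Fusion REST endpoint and the JSON shape shown there. The helper names `parseIP` and `fetchVMIP` and the command-line handling are hypothetical. Certificate verification is skipped to mirror `curl -k`, which is only reasonable against a local test instance.

```go
package main

import (
	"crypto/tls"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// parseIP extracts the "ip" field from a response body like {"ip": "..."}.
func parseIP(body []byte) (string, error) {
	var resp struct {
		IP string `json:"ip"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return "", err
	}
	return resp.IP, nil
}

// fetchVMIP performs GET <base>/api/vms/<id>/ip with HTTP basic auth.
func fetchVMIP(c *http.Client, base, id, user, pass string) (string, error) {
	req, err := http.NewRequest(http.MethodGet, base+"/api/vms/"+id+"/ip", nil)
	if err != nil {
		return "", err
	}
	req.SetBasicAuth(user, pass)
	res, err := c.Do(req)
	if err != nil {
		return "", err
	}
	defer res.Body.Close()
	body, err := io.ReadAll(res.Body)
	if err != nil {
		return "", err
	}
	if res.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s: %s", res.Status, body)
	}
	return parseIP(body)
}

func main() {
	if len(os.Args) != 4 {
		fmt.Println("usage: vmip <vm-id> <user> <password>")
		return
	}
	// Equivalent of curl -k: do not verify the self-signed certificate.
	client := &http.Client{
		Timeout: 10 * time.Second,
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
		},
	}
	ip, err := fetchVMIP(client, "https://localhost:8697", os.Args[1], os.Args[2], os.Args[3])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		return
	}
	fmt.Println(ip)
}
```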
You should all be able to test the pre-release: ghcr.io/siderolabs/talos-vmtoolsd:v1.0.0-pr32.0. We'll integrate this test version into our test setup, and if it seems OK, we'll probably release it (together with fixes from other issues) as v1.1.0 soonish.
If we do that, we should also update Talos VMWare platform for arm64 (same issue).
So I took a look into that. For the most part it's straightforward: we just need to implement what's called rpcvmx based on nanotoolbox.RPCI (which seems easy).
But, Talos-vmtoolsd uses hierarchical loggers using log/slog, and Talos itself is based on the imperative log.Printf. We could initialize the root log/slog logger to produce log.Printf like messages, but I feel a bit yuck about that. The nicest way forward would be to put pkg/hypercall (and maybe pkg/nanotoolbox?) into its own module, and make it agnostic about the logging framework. As this is quite a bit of work, I think I'd like to postpone this until people are actually going to use OVF/guestinfo provisioning on ARM.
But, Talos-vmtoolsd uses hierarchical loggers using log/slog, and Talos itself is based on the imperative log.Printf. We could initialize the root log/slog logger to produce log.Printf like messages, but I feel a bit yuck about that.
This should be straightforward: slog.New(slog.NewTextHandler(log.Writer(), nil)). Just use the configured writer so that messages correctly go to the kernel log buffer.
@kodal, can you check if the updated extension works for you?
Thank you
arch: arm64
platform: nocloud
secureboot: false
version: ${TALOS_VERSION}
customization:
  extraKernelArgs:
    - net.ifnames=0
input:
  baseInstaller:
    imageRef: ghcr.io/siderolabs/installer:${TALOS_VERSION}
  systemExtensions:
    - imageRef: ghcr.io/siderolabs/talos-vmtoolsd:v1.0.0-pr32.0
output:
  kind: iso
  outFormat: raw
TALOS_VERSION=1.10.3
It gets stuck, as shown in the screenshot.
With platform: vmware it produces an OVA, which I couldn't convert to VMDK on an M-series (ARM) Mac.
I think the screenshot is from way before the extension even runs. Can you try the exact same thing without the extension (e.g. same build, same boot) and see whether it boots?
BTW: it is a known issue that talos/udevd hangs for ~180s during boot on fusion/arm64/mac (with and without the extension).
After 15-20 minutes it booted! Got an IP.
Super. Thanks for reporting and testing!