ARM64 for Vmware Fusion

Open kodal opened this issue 8 months ago • 9 comments

Getting the IP of a Talos guest in VMware Fusion through talos-vmtoolsd does not work. The host is macOS on an M1 (ARM).

kodal avatar Mar 14 '25 02:03 kodal

This is a tricky one, as the Go library we use to interface with ESXi does not seem to have code for arm/aarch64. Bit of background: the backdoor uses specific x86/x86_64 instructions (I/O ports) for communication, which work differently (or not at all) on non-x86.
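To make the x86 dependency concrete, here is a minimal sketch of the register setup for one backdoor call. The magic value and port number are the `BDOOR_MAGIC` and `BDOOR_PORT` constants from open-vm-tools' `backdoor_def.h`; the `x86Frame` type, the `newFrame` helper and the command number passed in `main` are illustrative names and values, not part of any library mentioned in this thread. The actual `in` instruction needs assembly and is deliberately left out, which is exactly the part that has no direct ARM64 counterpart.

```go
package main

import "fmt"

const (
	backdoorMagic uint32 = 0x564D5868 // BDOOR_MAGIC, the ASCII bytes "VMXh"
	backdoorPort  uint32 = 0x5658     // BDOOR_PORT, the I/O port read by the IN instruction
)

// x86Frame mirrors the general-purpose registers that are loaded before
// executing the IN instruction on x86/x86_64 (illustrative type).
type x86Frame struct {
	EAX, EBX, ECX, EDX uint32
}

// newFrame prepares the registers for one low-bandwidth backdoor call:
// magic in EAX, argument in EBX, command in ECX, port in EDX.
func newFrame(cmd uint16, arg uint32) x86Frame {
	return x86Frame{EAX: backdoorMagic, EBX: arg, ECX: uint32(cmd), EDX: backdoorPort}
}

// magicASCII decodes the magic constant, most significant byte first.
func magicASCII() string {
	m := backdoorMagic
	return string([]byte{byte(m >> 24), byte(m >> 16), byte(m >> 8), byte(m)})
}

func main() {
	f := newFrame(10, 0) // 10 is only an illustrative command number
	fmt.Printf("magic=%q eax=%#x ecx=%#x edx=%#x\n", magicASCII(), f.EAX, f.ECX, f.EDX)
}
```

Because the hypervisor traps on a port I/O instruction, an architecture without port I/O needs a different trap mechanism and therefore different assembly.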

To double check: if you run VMWare Fusion with a standard linux guest (debian, ubuntu, alpine) and open-vm-tools, do you see the IP in Fusion?

jonkerj avatar Mar 17 '25 13:03 jonkerj

Yes, Fusion gets the IP from an Ubuntu guest (with open-vm-tools).

Image

kodal avatar Mar 17 '25 17:03 kodal

So the good news is that open-vm-tools has an RPC/backdoor for ARM64. The bad news is that neither vmw-guestinfo (unfortunately also abandoned/archived) nor avo (the library used by vmw-guestinfo) supports ARM at the moment, so there is no usable library for RPC/backdoor on ARM, nor a place to file an issue about this.

If we were to support ARM, we'd probably have to fork/split/rewrite the backdoor assembly code into a new go library. It is doable, but requires quite a bit of effort, and a test env (none of the people involved in our company have VMWare on ARM).

Let's say we are willing to put in the effort, would you be available for testing?

jonkerj avatar Mar 18 '25 08:03 jonkerj

If we do that, we should also update Talos VMWare platform for arm64 (same issue).

smira avatar Mar 18 '25 09:03 smira

Yes I can test it.

kodal avatar Mar 18 '25 16:03 kodal

We've scheduled the work internally, somebody is going to work on it in the coming few weeks.

jonkerj avatar Mar 20 '25 11:03 jonkerj

Small update: I have extended the project with in-tree code that replaces vmw-guestinfo for i386 and x86_64, and it seems to work. I also have arm64 code derived from open-vm-tools that at least compiles. I have access to an arm64 Mac with VMware Fusion and will start testing/fixing very soon.

jonkerj avatar Apr 14 '25 11:04 jonkerj

If we do that, we should also update Talos VMWare platform for arm64 (same issue).

The plot thickens: although I've replaced the dependency on github.com/vmware/vmw-guestinfo with the new pkg/hypercall (which actually works on ARM64, yay), github.com/vmware/vmw-guestinfo still gets imported by both github.com/siderolabs/talos and github.com/vmware/govmomi.

I don't fully grasp the Go module compilation process, but my understanding is that all the code of every module and its dependencies needs to compile, which is a problem because github.com/vmware/vmw-guestinfo does not compile on ARM64. I was hoping only the code that "gets hit" needed to compile (govc also depends on govmomi, which depends on vmw-guestinfo, and they do distribute arm64 binaries of that). I am probably missing something and could use a hint.

EDIT: the dependency on govmomi's toolbox is the culprit.

jonkerj avatar Apr 14 '25 14:04 jonkerj

We need to wait until govmomi/toolbox is on board with this.

jonkerj avatar Apr 14 '25 18:04 jonkerj

The govmomi stuff is taking too long, so I copied the structs to another package in the repo. It compiles for linux/arm64, and I am going to test this in the coming days.

jonkerj avatar May 19 '25 14:05 jonkerj

It seems to work at least partially: my VM shuts down gracefully through Talos API when I click "shutdown". I have not checked the IP reporting (need to enable REST in Fusion first), but suspect that works as well. Still need to take care of a few things:

  1. The build system produces arm64 OCI images containing an amd64 binary in some cases. I am fighting a bit with my colima/buildx setup; maybe it's just a matter of make-ing it on a fully set up buildx environment, but maybe I need to dance with kres.yaml a bit more.
  2. The ext logs a lot of errors ("no data to receive") that are not errors at all. I'll fix that in the next few days.
  3. We need to figure out if this change warrants a major version bump. On the one hand, the new backdoor is functionally equivalent. On the other, I can imagine that obscure VMWare platforms (maybe people running 5.0) don't play ball with this new tool. @smira, what do you think?
  4. Talos does not know if the ext service is healthy. If this is easy to implement, I'll do it, but it does not look like a show stopper for now.
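
For the second item, one common Go pattern is to give the idle condition its own sentinel error and log it at a lower level than real failures. The sketch below is only an assumption about how that could look: `errNoData`, `errClosed`, `isBenign`, `wrappedIdle` and `report` are hypothetical names, not identifiers from talos-vmtoolsd.

```go
package main

import (
	"errors"
	"fmt"
	"log/slog"
)

// Hypothetical sentinel for "the channel is idle", plus a real failure for contrast.
var (
	errNoData = errors.New("no data to receive")
	errClosed = errors.New("channel closed")
)

// isBenign reports whether err only signals an idle channel,
// even when a caller has wrapped it with extra context.
func isBenign(err error) bool {
	return errors.Is(err, errNoData)
}

// wrappedIdle returns the sentinel wrapped the way a caller might wrap it.
func wrappedIdle() error {
	return fmt.Errorf("rpc poll: %w", errNoData)
}

// report logs idle polls at debug level and everything else as an error.
func report(logger *slog.Logger, err error) {
	switch {
	case err == nil:
		// nothing to log
	case isBenign(err):
		logger.Debug("nothing to receive", "err", err)
	default:
		logger.Error("receive failed", "err", err)
	}
}

func main() {
	report(slog.Default(), wrappedIdle()) // debug level, hidden by the default logger
	report(slog.Default(), errClosed)     // logged as an error
}
```

Using `errors.Is` rather than comparing strings keeps the check working when the error is wrapped further up the call chain.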

jonkerj avatar May 20 '25 14:05 jonkerj

$ curl -k -u 'XXX:YYY' https://localhost:8697/api/vms/<ID of Talos VM running vmtoolsd>/ip
{
  "ip": "172.16.94.132"
}
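
The same check can be scripted. This is a minimal Go sketch of the curl call above, assuming the Fusion REST endpoint and response shape shown there; `fetchIP`, `parseIP` and `ipResponse` are illustrative names, and skipping TLS verification mirrors `curl -k`, so it is only suitable for a local test setup.

```go
package main

import (
	"crypto/tls"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// ipResponse matches the JSON body shown in the curl output.
type ipResponse struct {
	IP string `json:"ip"`
}

// parseIP extracts the "ip" field and rejects bodies that lack it.
func parseIP(body []byte) (string, error) {
	var r ipResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return "", err
	}
	if r.IP == "" {
		return "", fmt.Errorf("response has no ip field")
	}
	return r.IP, nil
}

// fetchIP queries <baseURL>/api/vms/<vmID>/ip with basic auth.
func fetchIP(baseURL, vmID, user, pass string) (string, error) {
	client := &http.Client{
		Timeout: 10 * time.Second,
		Transport: &http.Transport{
			// Equivalent of curl -k: accept the self-signed certificate.
			TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
		},
	}
	req, err := http.NewRequest(http.MethodGet, baseURL+"/api/vms/"+vmID+"/ip", nil)
	if err != nil {
		return "", err
	}
	req.SetBasicAuth(user, pass)
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}
	return parseIP(body)
}

func main() {
	if len(os.Args) != 4 {
		fmt.Println("usage: vmip <vm-id> <user> <password>")
		return
	}
	ip, err := fetchIP("https://localhost:8697", os.Args[1], os.Args[2], os.Args[3])
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ip)
}
```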

jonkerj avatar May 21 '25 12:05 jonkerj

You should all be able to test the pre-release: ghcr.io/siderolabs/talos-vmtoolsd:v1.0.0-pr32.0. We'll integrate this test version into our test setup, and if it seems OK, we'll probably release it (together with fixes from other issues) as v1.1.0 soonish.

jonkerj avatar May 28 '25 13:05 jonkerj

If we do that, we should also update Talos VMWare platform for arm64 (same issue).

So I took a look into that. For the most part it's straightforward: we just need to implement what's called rpcvmx based on nanotoolbox.RPCI (which seems easy).

But, Talos-vmtoolsd uses hierarchical loggers using log/slog, and Talos itself is based on the imperative log.Printf. We could initialize the root log/slog logger to produce log.Printf like messages, but I feel a bit yuck about that. The nicest way forward would be to put pkg/hypercall (and maybe pkg/nanotoolbox?) into its own module, and make it agnostic about the logging framework. As this is quite a bit of work, I think I'd like to postpone this until people are actually going to use OVF/guestinfo provisioning on ARM.

jonkerj avatar Jun 02 '25 14:06 jonkerj

But, Talos-vmtoolsd uses hierarchical loggers using log/slog, and Talos itself is based on the imperative log.Printf. We could initialize the root log/slog logger to produce log.Printf like messages, but I feel a bit yuck about that.

This should be straightforward: slog.New(slog.NewTextHandler(log.Writer(), nil)). Just use the configured writer so that messages correctly go to the kernel log buffer.
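
A minimal sketch of that idea, assuming nothing about Talos internals: `newBridgedLogger` builds a `log/slog` logger on top of whatever writer the standard `log` package is currently configured with. The helper names (`newBridgedLogger`, `capture`, `reachedStdWriter`) are made up for this example, and the buffer merely stands in for the real destination.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"log/slog"
	"strings"
)

// newBridgedLogger returns a slog.Logger that writes text records to the
// writer the standard log package uses at the time of the call.
func newBridgedLogger() *slog.Logger {
	return slog.New(slog.NewTextHandler(log.Writer(), nil))
}

// capture points the standard logger at a buffer, emits one record through
// the bridged slog logger, and returns what ended up in the buffer.
func capture(msg string) string {
	var buf bytes.Buffer
	prev := log.Writer()
	log.SetOutput(&buf)
	defer log.SetOutput(prev)

	newBridgedLogger().Info(msg, "component", "hypercall")
	return buf.String()
}

// reachedStdWriter reports whether a slog record arrived at the log writer.
// It expects a msg without spaces, since the text handler quotes those.
func reachedStdWriter(msg string) bool {
	return strings.Contains(capture(msg), "msg="+msg)
}

func main() {
	fmt.Print(capture("ready"))
}
```

Note that `log.Writer()` is read once, when the handler is created, so the logger has to be constructed after the standard logger's output has been configured.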

smira avatar Jun 02 '25 14:06 smira

@kodal, can you check if the updated extension works for you?

jonkerj avatar Jun 03 '25 15:06 jonkerj

Thank you

arch: arm64
platform: nocloud
secureboot: false
version: ${TALOS_VERSION}
customization:
  extraKernelArgs:
    - net.ifnames=0
input:
  baseInstaller:
    imageRef: ghcr.io/siderolabs/installer:${TALOS_VERSION}
  systemExtensions:
    - imageRef: ghcr.io/siderolabs/talos-vmtoolsd:v1.0.0-pr32.0
output:
  kind: iso
  outFormat: raw

TALOS_VERSION=1.10.3. It gets stuck as shown in the screenshot: Image

With platform: vmware it produces an OVA, which I couldn't convert to VMDK on an M-series (ARM) Mac.

kodal avatar Jun 04 '25 21:06 kodal

I think the screenshot is from way before the extension even runs. Can you try the exact same thing without the extension (e.g. same build, same boot) and see whether it boots?

BTW: it is a known issue that talos/udevd hangs for ~180s during boot on fusion/arm64/mac (with and without the extension).

jonkerj avatar Jun 05 '25 06:06 jonkerj

After 15-20 minutes it booted! Got the IP.

kodal avatar Jun 05 '25 08:06 kodal

Super. Thanks for reporting and testing!

jonkerj avatar Jun 05 '25 18:06 jonkerj