tart: Add socket devices and console device support through a Unix socket or file descriptor, like QEMU

In reference: #978

Add socket devices: Sockets
Add console creation through a Unix socket or file descriptor. Replaces #978
Add integration test for sockets and console
Add VSCode support.

Dec 29 '24 14:12 Fred78290

@edigaryev Happy New Year. After some trouble integrating the tests into Cirrus CI, I propose this pull request as a replacement for #978.

Note that the console device doesn't work on macOS versions prior to Sequoia.

Jan 08 '25 18:01 Fred78290

Hi @Fred78290 👋

Thank you for your contribution.

It feels to me given the complexity of this approach, it can hardly compete with the basic tools readily available in Tart, such as tart ip and simply using an SSH connection to communicate with the VM.

While this probably does work on paper, I'm not sure if that's something we're comfortable maintaining in Tart in the long run.

In https://github.com/cirruslabs/tart/pull/978#issuecomment-2559812902, you've mentioned that one of the reasons for implementing a separate communication method is that our Cloud Init configuration uses datasource_list: [ None ] instead of datasource_list: [ NoCloud, None ].

Have you considered submitting an issue/PR to linux-image-templates to change that instead? It looks like a much simpler change overall.

Jan 13 '25 13:01 edigaryev

Hi @Fred78290 👋

Thank you for your contribution.

It feels to me given the complexity of this approach, it can hardly compete with the basic tools readily available in Tart, such as tart ip and simply using an SSH connection to communicate with the VM.

The vsock channel is the preferred host-to-guest communication mode for most hypervisors and VM managers: VMware, LXD, Incus, Lima, Multipass, CloudStack... vsock is available earlier than the network. For example, vmware-tools uses vsock to configure the guest VM (guestinfo properties).

Lima uses it to implement port forwarding and allows SSH (22 -> 1022) without a configured network (lima-agent).

SSH is evil because you need to install an SSH server, then create an SSH key or allow password auth. You also need the network to be up, and when the network doesn't come up (DHCP failure...) the guest is totally inaccessible.

While this probably does work on paper, I'm not sure if that's something we're comfortable maintaining in Tart in the long run.

I think it's necessary for Linux VMs. On macOS, vsock is not implemented in python3, Go (I have a fix pending for it), or some network tools (netcat, socat...).

Try a little demo.

In a terminal: tart run ubuntu --vsock=fd://0,1:2222

In the VM: socat VSOCK-CONNECT:2:2222 exec:/bin/bash

In the terminal where you launched tart, type any bash command.

You have created a tart shell.

:)

macmini-fboltz-m4:fboltz fboltz$ tart run noble-cloud-image --vsock=fd://0,1:2222 --disk /Users/fboltz/.cake/vms/noble-cloud-image/cloud-init.iso 
2025-01-13 15:20:16.406 tart[61261:917832] +[IMKClient subclass]: chose IMKClient_Modern
2025-01-13 15:20:16.406 tart[61261:917832] +[IMKInputSession subclass]: chose IMKInputSession_Modern
ls
__pycache__
data.txt
echo.py
golang.png
http_server.py
pattern.bin
pushdata.sh
received.txt
recu.txt
virtio
pwd
/home/admin
echo toto
toto
cat /etc/os-release     
PRETTY_NAME="Ubuntu 24.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04.1 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo
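
For guests without socat, the guest half of the demo above can be approximated in Python. This is only a sketch under assumptions: it needs a Linux guest whose Python exposes socket.AF_VSOCK, the helper names attach_shell and connect_to_host are made up for illustration, and port 2222 is simply the one used in the demo.

```python
import socket
import subprocess


def attach_shell(conn: socket.socket, shell: str = "/bin/sh") -> subprocess.Popen:
    """Run `shell` with stdin, stdout and stderr wired to the connected socket."""
    fd = conn.fileno()
    # The child receives the socket as file descriptors 0, 1 and 2,
    # which is roughly what socat's exec: address does.
    return subprocess.Popen([shell], stdin=fd, stdout=fd, stderr=fd)


def connect_to_host(port: int = 2222) -> socket.socket:
    """Guest side: open a vsock stream to the host (Linux guests only)."""
    conn = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
    # VMADDR_CID_HOST is CID 2, the same "2" as in VSOCK-CONNECT:2:2222.
    conn.connect((socket.VMADDR_CID_HOST, port))
    return conn
```

Inside the guest, attach_shell(connect_to_host(2222)).wait() would then play the role of the socat one-liner; attach_shell itself works with any connected stream socket.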

In #978 (comment), you've mentioned that one of the reasons for implementing a separate communication method is that our Cloud Init configuration uses datasource_list: [ None ] instead of datasource_list: [ NoCloud, None ].

True. Activating cloud-init also requires implementing the chosen method for passing the vendor-data, user-data and network-config configuration to the guest (cdrom.iso, HTTP...).

If you leave NoCloud as the cloud provider and don't provide cloud data, there is a chance that the VM takes 2 minutes to boot, until cloud-init times out, because no network-config is provided.

Have you considered submitting an issue/PR to linux-image-templates to change that instead? It looks like a much simpler change overall.

Not yet for the above reason.

Jan 13 '25 14:01 Fred78290

@edigaryev Any progress?

Jan 19 '25 18:01 Fred78290

Hi @Fred78290,

Sorry I'm trying to catch up on all the conversations regarding socket support. Do you have any links to how others support it? Do they all have so many options like bind/connect/tcp/udp, or do they focus on one option?

Right now this change looks humongous compared to what it tries to improve (a potentially not 100% stable SSH connection). Historically, Tart has been a very light wrapper around Virtualization.Framework; that's why we pulled Softnet into a separate repo.

It seems that if it's enough to just support connecting to a guest port, then we can use this method.

Jan 19 '25 20:01 fkorotkov

Hi @fkorotkov

Hi @Fred78290,

Sorry I'm trying to catch up on all the conversations regarding socket support. Do you have any links to how others support it? Do they all have so many options like bind/connect/tcp/udp, or do they focus on one option?

For example:

lima agent
lxd agent

Also, VMware uses vsock for more than SSH: it's used for cloud-init and vmware-tools. VMware is the creator of vsock.

A picture is often better than a long explanation.

In the picture you can see how it works on a Linux host with hypervisors like LXD, Incus and many others that use vsock. The exposed API endpoint is a Linux socket and/or a TCP endpoint.

Because vsock doesn't exist on macOS, the second part of the picture shows how to do the same thing. tart run is just a tunnel between the guest and the host. As such, the agent is a separate project and tart knows nothing more about it, unless it's your own agent.

The connection flow goes one of two ways, depending on what we want to do.

1 - The guest listens and the host connects to the endpoint (bind mode)
2 - The host listens and the guest initiates a connection to the host endpoint (connect mode)
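
The two modes can be sketched in a few lines of Python. This is a rough illustration, not the code from this PR: the guest side is represented by any already-connected socket, the host side is a Unix socket, and the names pump, tunnel and host_endpoint are made up for the example. It reads "bind mode" as the tunnel listening on a host socket path and "connect mode" as the tunnel dialing a host program that already listens there.

```python
import socket
import threading


def pump(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes from src to dst until src reaches EOF, then half-close dst."""
    while True:
        data = src.recv(65536)
        if not data:
            break
        dst.sendall(data)
    try:
        dst.shutdown(socket.SHUT_WR)
    except OSError:
        pass  # the peer is already gone


def tunnel(guest: socket.socket, host: socket.socket) -> list:
    """Relay both directions between the guest-side and host-side sockets."""
    threads = [
        threading.Thread(target=pump, args=(guest, host), daemon=True),
        threading.Thread(target=pump, args=(host, guest), daemon=True),
    ]
    for thread in threads:
        thread.start()
    return threads


def host_endpoint(mode: str, path: str) -> socket.socket:
    """Return the host-side socket of the tunnel for the given mode."""
    if mode == "bind":
        # Bind mode: the tunnel listens; a host program connects to `path`.
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as listener:
            listener.bind(path)
            listener.listen(1)
            conn, _ = listener.accept()
            return conn
    if mode == "connect":
        # Connect mode: a host program already listens on `path`.
        conn = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        conn.connect(path)
        return conn
    raise ValueError(f"unknown mode: {mode}")
```

In both modes the relay itself is identical; only which side opens the host endpoint changes, which is why the tunnel needs no knowledge of the agent protocol carried over it.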

Right now this change looks humongous compared to what it tries to improve (a potentially not 100% stable SSH connection). Historically, Tart has been a very light wrapper around Virtualization.Framework; that's why we pulled Softnet into a separate repo.

I understand, but you added a small piece of code to run Softnet and control the lifecycle of the Softnet agent.

It's the same for the vsock device code: it's just a tunnel between the host and the guest.

If it were possible to communicate with the guest outside of tart, I would do it.

Please understand that this is an added feature, so it doesn't change the behavior of tart, but it opens up great possibilities: replacing QEMU, Multipass or lima-vm, and implementing one's own agent logic.

I gave a demo to a friend who specializes in AI, and his first idea was to build something Docker-like: tart run myimage ; tart exec myimage -- chatgpt ....

It seems that if it's enough to just support connecting to a guest port, then we can use this method.

This is only part of the job: you must also implement the communication channel and protocol between the guest and the desired API.

Jan 20 '25 10:01 Fred78290

@edigaryev @fkorotkov

So?

Feb 02 '25 19:02 Fred78290

Hi @Fred78290,

Sorry for the delay. It was hard to find time to review the changes again and also research what others are doing. Our main concerns with these changes are the cost of committing our small team to supporting this solution going forward, and still the question of whether that's the right solution for your problem.

To reiterate once again, your main concern with doing SSH is instability (there might be no network, initialization might be slow, the OS inside the VM might not play well with networking, etc.). That's why we are discussing a way to have bind and connect modes where a program on the host can establish a connection with an agent running in the guest, right?

This agent approach requires:

Having an agent with a special protocol.
Custom image or a way to deliver and configure this agent in guest.
Program running on host that does communicate with the agent using a custom protocol.

Have you looked at the --serial and --serial-path arguments for tart run as a way to do this communication for your integration? Plus there is the option of the guest agent starting on 0.0.0.0 on boot, and then it will be reachable from the host.

PS for ChatGPT integration it seems VNC might be an interesting approach given the multi-modal capabilities and VNC being "screen sharing + pointer + keyboard".

Feb 04 '25 13:02 fkorotkov

Hi @Fred78290,

Sorry for the delay. It was hard to find time to review the changes again and also research what others are doing. Our main concerns with these changes are the cost of committing our small team to supporting this solution going forward, and still the question of whether that's the right solution for your problem.

I understand. I should probably explain in detail what I intend to do: replace Multipass and Lima.

In fact, if you look at my repositories, I've built some projects around Kubernetes and on-demand autoscaling with VMware, AWS, OpenStack, CloudStack, Multipass...

I develop on a Mac and used Multipass for testing and seeding, but for the past few months Multipass has stopped working at each macOS release, and I'm tired of waiting for Canonical to replace QEMU with MacVZ.

So I decided to replace Multipass with another tool, and after some research I found yours, but it lacks a feature allowing direct communication between guest and host without the network.

In my personal experience, many hypervisors communicate with the guest machine without the network. Multipass uses the network and the VM uses a bridged network, so if the DHCP server fails you can't communicate with the guest, and each Multipass operation hangs until it times out.

By contrast, with VMware, LXD or Incus, shell communication is done via vsock. It's useful for getting the IP address without polling like tart does. Also, when the network is misconfigured, it's always possible to explore the guest and fix the configuration.

Also, when cloud-init is stuck, you can open a shell without waiting for multi-user.target to become available.

To reiterate once again, your main concern with doing SSH is instability (there might be no network, initialization might be slow, the OS inside the VM might not play well with networking, etc.). That's why we are discussing a way to have bind and connect modes where a program on the host can establish a connection with an agent running in the guest, right?

This agent approach requires:
1. Having an agent with a special protocol.

It's on the way: cakeagent. You can also use an agent from another hypervisor (qemu-guest-agent, incus...).

2. Custom image or a way to deliver and configure this agent in guest.

For Darwin, the agent needs to be installed during the image build.

For Linux, I wrote a TartHelper that creates the VM with a cloud-init initializer and allows using native cloud images in qcow2 format. To configure the VM, I use the NoCloud cloud-init provider by attaching a cloud-init.iso containing vendor-data, network-config, user-data and the files needed by the agent. Samples below.

vendor-data

merge_how:
- name: list
  settings:
  - append
  - recurse_dict
  - recurse_list
- name: dict
  settings:
  - no_replace
  - recurse_dict
  - recurse_list
growpart:
  ignore_growroot_disabled: false
  mode: auto
  devices:
  - /
users:
- name: admin
  lock_passwd: false
  plain_text_passwd: admin
  primary_group: admin
  shell: /bin/bash
  ssh_authorized_keys:
  - ssh-rsa ...
  - ssh-rsa ...
  sudo: ALL=(ALL) NOPASSWD:ALL
manage_etc_hosts: true
ssh_pwauth: true
timezone: Europe/Paris
packages:
- pollinate
write_files:
- path: /usr/local/bin/install-cakeagent.sh
  content: IyEvYmluL3NoCkNJREFUQT0kKGJsa2lkIC1MIENJREFUQSB8fCA6KQppZiBbIC1uICIkQ0lEQVRBIiBdOyB0aGVuCglNT1VOVD0kKG1rdGVtcCAtZCkKCW1vdW50IC1MIENJREFUQSAkTU9VTlQgfHwgZXhpdCAxCgljcCAkTU9VTlQvY2FrZWFnZW50IC91c3IvbG9jYWwvYmluL2Nha2VhZ2VudAoJdW1vdW50ICRNT1VOVAoJY2htb2QgK3ggL3Vzci9sb2NhbC9iaW4vY2FrZWFnZW50CgkvdXNyL2xvY2FsL2Jpbi9jYWtlYWdlbnQgLS1pbnN0YWxsIFwKCQktLWxpc3Rlbj12c29jazovL2FueTo1MDAwIFwKCQktLWNhLWNlcnQ9L2V0Yy9jYWtlYWdlbnQvc3NsL2NhLnBlbSBcCgkJLS10bHMtY2VydD0vZXRjL2Nha2VhZ2VudC9zc2wvc2VydmVyLnBlbSBcCgkJLS10bHMta2V5PS9ldGMvY2FrZWFnZW50L3NzbC9zZXJ2ZXIua2V5CmVsc2UKICBlY2hvICJDSURBVEEgbm90IGZvdW5kIgogIGV4aXQgMQpmaQ==
  encoding: base64
  permissions: '0755'
  owner: root:adm
- path: /etc/cloud/cloud.cfg.d/100_datasources.cfg
  content: 'datasource_list: [ NoCloud, None ]'
  owner: root:adm
- path: /etc/pollinate/add-user-agent
  content: 'caked/vz/1.0 # Written by caked'
  owner: root:adm
- path: /etc/cakeagent/ssl/server.key
  content: ...
  encoding: gzip+base64
  permissions: '0600'
  owner: root:adm
- path: /etc/cakeagent/ssl/server.pem
  content: ...
  encoding: gzip+base64
  permissions: '0600'
  owner: root:adm
- path: /etc/cakeagent/ssl/ca.pem
  content: ...
  encoding: gzip+base64
  permissions: '0600'
  owner: root:adm
runcmd:
- /usr/local/bin/install-cakeagent.sh

/usr/local/bin/install-cakeagent.sh

#!/bin/sh
CIDATA=$(blkid -L CIDATA || :)
if [ -n "$CIDATA" ]; then
	MOUNT=$(mktemp -d)
	mount -L CIDATA $MOUNT || exit 1
	cp $MOUNT/cakeagent /usr/local/bin/cakeagent
	umount $MOUNT
	chmod +x /usr/local/bin/cakeagent
	/usr/local/bin/cakeagent --install \
		--listen=vsock://any:5000 \
		--ca-cert=/etc/cakeagent/ssl/ca.pem \
		--tls-cert=/etc/cakeagent/ssl/server.pem \
		--tls-key=/etc/cakeagent/ssl/server.key
else
  echo "CIDATA not found"
  exit 1
fi
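
The content fields in the vendor-data sample use cloud-init's base64 and gzip+base64 write_files encodings. A small Python sketch of how such values can be produced and checked follows; the helper names encode_write_file and decode_write_file are made up for illustration and are not part of cloud-init or of this PR.

```python
import base64
import gzip


def encode_write_file(data: bytes, encoding: str) -> str:
    """Encode file bytes for a write_files `content` field."""
    if encoding == "base64":
        return base64.b64encode(data).decode("ascii")
    if encoding == "gzip+base64":
        # gzip first, then base64 the compressed bytes.
        return base64.b64encode(gzip.compress(data)).decode("ascii")
    raise ValueError(f"unsupported encoding: {encoding}")


def decode_write_file(content: str, encoding: str) -> bytes:
    """Reverse encode_write_file, e.g. to inspect what a sample contains."""
    raw = base64.b64decode(content)
    if encoding == "base64":
        return raw
    if encoding == "gzip+base64":
        return gzip.decompress(raw)
    raise ValueError(f"unsupported encoding: {encoding}")
```

For instance, decoding the first twelve characters of the install-cakeagent.sh content, decode_write_file("IyEvYmluL3No", "base64"), yields b"#!/bin/sh", the first line of the script shown above.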

network-config

#cloud-config
network:
  version: 2
  renderer: networkd
  ethernets:
    enp0s1:
      match:
        name: enp0s1
      dhcp4: true
      dhcp-identifier: mac
      addresses:
      - 192.168.75.10/24
      nameservers:
        addresses:
        - 10.0.0.5

3. Program running on host that does communicate with the agent using a custom protocol.

Sure, but you're free to implement whatever you want. My agent implements 3 APIs:

VM info (mem, cpu, ip addresses, up time and more)
exec command without ssh
shell directly to the vm without ssh

Have you looked at the --serial and --serial-path arguments for tart run as a way to do this communication for your integration? Plus there is the option of the guest agent starting on 0.0.0.0 on boot, and then it will be reachable from the host.

It's a TTY and it's buffered (8192 bytes), so it doesn't work as well in stream mode: larger packets are truncated.
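
To illustrate the contrast with a stream socket: a reader that loops until the announced number of bytes has arrived receives payloads well above 8192 bytes intact. The sketch below assumes a connected stream socket (vsock or Unix); the 4-byte length prefix and the names send_frame, recv_exact and recv_frame are choices made for this example, not the cakeagent protocol.

```python
import socket
import struct

HEADER = struct.Struct("!I")  # 4-byte big-endian payload length


def send_frame(sock: socket.socket, payload: bytes) -> None:
    """Send one message as a length prefix followed by the payload."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Keep reading until exactly n bytes have arrived."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("stream closed mid-frame")
        buf += chunk
    return bytes(buf)


def recv_frame(sock: socket.socket) -> bytes:
    """Read one length-prefixed message, however many recv calls it takes."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)
```

The length prefix also lets the receiver detect a short read instead of silently accepting a truncated message.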

PS for ChatGPT integration it seems VNC might be an interesting approach given the multi-modal capabilities and VNC being "screen sharing + pointer + keyboard".

I'm not an AI specialist. My friend has explored more than just ChatGPT, but Apple AI is not available in virtualized macOS 15.x, and I don't know whether the Neural Engine is available in a VM.

Feb 04 '25 15:02 Fred78290

Developing on Mac and I used Multipass for testing and seeding but since few month at each MacOS release, multipass stop working and I'm tired to wait that https://github.com/canonical/multipass/issues/3760 So I've decided to replace Multipas...

Great! This now narrows down the use case. We've heard about Multipass recently but have no experience with it. Let us take a look into it and see how we can provide a viable replacement, maybe with something like tart exec, similar to what Multipass has.

Feb 05 '25 15:02 fkorotkov

@fkorotkov @edigaryev So, no progress; it's time to close this PR and drop it.

Feb 26 '25 20:02 Fred78290