Enabling AVX instructions on x86_64 arch VM with colima on M1 Mac
Describe the Issue
I'd like to enable AVX instructions so that I can use TensorFlow packages directly from PyPI inside the colima VM. Without them, I get this error message and an exception that apparently cannot be safely caught:
The TensorFlow library was compiled to use AVX instructions, but these aren't available on your machine.
I'm desperately trying to avoid having to recompile the Tensorflow package without AVX instructions enabled. This will add package management complexity I want to avoid at all costs.
N.b. I'm not trying to actually do high-intensity ML work in this scenario: this is just unit tests and local development of an inference server within a Docker container. The actual deployment environment has Intel Xeon processors under the hood. Tensorflow seems not to even want to start up without AVX instructions available.
Version
Colima Version:
colima version 0.4.2
git commit: f112f336d05926d62eb6134ee3d00f206560493b
runtime: docker
arch: x86_64
client: v20.10.14
server: v20.10.11
Lima Version:
limactl version 0.11.0
Qemu Version
qemu-img version 7.0.0
Copyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers
Operating System
- [x] ~~macOS Intel~~
- [X] macOS m1
- [x] ~~Linux~~
$ sw_vers
ProductName: macOS
ProductVersion: 12.3.1
BuildVersion: 21E258
To Reproduce
Steps to reproduce the behavior:
- colima start --arch x86_64 --cpu 2 --memory 4 --cpu-type max,+avx,+avx2 --disk 60
- colima ssh
- grep avx /proc/cpuinfo
- See no output
Expected behavior
I'd expect avx to be in the output of /proc/cpuinfo.
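The check in the steps above can also be done from Python, which is handy in a test suite. This is a minimal sketch, assuming the usual Linux x86 `/proc/cpuinfo` layout with a `flags` line per processor; the function names `cpu_flags` and `missing_features` are made up for illustration. Unlike `grep avx`, it matches whole flag names, so `avx2` or `avx512f` alone does not count as `avx`.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only "key : value" lines whose key is exactly "flags" are relevant.
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def missing_features(cpuinfo_text, required=("avx", "avx2")):
    """Return the required features that the CPU does not advertise."""
    present = cpu_flags(cpuinfo_text)
    return [name for name in required if name not in present]
```

Inside the VM, something like `missing_features(open("/proc/cpuinfo").read())` should return an empty list once AVX and AVX2 are actually exposed to the guest.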
Additional context
I'm 99% sure that AVX is implemented in QEMU but I speculate based on the present behavior that it may not be enableable on an ARM64 Mac.
- https://wiki.qemu.org/Internships/ProjectIdeas/AVX - 2019 internship project to implement AVX in qemu
- https://github.com/andikleen/qemu-avx - dead project that’s 9 years old but was successful, so I’ll bet it was upstreamed
However, qemu lists it:
$ qemu-system-x86_64 -cpu help | grep avx
avic avx avx-vnni avx2 avx512-4fmaps avx512-4vnniw avx512-bf16
avx512-fp16 avx512-vp2intersect avx512-vpopcntdq avx512bitalg avx512bw
avx512cd avx512dq avx512er avx512f avx512ifma avx512pf avx512vbmi
avx512vbmi2 avx512vl avx512vnni bmi1 bmi2 bus-lock-detect cid cldemote
However, I found a user comment that asserts:
the fact that qemu lists a bunch of "Available CPUs" that require AVX2 qemu-cpu.txt definitely looks like a bug.
That is, qemu-system-x86_64 -cpu help says that avx is available when it's not available currently on aarch64 host and x86_64 guest.
I've tried the --cpu-type option as:
- max
- Skylake-Client
- max,+avx,+avx2
The running qemu command as of the last option:
qemu-system-x86_64 -m 4096 -cpu max,+avx,+avx2 -machine q35,accel=tcg -smp 2,sockets=1,cores=2,threads=1 -drive if=pflash,format=raw,readonly=on,file=/Users/colin/.colima/_wrapper/share/qemu/edk2-x86_64-code.fd -boot order=d,splash-time=0,menu=on -drive file=/Users/colin/.lima/colima/basedisk,media=cdrom,readonly=on -drive file=/Users/colin/.lima/colima/diffdisk,if=virtio -cdrom /Users/colin/.lima/colima/cidata.iso -netdev user,id=net0,net=192.168.5.0/24,dhcpstart=192.168.5.15,hostfwd=tcp:127.0.0.1:60775-:22 -device virtio-net-pci,netdev=net0,mac=52:55:55:bd:45:24 -device virtio-rng-pci -display none -device virtio-vga -device virtio-keyboard-pci -device virtio-mouse-pci -parallel none -chardev socket,id=char-serial,path=/Users/colin/.lima/colima/serial.sock,server=on,wait=off,logfile=/Users/colin/.lima/colima/serial.log -serial chardev:char-serial -chardev socket,id=char-qmp,path=/Users/colin/.lima/colima/qmp.sock,server=on,wait=off -qmp chardev:char-qmp -name lima-colima -pidfile /Users/colin/.lima/colima/qemu.pid -netdev socket,id=vlan,fd=3 -device virtio-net-pci,netdev=vlan,mac=5a:94:ef:a7:40:5a
I note the presence of -cpu max,+avx,+avx2 as qemu seems to expect.
The ultimate question is this: **How can I enable AVX instructions for a Colima-managed VM?**
Maybe it is a limitation on M1 devices. I am able to see avx and avx2 on my Intel machine after specifying the cpu type as host,+avx,+avx2.
It is definitely not available for M1 devices as qemu-system-aarch64 -cpu help | grep avx returns no output.
I also saw avx enabled with kvm64,+avx,+avx2. However, anything other than the qemu64 cpu type is very slow on M1 devices.
The best performance would have come from qemu64,+avx,+avx2, but in my tests qemu64 does not support avx. You can try the kvm64 cpu type, but I doubt the speed would be bearable.
Thank you for the incredibly quick response!
I'll try kvm64 and see if something is worse than nothing ;-)
Edit: kvm64 didn't work, reasoning below.
$ colima start --arch x86_64 --cpu 2 --memory 4 --cpu-type kvm64,+avx,+avx2 --disk 60
…
$ colima ssh
colima:~$ grep avx /proc/cpuinfo
colima:~$
I'm reading up on this a bit and finding some references to HVF. Apparently, some builds of qemu can use the macOS Hypervisor.framework and might expose AVX through it, if and only if that has actually been implemented.
- https://wiki.qemu.org/Features/HVF
- https://github.com/simnalamburt/homebrew-x#qemu-hvf
- https://github.com/lima-vm/lima/issues/42
HVF may already be there and only relevant to aarch64:
$ qemu-system-aarch64 -accel help
Accelerators supported in QEMU binary:
hvf
tcg
$ qemu-system-x86_64 -accel help
Accelerators supported in QEMU binary:
tcg
Yeah, confirmed. That qemu-hvf fork was upstreamed as of qemu v6.2.0.
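For anyone scripting this comparison, here is a small sketch that turns the `-accel help` output into a list. It assumes the output format is exactly what is pasted above (one header line ending in a colon, then one accelerator per line); `parse_accelerators` is a hypothetical helper name, not part of qemu or colima.

```python
def parse_accelerators(help_text):
    """Extract accelerator names from `qemu-system-* -accel help` output."""
    names = []
    for line in help_text.splitlines():
        line = line.strip()
        # Skip blank lines and the header line, which ends with a colon.
        if not line or line.endswith(":"):
            continue
        names.append(line)
    return names
```

With the two outputs above, `"hvf" in parse_accelerators(...)` is true for the aarch64 binary and false for the x86_64 one, which leaves tcg (software emulation) as the only option for the x86_64 guest.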
qemu upstream issue: qemu x86 TCG doesn't support AVX insns
The ticket "Close gap for x86_64-v3 ABI in TCG - CPU support for fma, f16c, avx, avx2 features required" pointed me to a mailing list patchset that appears to implement AVX. It's not merged yet. The author is hosting their work at https://github.com/pbrook/qemu/tree/avx.
TL;DR As of this comment's timestamp, qemu doesn’t support AVX for aarch64 hosts and x86_64 guests yet, but there’s a patchset that may enable it in development.
If you found this issue while trying to run Tensorflow inside a colima/lima/qemu container on an M1 Mac, the short answer is "you can't", and I'm working on a workaround of some kind.
Some other additional context, mostly geared toward my particular predicament with trying to have Tensorflow start inside of a colima x86_64 container:
- Tensorflow shipping AVX-enabled by default https://github.com/tensorflow/tensorflow/issues/19584
- Tensorflow may be silence-able with the methods available in silence-tensorflow but in practice this seems not to work anymore…?
- This SO Q/A gets into details about AVX and recommends the same approach from the previous bullet
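Since the failure seems to kill the interpreter rather than raise something catchable, one possible workaround for test suites is to probe the import in a child process first and skip the Tensorflow tests if the probe fails. This is only a sketch under that assumption; `probe_import` and its `timeout` default are made up for illustration, and it uses nothing beyond the standard library.

```python
import subprocess
import sys


def probe_import(module_name, timeout=120):
    """Try importing a module in a child interpreter; return True on success."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", f"import {module_name}"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # The import never returned; treat that the same as a failure.
        return False
    # A non-zero code covers both ImportError and a hard abort of the child.
    return result.returncode == 0
```

A test module could then call `probe_import("tensorflow")` once at collection time and mark the affected tests as skipped when it returns False, so the parent process never executes the import that would take it down.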
qemu 7.2.0 came out this week and I've started playing with it, starting with getting qemu 7.2.0 into Homebrew. My preliminary results aren't looking great.
I've started qemu via colima with
colima start --arch x86_64 --cpu 2 --cpu-type "qemu64,+sse4.2,+sse4.1,+sse,+sse2,+avx,+avx2" --memory 4
and it starts correctly. However, when I try to run my containers, I'm seeing shell processes exit with 132 or 139 exit codes… indicating an illegal instruction (132 - SIGILL) or segmentation fault (139 - SIGSEGV) when running bash or sh respectively. I haven't yet tried destroying the VM entirely.
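The exit codes follow the usual shell convention of 128 plus the signal number. A minimal sketch of that decoding, using only Python's standard `signal` module (`describe_exit_code` is a hypothetical helper name):

```python
import signal


def describe_exit_code(code):
    """Map a shell exit status to a signal name when it encodes one."""
    # POSIX shells report "killed by signal N" as 128 + N.
    if code > 128:
        try:
            return signal.Signals(code - 128).name
        except ValueError:
            return f"unknown signal {code - 128}"
    return f"exit status {code}"
```

Here 132 decodes to SIGILL (signal 4) and 139 to SIGSEGV (signal 11), matching the two failures described above.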
@colindean I tried deleting the VM by running colima delete and starting qemu using the same command as you did:
colima start --arch x86_64 --cpu 2 --cpu-type "qemu64,+sse4.2,+sse4.1,+sse,+sse2,+avx,+avx2" --memory 4
And I am getting similar error codes. I have qemu 7.2.0 here as well, with HEAD colima compiled from source using homebrew.
I also tried running with Rosetta 2 and some different parameters:
colima start --cpu 4 --memory 6 --disk 100 --arch amd64 --cpu-type "qemu64,+sse4.2,+sse4.1,+sse,+sse2,+avx,+avx2" --vm-type=vz --vz-rosetta
But got similar results.
Did you manage to resolve this?
I have not resolved this yet 😞
Install the latest qemu (8.0.0) https://www.qemu.org/, it should work now. Thanks for the great work!
Yes! Going to give 8.0.0 a shot next week and see if I can get Tensorflow working!
Tried with the following command with Qemu 8.0.0:
colima start --arch x86_64 --cpu 4 --cpu-type "max" --memory 8
docker run --rm -it tensorflow/tensorflow:1.7.1 bash
The container started up correctly but, when I run import tensorflow in Python, it hangs for a long time and then times out.
Can confirm. I tried different versions of tensorflow and it always seems to hang somewhere here:
python -v
>>> import tensorflow
`import 'numpy.ma' # <_frozen_importlib_external.SourceFileLoader object at 0x7ff84f2f2b20>`
Basically making colima unresponsive
I ended up using tensorflow-macos myself. I had high hopes for colima and qemu, though.
colima start --arch x86_64 --cpu 8 --cpu-type "max" --memory 16 and tensorflow 2.13.0 works at first glance.
lscpu also shows AVX instructions, and tensorflow prints that it will use them in performance-critical operations.
However, running anything substantial is terribly slow. Also, the qemu process shows only 100% CPU usage, so it is apparently using only a single core.
I also tried colima start --cpu 8 --memory 16 --vm-type vz --vz-rosetta --mount-type virtiofs and starting docker containers with export DOCKER_DEFAULT_PLATFORM="linux/amd64".
docker then switches to emulation, but lscpu reports that the CPU architecture is only 32-bit.
It's the same behaviour when enabling Rosetta in the official Docker Desktop for macOS.
As of now, there isn't really a way to run performant x86_64 docker containers on Apple Silicon.