
Core dump when using vector instructions inside a lima VM on a Mac M4

Open MishaVeldhoen opened this issue 1 year ago • 6 comments

Description

Context

  • Lima version: 1.0.6
  • Host OS: Sequoia 15.4
  • Host system: Apple M4 Pro, 24 GB
  • Default settings

Description

Inside a lima VM, I try to use the Python JAX library, whose JIT compiler tries to use the available vector instructions. On the host system, as well as inside an orbstack VM, this works without issue, but inside the lima VM I get a core dump.

Reproduction steps

limactl start
lima

# Setup
sudo apt update
sudo apt install pipx gdb
pipx install uv
pipx ensurepath
source ~/.bashrc
mkdir ~/test && cd ~/test
uv init
uv add jax[cpu]

# Causing the crash
ulimit -c unlimited
source .venv/bin/activate
python3
>>> import jax.numpy as jnp
>>> jnp.arange(2)

# Checking the core dump
gdb python3 core
(gdb) x/10i $pc
=> 0xf2606c4d7008:      index   z0.s, #0, #1
   0xf2606c4d700c:      ldr     x8, [x0, #24]
   0xf2606c4d7010:      mov     x0, xzr
   0xf2606c4d7014:      ldr     x8, [x8]
   0xf2606c4d7018:      str     d0, [x8]
   0xf2606c4d701c:      ldp     x29, x30, [sp], #16
   0xf2606c4d7020:      ret
   0xf2606c4d7024:      udf     #0
   0xf2606c4d7028:      udf     #0
   0xf2606c4d702c:      udf     #0
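
The faulting instruction at the `=>` marker, `index z0.s, #0, #1`, writes to a `z` register, and `z` registers belong to the SVE register file. As a rough aid for reading longer dumps, here is a minimal sketch that flags lines of gdb `x/i` output whose operands mention SVE vector (`z0`..`z31`) or predicate (`p0`..`p15`) registers. The function name and the regular expression are illustrative assumptions; this is a text heuristic, not a real disassembler.

```python
import re

# Heuristic: SVE vector registers are z0..z31 and predicate registers are
# p0..p15, optionally followed by an element-size suffix such as ".s".
SVE_REGISTER = re.compile(
    r"\b(?:z(?:[12]?\d|3[01])|p(?:1[0-5]|\d))(?:\.[bhsdq])?\b"
)

# The disassembly from the core dump above.
GDB_OUTPUT = """\
=> 0xf2606c4d7008:      index   z0.s, #0, #1
   0xf2606c4d700c:      ldr     x8, [x0, #24]
   0xf2606c4d7010:      mov     x0, xzr
   0xf2606c4d7014:      ldr     x8, [x8]
   0xf2606c4d7018:      str     d0, [x8]
   0xf2606c4d701c:      ldp     x29, x30, [sp], #16
   0xf2606c4d7020:      ret
"""


def sve_instructions(disassembly: str) -> list[str]:
    """Return the instructions whose operands name an SVE register."""
    hits = []
    for line in disassembly.splitlines():
        # Everything after the first ":" is the instruction text; this drops
        # the "=>" marker and the address.
        _, _, instruction = line.partition(":")
        instruction = " ".join(instruction.split())
        if instruction and SVE_REGISTER.search(instruction):
            hits.append(instruction)
    return hits


for instruction in sve_instructions(GDB_OUTPUT):
    print(instruction)
```

In this dump only the instruction at `$pc` is flagged, which is consistent with the crash being triggered by the first SVE instruction the JIT emitted.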

More information

I tried to do the same thing inside an orbstack VM, and I do not get a core dump.

I tried to do the same thing inside a lima VM on an M3 machine, and I do not get a core dump.

On the M4 system, when I compare lscpu between lima and orbstack, it seems like inside the lima VM the host system's vector instructions are exposed (but cause a core dump when used), while inside the orbstack VM, the vector instructions are not exposed.

Lima:

Architecture:             aarch64
  CPU op-mode(s):         64-bit
  Byte Order:             Little Endian
CPU(s):                   4
  On-line CPU(s) list:    0-3
Vendor ID:                Apple
  Model name:             -
    Model:                0
    Thread(s) per core:   1
    Core(s) per cluster:  4
    Socket(s):            -
    Cluster(s):           1
    Stepping:             0x0
    BogoMIPS:             48.00
    Flags:                fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 flagm2 frint svei8mm svebf16 bf16 afp sme smei16i64 smef64f64 smei8i32 smef16f32 smeb16f32 smef32f32 sme2 smei16i32 smebi32i32
NUMA:
  NUMA node(s):           1
  NUMA node0 CPU(s):      0-3

Orbstack:

$ orbctl run lscpu
Architecture:             aarch64
  CPU op-mode(s):         64-bit
  Byte Order:             Little Endian
CPU(s):                   12
  On-line CPU(s) list:    0-11
Vendor ID:                Apple
  Model name:             -
    Model:                0
    Thread(s) per core:   1
    Core(s) per cluster:  12
    Socket(s):            -
    Cluster(s):           1
    Stepping:             0x0
    CPU(s) scaling MHz:   100%
    CPU max MHz:          2000.0000
    CPU min MHz:          2000.0000
    BogoMIPS:             48.00
    Flags:                fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm sb dcpodp flagm2 frint bf16 afp
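
To make the difference between the two `Flags` lines easier to see, here is a small sketch that diffs them as sets. The flag strings are copied from the two `lscpu` outputs above; the helper name `parse_flags` is made up for illustration, and it assumes the whole `Flags:` entry sits on one line (terminal-wrapped output would need to be rejoined first).

```python
LIMA_VZ_LSCPU = (
    "Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp "
    "cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit "
    "uscat ilrcpc flagm sb paca pacg dcpodp sve2 flagm2 frint svei8mm svebf16 "
    "bf16 afp sme smei16i64 smef64f64 smei8i32 smef16f32 smeb16f32 smef32f32 "
    "sme2 smei16i32 smebi32i32"
)

ORBSTACK_LSCPU = (
    "Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp "
    "cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit "
    "uscat ilrcpc flagm sb dcpodp flagm2 frint bf16 afp"
)


def parse_flags(lscpu_output: str) -> set[str]:
    """Return the feature flags from the 'Flags:' line of lscpu output."""
    for line in lscpu_output.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "Flags":
            return set(value.split())
    return set()


lima_flags = parse_flags(LIMA_VZ_LSCPU)
orbstack_flags = parse_flags(ORBSTACK_LSCPU)

# Flags advertised in only one of the two guests.
print("only in lima:    ", sorted(lima_flags - orbstack_flags))
print("only in orbstack:", sorted(orbstack_flags - lima_flags))
```

The same function can be fed the full text of `lscpu` from each guest instead of the excerpted lines.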

MishaVeldhoen avatar Apr 09 '25 22:04 MishaVeldhoen

With vz or QEMU?

Specifying the other one in limactl create --vm-type=... may work?

AkihiroSuda avatar Apr 12 '25 12:04 AkihiroSuda

@AkihiroSuda I tried with both VM types:

~ limactl list
NAME       STATUS     SSH                VMTYPE    ARCH       CPUS    MEMORY    DISK      DIR
default    Stopped    127.0.0.1:0        vz        aarch64    4       4GiB      100GiB    ~/.lima/default
qemu       Running    127.0.0.1:49819    qemu      aarch64    4       4GiB      100GiB    ~/.lima/qemu

In both cases I get a core dump if I use default settings. The output of lscpu posted in the ticket was with vmType=vz; here's what I get with qemu:

Architecture:             aarch64
  CPU op-mode(s):         64-bit
  Byte Order:             Little Endian
CPU(s):                   4
  On-line CPU(s) list:    0-3
Vendor ID:                ARM
  Model name:             Cortex-A76
    Model:                1
    Thread(s) per core:   1
    Core(s) per socket:   4
    Socket(s):            1
    Stepping:             r4p1
    BogoMIPS:             48.00
    Flags:                fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp dit sve2 svei8mm svebf16 smei16i64 smef64f64 smei8i32 smef16f32 smeb16f32 smef32f32 sme2 smei16i32 smebi32i32
NUMA:
  NUMA node(s):           1
  NUMA node0 CPU(s):      0-3

I believe that if I can configure QEMU not to expose SVE2 instructions, that could work around the issue. Is there documentation available on the schema of the lima configuration yaml?

MishaVeldhoen avatar Apr 15 '25 14:04 MishaVeldhoen

Is there documentation available on the schema of the lima configuration yaml?

QEMU's -cpu string can be specified in .cpuType

https://github.com/lima-vm/lima/blob/0625d0b084450e874869dcbc9f63d4312797c3fe/templates/centos-stream-10.yaml#L31-L35

AkihiroSuda avatar Apr 15 '25 14:04 AkihiroSuda

For Intel, we get a SIGILL the first time that AVX-512 is used on Darwin, and then you have to enable it on the host CPU thread (outside of the VM).

Is there something similar happening here with SVE-2, or what is in the core dump? The solution could be the same, mask some instructions...

  • https://github.com/lima-vm/lima/pull/3065

cpuType[arch] += ",-sve2"

afbjorklund avatar Apr 15 '25 15:04 afbjorklund

Does cpuinfo report that the SVE instructions are available?

go run github.com/klauspost/cpuid/v2/cmd/cpuid@latest

Does SVE2 code work on Darwin, or is it not implemented?

https://developer.arm.com/documentation/102340/0100/Program-with-SVE2

afbjorklund avatar Apr 15 '25 16:04 afbjorklund

Hi @afbjorklund, @AkihiroSuda, thanks for helping me look into this!

Does cpuinfo report that the SVE instructions are available?

According to its Wikipedia entry, the M4 does not implement SVE (https://en.wikipedia.org/wiki/Apple_M4). Running cpuid gives:

Name: Apple M4 Pro
Vendor String: Apple
Vendor ID: VendorUnknown
PhysicalCores: 12
Threads Per Core: 1
Logical Cores: 12
CPU Family 399882554 Model: 0 Stepping: 0
Features: AESARM,ASIMDDP,ASIMDRDM,ATOMICS,CRC32,DCPOP,FCMA,FHM,FP,FPHP,GPA,JSCVT,LRCPC,PMULL,TS,SHA1,SHA2,SHA3,SHA512
Microarchitecture level: 0
Cacheline bytes: 128
L1 Instruction Cache: 131072 bytes
L1 Data Cache: 65536 bytes
L2 Cache: 4194304 bytes
L3 Cache: -1 bytes

cpuType[arch] += ",-sve2"

I've tried the following configuration:

cpuType:
   aarch64: "cortex-a76,-sve2"

but I think it expects CPU features to be supplied in key=value form. I verified this by trying to start qemu directly:

> qemu-system-aarch64 -m 4096 -cpu cortex-a76,-sve -machine virt,accel=hvf
qemu-system-aarch64: Expected key=value format, found -sve.
> qemu-system-aarch64 -m 4096 -cpu cortex-a76,sve=off -machine virt,accel=hvf
qemu-system-aarch64: can't apply global cortex-a76-arm-cpu.sve=off: Property 'cortex-a76-arm-cpu.sve' not found
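
For reference, the key=value form that the first error message asks for can be assembled mechanically. The helper below is a hypothetical convenience (its name is not part of QEMU or Lima) that only builds the string. As the second error shows, QEMU still decides whether a given CPU model actually has the named property.

```python
def qemu_cpu_argument(model: str, **features: bool) -> str:
    """Build a '-cpu' value such as 'cortex-a76,sve=off' from on/off switches."""
    parts = [model]
    for name, enabled in features.items():
        parts.append(f"{name}={'on' if enabled else 'off'}")
    return ",".join(parts)


# The string used in the second attempt above.
print(qemu_cpu_argument("cortex-a76", sve=False))
```

The resulting string is what would go into `-cpu` on the command line, or into Lima's `cpuType` entry for the architecture.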

So now it seems like the syntax is right, but the sve feature cannot be set. I tried to figure out what features are available for toggling on the different CPU choices, but it doesn't seem like the QEMU version shipped with lima includes query-cpu-model-expansion (see: https://qemu-project.gitlab.io/qemu/system/arm/cpu-features.html)

If you have any further suggestions how to proceed, I can try a couple more things!

MishaVeldhoen avatar May 02 '25 22:05 MishaVeldhoen

Hi @AkihiroSuda @afbjorklund I'd be happy to continue debugging this, but I'm not sure what other things to try.

MishaVeldhoen avatar Jun 30 '25 18:06 MishaVeldhoen