Expose usable PMU (perf counters) to guest
I've been trying to get perf stat / perf record to work properly in a Linux aarch64 guest in UTM on my M1 hardware. My use case is profiling software with common performance counters (cycles, instructions, cache misses, etc.) while developing and optimizing in the guest.
This probably involves multiple moving parts: QEMU support for PMU virtualization/passthrough, UTM setting the right options, guest Linux kernel knowledge of whatever PMU is exposed (is it Apple-specific?), and knowledge of the right counters in the perf userspace tool. This issue is an attempt to work out where each of these parts stands.
Has anyone gotten perf stat /bin/ls (for example) in a Linux guest to report cycles and instructions? If so, are there instructions anywhere?
What I've worked out so far, using UTM 3.1.5 and a guest running a recent kernel (5.18), on macOS 12.4 / M1 as a host:
- If I boot with the "CPU" option set to "host", then no PMU is exposed at all, according to the boot-time dmesg output.
- If I boot with the "CPU" option set to cortex-a72 (or various other similar cores), a virtual PMU is exposed according to boot messages, but perf list shows no hardware events available. (In contrast, on a real A72 on my RPi4, I see events for cycles, instructions, and the like.)
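
For anyone who wants to cross-check the two observations above without relying on perf list, one option is to look at the perf event sources the guest kernel registered in sysfs. This is only a minimal sketch, assuming the usual /sys/bus/event_source/devices layout; the function name list_event_sources is made up for illustration, and the PMU names that show up differ between kernels and CPU models.

```python
import os


def list_event_sources(root="/sys/bus/event_source/devices"):
    """Map each perf event source under `root` to its named events.

    Returns an empty dict if `root` does not exist (for example on a
    non-Linux system). An event source without an `events`
    subdirectory maps to an empty list.
    """
    sources = {}
    if not os.path.isdir(root):
        return sources
    for name in sorted(os.listdir(root)):
        events_dir = os.path.join(root, name, "events")
        if os.path.isdir(events_dir):
            sources[name] = sorted(os.listdir(events_dir))
        else:
            sources[name] = []
    return sources


if __name__ == "__main__":
    # Print every event source and the events it advertises, if any.
    for name, events in list_event_sources().items():
        listing = ", ".join(events) if events else "(no named events)"
        print(f"{name}: {listing}")
```

Running this in the guest and on a machine where perf works (such as the RPi4) should make the difference visible: if the guest shows only entries like software or tracepoint, that would be consistent with no hardware PMU having been registered.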
These results lead me to suspect that more support may need to be added somewhere for Apple-specific performance counters, but that's just a guess. Any guidance would be welcome -- thanks!