
M1/M2 Compatibility

Open Glar35 opened this issue 3 years ago • 51 comments

will there be a version for m1?

Glar35 avatar Nov 20 '22 17:11 Glar35

Hi @gilaroni,

I have started on M1 support, it might take a bit but it's on the roadmap, yes.

LouisBrunner avatar Dec 21 '22 11:12 LouisBrunner

Hello, any news here?

MalwarePup avatar Feb 19 '23 13:02 MalwarePup

Any news here? It's still not possible to install Valgrind:

    brew install --HEAD LouisBrunner/valgrind/valgrind
    ==> Fetching louisbrunner/valgrind/valgrind
    ==> Cloning https://github.com/LouisBrunner/valgrind-macos.git
    Updating /Users/xxx/Library/Caches/Homebrew/valgrind--git
    ==> Checking out branch main
    Already on 'main'
    Your branch is up to date with 'origin/main'.
    HEAD is now at ee485f9ab docs: Update README for Homebrew error (#72)
    ==> Installing valgrind from louisbrunner/valgrind
    ==> ./autogen.sh
    ==> ./configure --prefix=/opt/homebrew/Cellar/valgrind/HEAD-ee485f9 --enable-only64bit --build=amd64-darwin
    ==> make
    Last 15 lines from /Users/xxx/Library/Logs/Homebrew/valgrind/03.make:
    fixup_macho_loadcmds.c:465:22: error: use of undeclared identifier 'x86_thread_state64_t'
          = (x86_thread_state64_t*)(&w32s[2]);
                         ^
    fixup_macho_loadcmds.c:467:36: error: no member named '__rsp' in 'struct __darwin_arm_thread_state64'; did you mean '__sp'?
          init_rsp = state64->__rsp;
                              ^~~~~
                              __sp
    /Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk/usr/include/mach/arm/_structs.h:141:13: note: '__sp' declared here
            __uint64_t __sp; /* Stack pointer x31 */
                       ^
    7 errors generated.
    make[2]: *** [fixup_macho_loadcmds] Error 1
    make[2]: *** Waiting for unfinished jobs....
    make[1]: *** [all-recursive] Error 1
    make: *** [all] Error 2

If reporting this issue please do so at (not Homebrew/brew or Homebrew/homebrew-core): https://github.com/louisbrunner/homebrew-valgrind/issues

valgrind's formula was built from an unstable upstream --HEAD. This build failure is expected behaviour. Do not create issues about this on Homebrew's GitHub repositories. Any opened issues will be immediately closed without response. Do not ask for help from Homebrew or its maintainers on social media. You may ask for help in Homebrew's discussions but are unlikely to receive a response. Try to figure out the problem yourself and submit a fix as a pull request. We will review it but may or may not accept it.

eliesaikali avatar Feb 24 '23 14:02 eliesaikali

Any update on the M1/M2 compatibility?

TommyJD93 avatar Mar 08 '23 10:03 TommyJD93

ran into the same error as @eliesaikali on my M2 Pro MacBook Pro running macOS Ventura 13.2.1 (22D68).

hacknus avatar Mar 19 '23 11:03 hacknus

+1

whisper-bye avatar Apr 11 '23 01:04 whisper-bye

M1 Pro MacBook Pro running macOS Ventura 13.3.1. Same issue as @eliesaikali and @hacknus.

carl-alphonce avatar Apr 19 '23 03:04 carl-alphonce

Any details about the roadmap for getting Valgrind running on the M1 architecture?

fakecore avatar May 11 '23 04:05 fakecore

Any update on the M1/M2 compatibility?

zakariazh avatar Jun 15 '23 17:06 zakariazh

+1

moliqingwa avatar Jun 27 '23 02:06 moliqingwa

Still need help?

JoonasMykkanen avatar Jun 30 '23 20:06 JoonasMykkanen

Would really like Valgrind on Apple Silicon macOS for school purposes. We're using it at school!

brew install --HEAD LouisBrunner/valgrind/valgrind                  6s
Error: Valgrind is currently incompatible with ARM-based Macs, see https://github.com/LouisBrunner/valgrind-macos/issues/56

kalip2 avatar Jul 08 '23 23:07 kalip2

Pleeeassseee bump this up. We're using valgrind in our cs101 class to help with memory leaks in projects related to singly linked lists and pointers. We're also going to need valgrind in our upcoming cs102 class. I don't mind using our college's ubuntu desktops with valgrind, but the labs close at 10pm, and a lot of us do our best studying after 10pm.

kalip2 avatar Jul 14 '23 18:07 kalip2

Speaking from experience, a huge amount of work is needed to get things running smoothly.

paulfloyd avatar Jul 15 '23 06:07 paulfloyd

What are the alternatives to Valgrind on macOS that would be usable in a continuous integration environment?

MartinDelille avatar Jul 15 '23 11:07 MartinDelille

What are the alternatives to Valgrind on macOS that would be usable in a continuous integration environment?

@MartinDelille You can use leaks on macOS if you like.
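
For anyone landing here with the same question, here is a minimal sketch of that workflow (assuming the stock leaks --atExit invocation that ships with the Xcode command line tools; check man leaks on your version): a deliberately leaky C program that the tool should flag at exit.

    /* leaky.c - deliberately leaks a heap block so `leaks` has something to report */
    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        char *lost = malloc(64);
        strcpy(lost, "leaked on purpose");
        lost = NULL;   /* drop the last reference so the block is truly unreachable */
        return 0;
    }

Build and run it under leaks, roughly:

    clang -g leaky.c -o leaky
    leaks --atExit -- ./leaky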

eliesaikali avatar Jul 15 '23 11:07 eliesaikali

I wasn't aware of this Xcode tool! Thanks a lot! 👌

MartinDelille avatar Jul 15 '23 13:07 MartinDelille

Speaking from experience, a huge amount of work is needed to get things running smoothly.

Yes, probably, but some people like @JoonasMykkanen want to contribute and aren't getting any answer.

MalwarePup avatar Jul 15 '23 14:07 MalwarePup

Hi,

Sorry for the lack of update, I have been away for the last month.

As @paulfloyd said, this is not an easy issue to resolve; it's something I have been working on for nearly 3 years (initial push before the M1 release in 2020, despair through 2021-2022 as it seemed unfixable, picked back up this year) and have been very actively trying to fix since earlier this year.

You can check my progress on my working branch here: https://github.com/LouisBrunner/valgrind-macos/tree/feature/m1

Here is a breakdown of the different issues to be resolved.

TL;DR: I am still working on it; it's hard work and I don't see it being completed before the end of the year. Join the Discord server (https://discord.gg/mU9FG3T5jF) if you want to contribute.

Compilation: static vs dynamic executables

The first massive hurdle which took me so long to get around is just building Valgrind correctly.

Some context: XNU arm64 has constraints that are massively more restrictive than amd64's. I imagine this is because it is also used for iOS, which is a much more locked-down system (but macOS is going that way too).

Valgrind's tools like memcheck or helgrind are built as static binaries. This is to be able to place Valgrind's code and stack away from the default place where they would normally be put in memory, mainly so there is no conflict when we load the guest binary (e.g. ls), at least from what I gathered from the comments in the code. XNU arm64 doesn't allow static binaries and almost certainly never will, yet static linking is currently a hard requirement of Valgrind.

So I have been trying a lot of different methods to circumvent that, all of which basically revolve around the idea of building Valgrind's tools (either statically or dynamically) and then editing the binary to make it closer to what we want. However, this comes with massive drawbacks and issues, most of which I am yet to resolve fully. I have been using a tool called LIEF to do the editing (https://github.com/lief-project/LIEF) and it works quite well, but I am still dependent on an unreleased version (v.1.4.0) as I had to PR a bunch of fixes to be able to do what I want to Valgrind's binaries.

Build statically and make dynamic

Fairly straightforward; check coregrind/make_binary_dynamic_darwin.py for details (a small load-command inspection sketch follows the pros/cons below):

  • make the binary PIE: anything else is forbidden by the kernel
  • add a DYLINKER entry: required in PIE
  • add an empty DySymTab: dyld crashes otherwise
  • add a Main entry: dyld doesn't load your binary otherwise
  • remove the UnixThread entry: can't have both Main and UnixThread

Pros:

  • Still build Valgrind tools statically
  • Not a lot of steps

Cons:

  • Stack is in the wrong place (defined by ASLR instead of us)
  • Have to "reverse-engineer" what a valid dynamic executable looks like
  • Random crashes (C-strings not in the right places; I think it depends on where the stack gets put)
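
To make the list of edits above a bit more concrete, here is a small sketch in C (my own illustration, not the actual coregrind/make_binary_dynamic_darwin.py) that walks a thin 64-bit Mach-O and reports the properties those edits are about: MH_PIE, LC_MAIN vs LC_UNIXTHREAD, and LC_LOAD_DYLINKER.

    /* macho_inspect.c - report the load commands/flags relevant to the patching above */
    #include <fcntl.h>
    #include <mach-o/loader.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char **argv) {
        if (argc != 2) { fprintf(stderr, "usage: %s <macho>\n", argv[0]); return 1; }
        int fd = open(argv[1], O_RDONLY);
        struct stat st;
        if (fd < 0 || fstat(fd, &st) != 0) { perror("open/fstat"); return 1; }
        const void *base = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (base == MAP_FAILED) { perror("mmap"); return 1; }

        const struct mach_header_64 *mh = base;
        if (mh->magic != MH_MAGIC_64) { fprintf(stderr, "not a thin 64-bit Mach-O\n"); return 1; }
        printf("MH_PIE: %s\n", (mh->flags & MH_PIE) ? "yes" : "no (the arm64 kernel refuses to exec it)");

        const struct load_command *lc = (const void *)(mh + 1);
        for (uint32_t i = 0; i < mh->ncmds; i++) {
            switch (lc->cmd) {
                case LC_MAIN:          puts("has LC_MAIN (dyld-style entry point)"); break;
                case LC_UNIXTHREAD:    puts("has LC_UNIXTHREAD (static-style entry, conflicts with LC_MAIN)"); break;
                case LC_LOAD_DYLINKER: puts("has LC_LOAD_DYLINKER (e.g. /usr/lib/dyld)"); break;
            }
            lc = (const struct load_command *)((const char *)lc + lc->cmdsize);
        }
        return 0;
    }

Running it against a stock dynamic executable versus a statically-built tool shows exactly which load commands the patching step has to add or remove.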

Build dynamically and adapt to our purposes

A bit more complex; check coregrind/fixup_dynamic_binary_darwin.py for details:

  • Remove a bunch of extra sections and segments we don't need
  • Remove libSystem.B.dylib
  • Move every segment and section we want to keep by a certain offset so Valgrind is not loaded at the default address

Pros:

  • Don't have to "reverse-engineer" a valid dynamic executable

Cons:

  • Stack is in the wrong place (defined by ASLR instead of us)
  • Loads of fixups; dynamic executables have a lot more stuff included
  • Have to build with libSystem.B.dylib (just to remove it later)
  • Crashes (when allocating memory, but only sometimes)

Summary

I basically use a combination of both of those to get further within Valgrind's execution and fix later bugs. I don't really see how to make Valgrind stable in this respect apart from building it dynamically and somehow finding hacks to address all the issues that come from that.

Code: reimplementing massive parts

Back in 2020, I used the iOS SDK to start building Valgrind without an M1 computer. I also used https://github.com/tyrael9/valgrind-ios as an inspiration (I am actually hoping that Valgrind might be able to run on the latest iOS if M1 support goes through, but this is in no way a priority or something I spend any energy on). Most of the code writing happened then, and there is a lot of it.

Valgrind has a lot of assembly, most of which depends on both the architecture (arm64, amd64, etc.) and the OS (Linux, FreeBSD, macOS, etc.). This is also true of many C code sections which are very much tied to the exact machine Valgrind will be running on. This means entire parts of the program need to be rewritten for macOS arm64.

Here is a (most likely non-exhaustive) list of outstanding issues related to that:

  • aspacem: I can't find references on how XNU allocates memory or how to divide the memory space between the client and Valgrind. It looks like on arm64, macOS will often (always?) allocate memory at very low addresses, which is pretty much the reverse of what amd64 does (the tiny mmap probe after this list illustrates this). Without fixing this, Valgrind's whole internal memory advisor is completely broken.
  • dispatch/signals: copied it from the Linux arm64 implementation, probably completely wrong
  • mach_traps/syswrap/syscall/pub_tool_machine: adapted from the macOS amd64 implementation, not sure if right
  • sigframe: copied the logic from macOS amd64 and the assembly-specific from Linux arm64, no idea if it works
  • GET_STARTREGS/get_StackTrace_wrk/N_CFI_REGS: assumed to be the same as Linux arm64, no idea if that's correct
  • arm64_darwin_REDIR_FOR_strcat/arm64_darwin_REDIR_FOR_strcpy/arm64_darwin_REDIR_FOR_strlcat/arm64_darwin_REDIR_FOR_strchr/fixup_guest_state_after_syscall_interrupted: unimplemented
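
Side note on the aspacem point above: this tiny probe (my own illustration, not Valgrind code) just asks the kernel for anonymous mappings with a NULL hint and prints where they land; on arm64 macOS they tend to come back at much lower addresses than the amd64 layout aspacem was written around.

    /* where_mmap.c - observe where XNU places anonymous mappings by default */
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void) {
        for (int i = 0; i < 4; i++) {
            void *p = mmap(NULL, 1 << 20, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANON, -1, 0);
            printf("anonymous 1 MiB mapping #%d at %p\n", i, p);
        }
        return 0;
    }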

The main issue with those assumptions is that I have no way to check them and correct them because I still can't run a program in Valgrind.

Running a program

After I recently got a running version of memcheck, I started trying to get a simple program (e.g. ls) to run. Despite everything building, I still had a lot of issues at runtime; here are a few examples:

  • macOS arm64 doesn't allow read-write-exec memory maps, which is (as with many things from Apple) for security reasons: https://developer.apple.com/documentation/apple-silicon/porting-just-in-time-compilers-to-apple-silicon. Valgrind makes extensive use of those RWX mmaps (for similar reasons), so we need to adapt to the new requirements: we have to manually toggle, per thread, whether we are allowed to write or exec, which should be straightforward, except I need to find every place where those permissions must be changed. Moreover, the function that flips the permission is not documented by Apple and we cannot call it anyway because of our whole quasi-static executable situation (no system dylib allowed!), so I had to reimplement it in assembly (see the MAP_JIT sketch after this list).
  • Signal handling works differently on arm64 than on amd64, with sigreturn having to be called differently (super hard to debug)
  • Had to reimplement the longjmp mechanism a bunch of times until I found something that worked
  • machine_get_hwcaps was absolutely not ready for how restrictive Apple is with system registers
  • Had to split __start_in_C_darwin in 2 as the arguments are passed completely differently between arm64 and x86
  • Had to reimplement do_syscall_unix_WRK a few times in arm64
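
For the first bullet (RWX maps), here is a hedged sketch of the W^X dance Apple's JIT porting guide describes, using MAP_JIT, pthread_jit_write_protect_np() and sys_icache_invalidate(). This is what a normal dylib-linked process would do; as noted above, Valgrind's quasi-static tools cannot call the libsystem wrapper, hence the reimplementation in assembly.

    /* jit_wx.c - minimal MAP_JIT write-then-execute round trip on arm64 macOS */
    #include <libkern/OSCacheControl.h>  /* sys_icache_invalidate */
    #include <pthread.h>                 /* pthread_jit_write_protect_np */
    #include <stdint.h>
    #include <string.h>
    #include <sys/mman.h>

    typedef long (*fn_t)(void);

    int main(void) {
        void *code = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                          MAP_PRIVATE | MAP_ANON | MAP_JIT, -1, 0);
        if (code == MAP_FAILED) return 1;

        /* arm64: mov x0, #42 ; ret */
        uint32_t insns[] = { 0xD2800540u, 0xD65F03C0u };

        pthread_jit_write_protect_np(0);            /* this thread: region writable, not executable */
        memcpy(code, insns, sizeof insns);
        pthread_jit_write_protect_np(1);            /* this thread: region executable again */
        sys_icache_invalidate(code, sizeof insns);  /* keep the instruction cache coherent */

        return ((fn_t)code)() == 42 ? 0 : 1;
    }

(Depending on code-signing/hardened-runtime settings, the com.apple.security.cs.allow-jit entitlement may also be needed for MAP_JIT to succeed.)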

Each of those issues can take a few hours to diagnose, debug and fix, which makes for very slow (and depressing) progress.

Currently, Valgrind will run until it starts to map the client binary (e.g. ls) into memory, which fails for various reasons (which also depend on whether you build with a static or dynamic base).

So, where are we now?

I am still committed to getting Valgrind working on arm64. I might not post updates frequently because progress is very slow and I often don't have anything to share apart from "it still doesn't work". However, I do work on it regularly, sinking hours and hours into trying to get this port to work. I wish I could work on Valgrind full-time or 5 days a week; unfortunately that is not possible, as it doesn't pay my bills.

I struggle to see how it could be ready before the end of the year (but I do hope we will be much closer to a working version by then).

Contributing

I would love for people to contribute, and if you are interested (e.g. @JoonasMykkanen), definitely reach out; I just created a Discord server for ease of communication: https://discord.gg/mU9FG3T5jF

However, Valgrind is a complex program with a massive codebase, and it does very complicated things without any external libraries. That said, I would be happy to take some extra time to train people so they can help.

Do understand that I just don't know what is broken and what isn't; porting to arm64 is orders of magnitude harder than any past fix I had to do (even the dyld cache support for macOS 11 and later). It will take a lot of time and effort.

LouisBrunner avatar Jul 26 '23 13:07 LouisBrunner

Quick update on arm64 support.

I solved a bunch of issues (fixed memory management, fixed mach_traps, finished and tested the arm64 assembly redirs, added support for FEAT_PAuth to Vex) and Valgrind is now starting to run the guest binary (still crashing in dyld for now).

Overall, that's quite positive and we are getting close to an experimental version that might be able to run basic programs on Apple Silicon.

Here are the top priorities to fix (in descending order):

  • Compilation: still a massive blocker and the next thing to tackle, as it now makes progress nearly impossible
  • Syscalls: arm64 seems to handle syscall classes differently from amd64 in a way Valgrind doesn't like; the assembly needs reviewing (a rough illustration follows this list)
  • Signals/Fork/Threads (syswrap): not been hit yet and most likely broken
  • Not sure: GET_STARTREGS, get_StackTrace_wrk, N_CFI_REGS, pub_tool_machine
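
On the syscall-classes bullet: my (possibly wrong) reading of XNU's syscall_sw.h is that on amd64 Darwin the class is encoded in the upper bits of the number placed in rax (e.g. 0x2000000 | SYS_write for the Unix class), whereas on arm64 the plain BSD number goes in x16 (Mach traps use negative numbers) and the trap is svc #0x80. A rough, assumption-laden sketch of a raw write(2) on macOS arm64 (not Valgrind's actual do_syscall_unix_WRK):

    #include <stdint.h>

    #if defined(__APPLE__) && defined(__arm64__)
    static long raw_write_arm64_darwin(int fd, const void *buf, uint64_t len) {
        register long x0  __asm__("x0")  = fd;
        register long x1  __asm__("x1")  = (long)buf;
        register long x2  __asm__("x2")  = (long)len;
        register long x16 __asm__("x16") = 4;   /* SYS_write; no 0x2000000 class prefix as on amd64 */
        __asm__ volatile("svc #0x80"
                         : "+r"(x0)
                         : "r"(x1), "r"(x2), "r"(x16)
                         : "memory", "cc");
        return x0;   /* errors are reported via the carry flag, ignored in this sketch */
    }
    #endif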

LouisBrunner avatar Sep 28 '23 12:09 LouisBrunner

Any update on the M1/M2 compatibility?

lycfr avatar Jan 22 '24 02:01 lycfr

Hi @lycfr,

There still hasn't been any new breakthrough on the compilation issue, so no change in support so far, unfortunately.

LouisBrunner avatar Jan 23 '24 15:01 LouisBrunner

I recently got a Raspberry Pi 5 and I've started work on adding support for FreeBSD arm64.

Initially that went well. It didn't take long to get things to compile and to be able to load guest binaries. However, I'm now stuck early on in program startup, getting various crashes in the backend-generated code. Not sure yet what is causing this: memory layout? TLS?

Anyway, if I get it to work fairly well and merge it upstream it may help a bit with Darwin arm64.

paulfloyd avatar Feb 22 '24 20:02 paulfloyd

Quick update (I already posted this on Discord): I found a nice workaround for the compilation issue and managed to load the guest binary quite reliably.

@paulfloyd Same thing for me. I am getting crashes from the is_imm64_to_ireg_EXACTLY4 assertion and random SIGILLs from the "event check" part of the VEX-translated source block. I don't have a clue where either of those issues comes from at the moment. Definitely keep me updated if you fix anything.

LouisBrunner avatar Feb 23 '24 21:02 LouisBrunner

@LouisBrunner OK, very interesting. I'm seeing exactly the same thing.

When the assert happens the guest code is

==11397== at 0x4016BD8: ??? (in /libexec/ld-elf.so.1)

Which is

0x15e38 + 0x1bd8 = 0x17a10

   179e8: aa1803e1      mov     x1, x24
   179ec: f9001118      str     x24, [x8, #0x20]
   179f0: 940030b9      bl      0x23cd4 <_rtld_atfork_post+0x2e98>
   179f4: f9430b57      ldr     x23, [x26, #0x610]
   179f8: 9108f2e9      add     x9, x23, #0x23c
   179fc: f9400128      ldr     x8, [x9]
   17a00: b240010a      orr     x10, x8, #0x1
   17a04: f94417e8      ldr     x8, [sp, #0x828]
   17a08: f900012a      str     x10, [x9]
   17a0c: b40000a8      cbz     x8, 0x17a20 <__tls_get_addr+0xee8>
   17a10: f9400508      ldr     x8, [x8, #0x8]

And the assert in VEX is

#0  chainXDirect_ARM64 (endness_host=VexEndnessLE, place_to_chain=0x10029911e8, 
    disp_cp_chain_me_EXPECTED=0x3817fc1c <vgPlain_disp_run_translations+76>, place_to_jump_to=0x1002991220) at priv/host_arm64_defs.c:6093

Initially I thought that I was getting some address range error and writing into and corrupting the generated code. If you're seeing the same thing that is less likely. I did try Linux arm64 with clang (but using libstdc++ not libc++ - don't think that makes a difference) and I didn't see any errors like this. I'll see if I can use ld.lld.

paulfloyd avatar Feb 24 '24 07:02 paulfloyd

It's difficult to pin down because I have many different scenarios.

Running with vgdb

I get SIGILLs during the early stage of dyld setup (around this area https://github.com/apple-oss-distributions/dyld/blob/d1a0f6869ece370913a3f749617e457f3b4cd7c4/dyld/dyldMain.cpp#L1195). The exact guest RIP or instruction is never the same as the crash happens in the VEX-generated code. It's the evCheck which looks something like this:

->  0x700000fb9f30: ldur   w9, [x21, #0x8]
    0x700000fb9f34: subs   w9, w9, #0x1
    0x700000fb9f38: stur   w9, [x21, #0x8]
    0x700000fb9f3c: b.pl   0x700000fb9f48

IIRC the crash is always on the load.

Running with lldb

I get a crash on the is_imm64_to_ireg_EXACTLY4 assert in chainXDirect_ARM64 like you. I think this is because lldb is able to bypass the SIGILLs (sometimes it reports them, sometimes it silences them and sometimes you can bypass them, not sure how/why).

Running directly

A healthy mix of SIGILLs, asserts and sometimes mmap failures (not sure yet why it is so mercurial). What is so odd to me is how consistent it is within a session: as I am doing this testing, I only get SIGILLs, but yesterday when I tried it was only asserts, and this morning it was mmap failures.

Summary

While it's really likely we are encountering the same issue, I also have other problems which might be causing this. Or the evCheck problem and the assertion are related somehow.

LouisBrunner avatar Feb 24 '24 15:02 LouisBrunner

I also get SIGILLs, for instance if I single step using vgdb or --vex-guest-max-insns=1

This isn't code that I know at all well unfortunately.

This looks too similar to be a coincidence. I don't see much in the way of platform-dependent stuff in any of the files, so it's a bit of a mystery why there is no problem on Linux. I tried adding a vassert(False) in chainXDirect_ARM64 on Linux this morning and it triggered straight away, so the problem isn't that macOS and FreeBSD use that function while Linux does not.

paulfloyd avatar Feb 24 '24 20:02 paulfloyd

If I understand the code correctly it works as follows.

place_to_chain points to generated code that performs the 4 opcode load of x9 and the blr subroutine call to the address in x9.

(gdb) x /5i place_to_chain
   0x1002991468:        mov     x9, #0x1220                     // #4640
   0x100299146c:        movk    x9, #0x299, lsl #16
   0x1002991470:        movk    x9, #0x10, lsl #32
   0x1002991474:        movk    x9, #0x0, lsl #48
   0x1002991478:        br      x9

Then the assert is checking that the address above corresponds to disp_cp_chain_me_EXPECTED

I think that address is 0x1002991220 which contains

(gdb) x /i 0x1002991220
   0x1002991220:        ldur    w9, [x21, #8]

That address should match disp_cp_chain_me_EXPECTED, but that contains something completely different.

(gdb) p disp_cp_chain_me_EXPECTED
$5 = (const void *) 0x38180094 <vgPlain_disp_run_translations+76>

The function in the assert is a bit hard to follow: it doesn't extract the address and compare the two addresses, it generates opcodes from the address and compares the opcodes (sketched below).

My thoughts at the moment are that this is a problem of matching chainXDirect_ARM64 and unchainXDirect_ARM64. It looks to me like "chain" is being called on instructions that have already been "chained" once but not "unchained".
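
To make that concrete, here is a simplified sketch (my own, not Valgrind's actual code) of the idea behind is_imm64_to_ireg_EXACTLY4: the 64-bit target address is emitted as exactly four instructions (MOVZ plus three MOVKs into x9), and the check regenerates those four opcodes and compares them against whatever is sitting at place_to_chain.

    #include <stdint.h>
    #include <string.h>

    /* Emit MOVZ x9,#lo16 ; MOVK x9,#...,LSL #16 ; LSL #32 ; LSL #48 */
    static void emit_imm64_to_x9(uint32_t out[4], uint64_t imm) {
        out[0] = 0xD2800000u |             ((uint32_t)( imm        & 0xFFFF) << 5) | 9;  /* MOVZ */
        out[1] = 0xF2800000u | (1u << 21) | ((uint32_t)((imm >> 16) & 0xFFFF) << 5) | 9;  /* MOVK, LSL #16 */
        out[2] = 0xF2800000u | (2u << 21) | ((uint32_t)((imm >> 32) & 0xFFFF) << 5) | 9;  /* MOVK, LSL #32 */
        out[3] = 0xF2800000u | (3u << 21) | ((uint32_t)((imm >> 48) & 0xFFFF) << 5) | 9;  /* MOVK, LSL #48 */
    }

    /* The "EXACTLY4"-style check: regenerate the expected opcodes and compare
       them word-for-word with the code already at place_to_chain. */
    static int is_imm64_to_x9_exactly4(const uint32_t *place_to_chain, uint64_t expected) {
        uint32_t want[4];
        emit_imm64_to_x9(want, expected);
        return memcmp(place_to_chain, want, sizeof want) == 0;
    }

If the code at place_to_chain has already been rewritten (e.g. chained once and never unchained), the regenerated opcodes no longer match and the assert fires, which fits the theory above.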

paulfloyd avatar Feb 25 '24 08:02 paulfloyd

And there may be something in this bugzilla https://bugs.kde.org/show_bug.cgi?id=412377 since there is some connection between chaining and the icache:

   VexInvalRange vir
       = LibVEX_UnChain( arch_host, endness_host, place_to_patch, 
                         place_to_jump_to_EXPECTED, disp_cp_chain_me );
   VG_(invalidate_icache)( (void*)vir.start, vir.len );

paulfloyd avatar Feb 25 '24 08:02 paulfloyd

Hmm, no, Linux doesn't seem to use unchainXDirect_ARM64, but every place_to_chain address there is unique. That's not the case on FreeBSD.

paulfloyd avatar Feb 25 '24 14:02 paulfloyd