On macOS Catalina, if lisp kernel binary is removed while running, run-program breaks
On my Catalina system, it appears that run-program breaks if the lisp kernel binary (dx86cl64) is removed while the lisp is running.
This comes up because (rebuild-ccl :full t) uses run-program to do make clean and then make in the lisp kernel build directory.
? (run-program "ls" (list "/bin") :output t)
[ dd launchctl pwd test
...
#<EXTERNAL-PROCESS (ls /bin)[78201] (EXITED : 0) #x3020004A5D8D>
;;; Now go to a shell and remove dx86cl64
? (run-program "ls" (list "/bin") :output t)
#<EXTERNAL-PROCESS (ls /bin)[78226] (SIGNALED : 9) #x3020004B377D>
This surprises me, and I don't understand what's going on here. This has worked for years...
When it comes to rebuilding CCL, this can be worked around by rebuilding the lisp kernel by hand (go to the lisp kernel build directory and run make clean; make). Then, instead of (rebuild-ccl :full t), use (rebuild-ccl :clean t). The difference is that :full t tries to recompile the lisp kernel as well as rebuilding a lisp image, whereas :clean t only rebuilds the lisp image.
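For anyone who wants the by-hand step spelled out, here is a minimal sketch. The function name rebuild_kernel and the example path are my own placeholders, and I'm assuming the kernel build directory for dx86cl64 is lisp-kernel/darwinx8664 under the CCL source tree; adjust for your checkout.

```shell
#!/bin/sh
# Sketch only: rebuild the lisp kernel from a shell instead of via run-program.
# rebuild_kernel is a hypothetical helper name, not something CCL ships.
rebuild_kernel() {
    kernel_dir=$1
    # Run in a subshell so the caller's working directory is left unchanged.
    # MAKE can be overridden; it defaults to plain "make".
    ( cd "$kernel_dir" && "${MAKE:-make}" clean && "${MAKE:-make}" )
}

# Example (hypothetical path, assumed directory layout):
#   rebuild_kernel "$HOME/ccl/lisp-kernel/darwinx8664"
# Afterwards, start the lisp and evaluate (rebuild-ccl :clean t).
```

Since the lisp never calls run-program for the kernel build this way, the kernel binary is not removed out from under a lisp that still needs to spawn make.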
Indeed, issuing a
(rebuild-ccl :clean t)
gets the compile to work. I'm now working out the packaging details on the MacPorts side.
I was looking around, and happened to notice https://github.com/ziglang/zig/pull/7231, which contains the following comment:
When targeting aarch64 from aarch64 macOS, the toolchain now performs a full artefact copy into a temporary filename, and then a rename back into the emitted binary. This is required to make incremental linking work with the latest constraints of the XNU kernel on Apple Silicon. For those lacking context, the latest kernel is actively caching run binaries and any change to the binary/inode that was already cached by the kernel ends up in an immediate SIGKILL. So copy-rename dance is the way to circumvent this.
I wonder if something similar is what is happening to us. We are in fact replacing the dx86cl64 binary when we recompile the lisp kernel.
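To make the distinction in that comment concrete, here is a small sketch that contrasts overwriting a file in place with the copy-rename approach. It uses a throwaway text file as a stand-in for the binary (the name fake-kernel is made up), and it only shows the inode behaviour; it does not reproduce the SIGKILL and is not a diagnosis of what happens to dx86cl64.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
bin="$workdir/fake-kernel"   # hypothetical stand-in for a kernel binary

# Print the inode number of a file (ls -i is portable across macOS and Linux).
inode_of() { ls -i "$1" | awk '{print $1}'; }

printf 'old build\n' > "$bin"
before=$(inode_of "$bin")

# In-place overwrite: the redirection truncates the existing file,
# so the contents change but the inode stays the same.
printf 'new build\n' > "$bin"
inplace=$(inode_of "$bin")

# Copy-rename: write the new contents to a temporary file in the same
# directory, then rename it over the old name. The name now refers to a
# different inode; the old inode is left untouched until it is released.
printf 'newer build\n' > "$bin.tmp"
mv -f "$bin.tmp" "$bin"
renamed=$(inode_of "$bin")

echo "in-place overwrite kept the inode: $([ "$before" = "$inplace" ] && echo yes || echo no)"
echo "copy-rename changed the inode:     $([ "$before" != "$renamed" ] && echo yes || echo no)"
```

If the kernel really does key its cache on the inode, the second pattern is the one that avoids touching a binary that is already cached, which is what the zig change relies on.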
As of (at least) Xcode 15.1 and macOS Sonoma 14.2.1, (rebuild-ccl :full t) works again.
I don't know which change in the operating system or development tools caused this, or which one fixed it.