
Occasional crashes on OS-X

Open • lihaoyi opened this issue 7 months ago • 7 comments

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000000185319554, pid=43806, tid=64775
#
# JRE version: OpenJDK Runtime Environment Corretto-17.0.6.10.1 (17.0.6+10) (build 17.0.6+10-LTS)
# Java VM: OpenJDK 64-Bit Server VM Corretto-17.0.6.10.1 (17.0.6+10-LTS, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-aarch64)
# Problematic frame:
# C  [CoreFoundation+0x14f554]  __CFCheckCFInfoPACSignature+0x4

Possibly related to the new native --notify-watch true implementation from os-lib-watch. It seems to occur together with errors in build.mill.

lihaoyi, May 29 '25

Here's the log file

hs_err_pid15791.log

lihaoyi, Jun 09 '25

Recently I've noticed sometimes --watch on my laptop just ends, with no error message shown. Not sure if it is a different problem or the same problem with a new manifestation

lihaoyi, Jun 14 '25

> Recently I've noticed sometimes --watch on my laptop just ends, with no error message shown. Not sure if it is a different problem or the same problem with a new manifestation

Same here. This happens pretty often. I'm on macOS too (ARM).

alexarchambault, Jun 14 '25

@arturaz is looking into this. Worst case, if we can't figure it out, we can revert to polling by default on OSX.

lihaoyi, Jun 14 '25
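For context on the polling fallback mentioned above: a polling watcher does not rely on FSEvents callbacks at all. It periodically re-scans the watched tree and compares timestamps. The Scala below is a minimal sketch of that idea under stated assumptions, not os-lib's polling implementation. snapshot, changedPaths and pollForever are hypothetical names, and the sketch only tracks the last-modified time of regular files.

```scala
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._
import scala.util.{Try, Using}

// Hypothetical helper: snapshot of a directory tree, mapping every regular
// file to its last-modified time in ms. Files deleted mid-walk are skipped.
def snapshot(root: Path): Map[Path, Long] =
  Using.resource(Files.walk(root)) { stream =>
    stream.iterator().asScala
      .filter(p => Files.isRegularFile(p))
      .flatMap(p => Try(p -> Files.getLastModifiedTime(p).toMillis).toOption)
      .toMap
  }

// Hypothetical helper: paths added, removed, or re-timestamped between snapshots.
def changedPaths(before: Map[Path, Long], after: Map[Path, Long]): Set[Path] =
  (before.keySet ++ after.keySet).filter(p => before.get(p) != after.get(p))

// Hypothetical poll loop: re-scan every intervalMs and report changes.
// Interrupting the thread ends the loop (Thread.sleep throws InterruptedException).
def pollForever(root: Path, intervalMs: Long)(onChange: Set[Path] => Unit): Unit = {
  var previous = snapshot(root)
  while (!Thread.currentThread().isInterrupted) {
    Thread.sleep(intervalMs)
    val current = snapshot(root)
    val changed = changedPaths(previous, current)
    if (changed.nonEmpty) onChange(changed)
    previous = current
  }
}
```

The trade-off is that changes are noticed up to intervalMs late, every poll walks the whole tree, and an edit that leaves a file's timestamp unchanged can be missed. In exchange, no native CoreFoundation code runs inside the JVM.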

I believe it was due to the closed flag not being volatile: https://github.com/com-lihaoyi/os-lib/blob/638ae3b2c078564d5122aca4b06ed30b03d9f097/os/watch/src/FSEventsWatcher.scala#L14

Then, if close() was invoked multiple times, current and dummyCfString could be released again while already pointing to invalid memory, because the other thread was seeing a stale value of closed.

https://github.com/com-lihaoyi/os-lib/blob/638ae3b2c078564d5122aca4b06ed30b03d9f097/os/watch/src/FSEventsWatcher.scala#L100-L109

The new version of os-lib (https://github.com/com-lihaoyi/os-lib/pull/398) should have this fixed.

arturaz, Jun 21 '25
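arturaz's explanation describes a check-then-act race on the closed flag that guards native memory. As a minimal sketch of the general pattern, and not the code from os-lib or its pull request 398, the Scala below makes close() idempotent with an AtomicBoolean. NativeHandleOwner, releaseCount and closeConcurrently are hypothetical, and the native release is replaced by a counter. Marking a plain var @volatile fixes the stale read, but it still leaves a window in which two threads calling close() at the same time both read false before either writes true. compareAndSet rules that out as well.

```scala
import java.util.concurrent.CountDownLatch
import java.util.concurrent.atomic.{AtomicBoolean, AtomicInteger}

// Hypothetical owner of native handles (standing in for fields like current
// and dummyCfString). Instead of calling into CoreFoundation, it counts how
// many times the release code runs so the behavior can be checked.
final class NativeHandleOwner extends AutoCloseable {
  // AtomicBoolean reads and writes have volatile semantics, so no thread sees a
  // stale value, and compareAndSet makes "check, then mark closed" one step.
  private val closed = new AtomicBoolean(false)
  private val releases = new AtomicInteger(0)

  def releaseCount: Int = releases.get()

  override def close(): Unit =
    if (closed.compareAndSet(false, true)) {
      // Only the first caller gets here. Releasing the same native memory twice
      // is a double free, which can bring the JVM down with SIGSEGV.
      releases.incrementAndGet()
    }
}

// Hypothetical test helper: calls close() from several threads at roughly the
// same moment and returns how many times the release code actually ran.
def closeConcurrently(owner: NativeHandleOwner, threads: Int): Int = {
  val start = new CountDownLatch(1)
  val workers = (1 to threads).map { _ =>
    val t = new Thread(() => { start.await(); owner.close() })
    t.start()
    t
  }
  start.countDown()
  workers.foreach(_.join())
  owner.releaseCount
}
```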

@arturaz, could you explain why .close might be called multiple times? I thought we used Using to manage the lifetime, which should call close only once.

lihaoyi, Jun 22 '25
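On lihaoyi's point: scala.util.Using (Scala 2.13+) closes each resource it manages exactly once, whether the body returns normally or throws. So if close() really ran twice, the second call would have to come from some other code path. Below is a small check of that guarantee. CountingResource and closeCallsAfterUsing are hypothetical names, used only for illustration.

```scala
import scala.util.Using

// Hypothetical resource that only records how many times close() is called.
final class CountingResource extends AutoCloseable {
  var closeCalls = 0
  override def close(): Unit = closeCalls += 1
}

// Runs a body under Using.resource, optionally throwing from it, and reports
// how many times the resource was closed afterwards.
def closeCallsAfterUsing(fail: Boolean): Int = {
  val resource = new CountingResource
  try Using.resource(resource) { _ => if (fail) throw new RuntimeException("boom") }
  catch { case _: RuntimeException => () }
  resource.closeCalls
}
```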

To be honest, I have no idea, but this is the closest thing to a reasonable explanation I could come up with. I think we should try the new os-lib and see if it still happens. If it doesn't, great. I see no point in trying to figure out why it happened.

arturaz, Jun 22 '25

This still seems to be happening on the latest 1.0.0-RC3. There's no stack trace anymore, just --watch silently exiting. It's perhaps less common than before, but it has happened to me at least twice.

CC @arturaz

lihaoyi, Jul 03 '25

You mean the whole JVM process stops? What is the exit code when it does that?

arturaz, Jul 03 '25

Exit code 255. The daemon logs contain:

8216a823b53b0b7f-pid62763 args ["--jobs=0.5C","-D","mill.main.cli=./mill","-w","website.fastPages"]
8216a823b53b0b7f-pid62763 env {"__CFBundleIdentifier":"com.apple.Terminal","PATH":"/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/Library/Apple/usr/bin:/Users/lihaoyi/Library/Application Support/JetBrains/Toolbox/scripts","SHELL":"/bin/zsh","TERM_PROGRAM":"Apple_Terminal","TERM":"xterm-256color","SECURITYSESSIONID":"186b1","HOMEBREW_CELLAR":"/opt/homebrew/Cellar","USER":"lihaoyi","HOMEBREW_PREFIX":"/opt/homebrew","TMPDIR":"/var/folders/rt/f1pd6fz92x3__6jg0cmgdd580000gn/T/","LaunchInstanceID":"5C51A843-7EC7-43A5-BC5C-AD4BE06991B8","SSH_AUTH_SOCK":"/private/tmp/com.apple.launchd.cl7cL4HEjT/Listeners","XPC_FLAGS":"0x0","TERM_SESSION_ID":"371FA6CF-F1A1-45D4-999C-61497F627E2A","__CF_USER_TEXT_ENCODING":"0x1F5:0x0:0x64","LOGNAME":"lihaoyi","LC_CTYPE":"UTF-8","HOMEBREW_REPOSITORY":"/opt/homebrew","TERM_PROGRAM_VERSION":"452","XPC_SERVICE_NAME":"0","PWD":"/Users/lihaoyi/Github/mill","SHLVL":"1","INFOPATH":"/opt/homebrew/share/info:","HOME":"/Users/lihaoyi"}
8216a823b53b0b7f-pid62763 Interrupting after 1800000ms
8216a823b53b0b7f-pid62763 server loop ended
8216a823b53b0b7f-pid62763 finally exitServer

So this may be unrelated to the crashes we were seeing earlier. I'll open a PR to try to work around this.

lihaoyi, Jul 03 '25
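The daemon log reads like a timer firing rather than a crash. 1800000ms is exactly 30 minutes, and "server loop ended" and "finally exitServer" follow right after the interrupt. If a long-running --watch session was being treated as idle, one hypothetical guard is an inactivity watchdog that only counts idle time while no client is connected. The Scala below sketches that idea. IdleWatchdog and its methods are invented for illustration and are neither Mill's daemon code nor the workaround from the PR mentioned above. Times are passed in explicitly so the logic can be checked without sleeping.

```scala
// Hypothetical inactivity watchdog, for illustration only (not Mill's daemon
// code). The server should be interrupted once it has been idle for timeoutMs,
// but idle time only counts while no client (e.g. a --watch session) is connected.
final class IdleWatchdog(timeoutMs: Long, startMs: Long) {
  private var activeClients = 0
  private var lastActivityMs = startMs

  def clientConnected(nowMs: Long): Unit = synchronized {
    activeClients += 1
    lastActivityMs = nowMs
  }

  def clientDisconnected(nowMs: Long): Unit = synchronized {
    activeClients -= 1
    lastActivityMs = nowMs
  }

  // True when the server should shut down at time nowMs.
  def shouldInterrupt(nowMs: Long): Boolean = synchronized {
    activeClients == 0 && nowMs - lastActivityMs >= timeoutMs
  }
}
```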

https://github.com/com-lihaoyi/mill/issues/5436

lihaoyi, Jul 03 '25