kbfs icon indicating copy to clipboard operation
kbfs copied to clipboard

KBFS hang in kernel then crash write error: error="socket is not connected" stack

Open gabriel opened this issue 8 years ago • 5 comments

@AlexClineBB has a hang and crash here: https://github.com/keybase/client/issues/5864#issuecomment-279758359

Log ID: ddf410f757aedbc080b0931c

KBFS starts up:

2017-02-14T11:18:00.857680 ▶ [INFO kbfs start.go:27] 001 KBFS version 1.0.2-20170209165734+fde7523

Then a crash ~5 minutes later.

2017-02-14T11:23:07.490149 ▶ [DEBU FUSE fuse.go:1114] 14b kernel write error: error="socket is not connected" stack=
goroutine 1020 [running]:
github.com/keybase/kbfs/vendor/bazil.org/fuse.stack(0x53a1260, 0xc4213ccd80)
	/Users/test/go/src/github.com/keybase/kbfs/vendor/bazil.org/fuse/debug.go:9 +0x74
github.com/keybase/kbfs/vendor/bazil.org/fuse.(*Conn).respond(0xc420151740, 0xc420294320, 0xa0, 0xa0)
	/Users/test/go/src/github.com/keybase/kbfs/vendor/bazil.org/fuse/fuse.go:1113 +0x96
github.com/keybase/kbfs/vendor/bazil.org/fuse.(*Header).respond(0xc421396140, 0xc420294320, 0xa0, 0xa0)
	/Users/test/go/src/github.com/keybase/kbfs/vendor/bazil.org/fuse/fuse.go:313 +0x59
github.com/keybase/kbfs/vendor/bazil.org/fuse.(*LookupRequest).Respond(0xc421396140, 0xc4201d43c0)
	/Users/test/go/src/github.com/keybase/kbfs/vendor/bazil.org/fuse/fuse.go:1600 +0x200
github.com/keybase/kbfs/vendor/bazil.org/fuse/fs.(*Server).handleRequest(0xc4200ab2c0, 0x6805650, 0xc4213dc7c0, 0x5398820, 0xc42000e1a0, 0xc420148940, 0x53a4a20, 0xc421396140, 0xc421305eb8, 0xc420449698, ...)
	/Users/test/go/src/github.com/keybase/kbfs/vendor/bazil.org/fuse/fs/
2017-02-14T11:23:07.490196 ▶ [DEBU kbfs main.go:104] 14c Ending

gabriel avatar Feb 15 '17 23:02 gabriel

@gabriel: that's not really a KBFS crash. It looks like something on the device unmounted the FUSE mountpoint or removed the kernel extension, leaving KBFS with nothing to talk to. Maybe we should look into the system logs to figure out if anything happened in the kernel around then?

strib avatar Feb 16 '17 04:02 strib

@strib @gabriel, here are the logs from around that time. I see a lot of entries for 'virusbarrierb' which I believe to be Panda Antivirus, my company's AV program. To debug, I disabled real-time scanning in Panda and the crashing seems to have stopped.

Can you provide a list of Mac directories that I should add to the AV whitelist?

zgrep -E "(Keybase|kbfs|kbfuse)" system.log.1.gz 
Feb 14 10:08:13 alexcline-L141 com.apple.xpc.launchd[1] (keybase.kbfs): This service is defined to be constantly running and is inherently inefficient.
Feb 14 10:08:14 alexcline-L141 kernel[0]: kbfuse: starting (3.5.2, Sep 27 2016, 10:56:03)
Feb 14 10:08:55 alexcline-L141.local WindowServer[208]: disable_update_timeout: UI updates were forcibly disabled by application "Keybase" for over 1.00 seconds. Server has re-enabled them.
Feb 14 10:08:55 alexcline-L141.local WindowServer[208]: common_reenable_update: UI updates were finally reenabled by application "Keybase" after 1.20 seconds (server forcibly re-enabled them after 1.00 seconds)
Feb 14 10:28:00 alexcline-L141.local WindowServer[208]: CGError post_notification(const CGSNotificationType, void *const, const size_t, const bool, const CGSRealTimeDelta, const int, const CGSConnectionID *const, const pid_t): Timed out 0.250 second wait for reply from "Keybase" for synchronous notification type 100 (kCGSDisplayWillReconfigure) (CID 0x15307, PID 448)
Feb 14 11:10:06 alexcline-L141 com.apple.xpc.launchd[1] (keybase.kbfs[646]): Service exited due to signal: Terminated: 15
Feb 14 11:11:09 alexcline-L141.local virusbarrierb[211]: Blocking file access after 60 seconds timeout kbfs -> /Applications/Keybase.app/Contents/Resources/ExtendedAttributeFinderInfo.bin
Feb 14 11:12:11 alexcline-L141.local virusbarrierb[211]: Blocking file access after 60 seconds timeout kbfs -> /Applications/Keybase.app/Contents/Resources/ExtendedAttributeFinderInfo.bin
Feb 14 11:12:12 alexcline-L141 kernel[0]: process kbfs[2000] caught causing excessive wakeups. Observed wakeups rate (per sec): 537; Maximum permitted wakeups rate (per sec): 150; Observation period: 300 seconds; Task lifetime number of wakeups: 67980
Feb 14 11:14:25 alexcline-L141 com.apple.xpc.launchd[1] (keybase.kbfs): This service is defined to be constantly running and is inherently inefficient.
Feb 14 11:14:28 alexcline-L141 kernel[0]: process Keybase[2105] caught causing excessive wakeups. Observed wakeups rate (per sec): 10918; Maximum permitted wakeups rate (per sec): 150; Observation period: 300 seconds; Task lifetime number of wakeups: 108757
Feb 14 11:14:29 alexcline-L141 kernel[0]: process Keybase[2093] caught causing excessive wakeups. Observed wakeups rate (per sec): 2820; Maximum permitted wakeups rate (per sec): 150; Observation period: 300 seconds; Task lifetime number of wakeups: 47796
Feb 14 11:15:28 alexcline-L141.local virusbarrierb[211]: Blocking file access after 60 seconds timeout kbfs -> /Applications/Keybase.app/Contents/Resources/ExtendedAttributeFinderInfo.bin
Feb 14 11:15:28 alexcline-L141.local virusbarrierb[211]: Blocking file access after 60 seconds timeout Keybase -> /Applications/Keybase.app/Contents/Resources/app/images/menubarIcon/[email protected]
Feb 14 11:16:30 alexcline-L141.local virusbarrierb[211]: Blocking file access after 60 seconds timeout Keybase -> /Applications/Keybase.app/Contents/SharedSupport/bin/keybase
Feb 14 11:16:30 alexcline-L141.local virusbarrierb[211]: Blocking file access after 60 seconds timeout kbfs -> /Applications/Keybase.app/Contents/Resources/ExtendedAttributeFinderInfo.bin
Feb 14 11:17:00 alexcline-L141.local spindump[500]: Saved wakeups_resource.diag report for Keybase version 1.0.18-20170209165653+17b641d (1.0.18-20170209165653+17b641d) to /Library/Logs/DiagnosticReports/Keybase_2017-02-14-111700_alexcline-L141.wakeups_resource.diag
Feb 14 11:17:32 alexcline-L141.local virusbarrierb[211]: Blocking file access after 60 seconds timeout Keybase -> /System/Library/ColorSync/Profiles/sRGB Profile.icc
Feb 14 11:17:32 alexcline-L141.local virusbarrierb[211]: Blocking file access after 60 seconds timeout kbfs -> /Applications/Keybase.app/Contents/Resources/ExtendedAttributeFinderInfo.bin
Feb 14 11:18:00 alexcline-L141 com.apple.xpc.launchd[1] (keybase.kbfs): This service is defined to be constantly running and is inherently inefficient.
Feb 14 11:18:06 alexcline-L141 kernel[0]: process Keybase[2238] caught causing excessive wakeups. Observed wakeups rate (per sec): 3674; Maximum permitted wakeups rate (per sec): 150; Observation period: 300 seconds; Task lifetime number of wakeups: 45149
Feb 14 11:18:06 alexcline-L141 kernel[0]: process Keybase[2241] caught causing excessive wakeups. Observed wakeups rate (per sec): 24225; Maximum permitted wakeups rate (per sec): 150; Observation period: 300 seconds; Task lifetime number of wakeups: 268060
Feb 14 11:19:03 alexcline-L141.local virusbarrierb[211]: Blocking file access after 60 seconds timeout kbfs -> /Applications/Keybase.app/Contents/Resources/ExtendedAttributeFinderInfo.bin
Feb 14 11:19:03 alexcline-L141.local virusbarrierb[211]: Blocking file access after 60 seconds timeout Keybase -> /Applications/Keybase.app/Contents/Resources/app/images/icons/[email protected]
Feb 14 11:19:03 alexcline-L141.local virusbarrierb[211]: Blocking file access after 60 seconds timeout Keybase -> /Users/alex.cline/Library/Application Support/Keybase/Cookies
Feb 14 11:19:03 alexcline-L141.local virusbarrierb[211]: Blocking file access after 60 seconds timeout Keybase -> /Applications/Keybase.app/Contents/Resources/app/images/icons/icon-computer-32.png
Feb 14 11:20:05 alexcline-L141.local virusbarrierb[211]: Blocking file access after 60 seconds timeout Keybase -> /Users/alex.cline/Library/Application Support/Keybase/app-state.json
Feb 14 11:20:05 alexcline-L141.local virusbarrierb[211]: Blocking file access after 60 seconds timeout kbfs -> /Applications/Keybase.app/Contents/Resources/ExtendedAttributeFinderInfo.bin
Feb 14 11:20:05 alexcline-L141.local virusbarrierb[211]: Blocking file access after 60 seconds timeout Keybase -> /Applications/Keybase.app/Contents/Resources/app/images/menubarIcon/[email protected]
Feb 14 11:20:37 alexcline-L141.local spindump[500]: Saved wakeups_resource.diag report for Keybase version 1.0.18-20170209165653+17b641d (1.0.18-20170209165653+17b641d) to /Library/Logs/DiagnosticReports/Keybase_2017-02-14-112037_alexcline-L141.wakeups_resource.diag
Feb 14 11:21:07 alexcline-L141.local virusbarrierb[211]: Blocking file access after 60 seconds timeout Keybase -> /Users/alex.cline/Library/Application Support/Keybase/Cookies
Feb 14 11:21:07 alexcline-L141.local virusbarrierb[211]: Blocking file access after 60 seconds timeout Keybase -> /Users/alex.cline/Library/Preferences/com.apple.symbolichotkeys.plist
Feb 14 11:21:07 alexcline-L141.local virusbarrierb[211]: Blocking file access after 60 seconds timeout kbfs -> /Applications/Keybase.app/Contents/Resources/ExtendedAttributeFinderInfo.bin
Feb 14 11:21:09 alexcline-L141 kernel[0]: process kbfs[2279] caught causing excessive wakeups. Observed wakeups rate (per sec): 315; Maximum permitted wakeups rate (per sec): 150; Observation period: 300 seconds; Task lifetime number of wakeups: 59311

AlexClineBB avatar Feb 16 '17 15:02 AlexClineBB

(keybase.kbfs): This service is defined to be constantly running and is inherently inefficient.

Ouch, that's harsh.

Can you provide a list of Mac directories that I should add to the AV whitelist?

/keybase is the only mountpoint we add to your system, hopefully you can whitelist that and your problems will go away? If there's a way to whitelist processes too, I guess you should put "keybase.kbfs" (or just "kbfs") on there too.

strib avatar Feb 16 '17 17:02 strib

This service is defined to be constantly running and is inherently inefficient. is shown in log for any launchd that has KeepAlive as true. You can pretty much ignore it.

Yeah, /keybase is the mountpoint and the kbfs process is /Applications/Keybase.app/Contents/SharedSupport/bin/kbfs.

gabriel avatar Feb 16 '17 21:02 gabriel

@strib I wonder if we can fail more gracefully if AV software blocks something?

gabriel avatar Feb 16 '17 21:02 gabriel