mole
mole copied to clipboard
Goroutine leak in tests
Version: master
Steps to Reproduce
- Checkout master.
- Run
$ go test -race -count=100 ./... - Observe goroutine leak

... scrolls up for a loong time. Interestingly removing the go qualifier from https://github.com/davrodpin/mole/blob/master/tunnel/tunnel.go#L254-L255 just causes the test to hang indefinitely.
What is the purpose of these two lines? The connection objects are being simultaneously written to each other. How is this safe?
https://github.com/davrodpin/mole/blob/a7edeac72db3e2adfb4731e6adc3fe40c6e9f3b3/tunnel/tunnel.go#L254-L255
https://github.com/davrodpin/mole/blob/a7edeac72db3e2adfb4731e6adc3fe40c6e9f3b3/tunnel/tunnel.go#L334-L339
@awnumar, what is the Go version you're using?
I ran the test with -race- and -count 100 and got no errors:
$ for x in $(seq 1 10);do go test -race -count=100 ./... || break; done
ok github.com/davrodpin/mole/cli 1.257s
? github.com/davrodpin/mole/cmd/mole [no test files]
ok github.com/davrodpin/mole/storage 2.363s
ok github.com/davrodpin/mole/tunnel 11.749s
ok github.com/davrodpin/mole/cli 1.270s
? github.com/davrodpin/mole/cmd/mole [no test files]
ok github.com/davrodpin/mole/storage 2.296s
ok github.com/davrodpin/mole/tunnel 12.088s
ok github.com/davrodpin/mole/cli 1.255s
? github.com/davrodpin/mole/cmd/mole [no test files]
ok github.com/davrodpin/mole/storage 2.327s
ok github.com/davrodpin/mole/tunnel 12.302s
ok github.com/davrodpin/mole/cli 1.248s
? github.com/davrodpin/mole/cmd/mole [no test files]
ok github.com/davrodpin/mole/storage 2.257s
ok github.com/davrodpin/mole/tunnel 12.243s
ok github.com/davrodpin/mole/cli 1.245s
? github.com/davrodpin/mole/cmd/mole [no test files]
ok github.com/davrodpin/mole/storage 2.260s
ok github.com/davrodpin/mole/tunnel 12.256s
ok github.com/davrodpin/mole/cli 1.225s
? github.com/davrodpin/mole/cmd/mole [no test files]
ok github.com/davrodpin/mole/storage 1.976s
ok github.com/davrodpin/mole/tunnel 11.905s
ok github.com/davrodpin/mole/cli 1.251s
? github.com/davrodpin/mole/cmd/mole [no test files]
ok github.com/davrodpin/mole/storage 2.372s
ok github.com/davrodpin/mole/tunnel 12.161s
ok github.com/davrodpin/mole/cli 1.250s
? github.com/davrodpin/mole/cmd/mole [no test files]
ok github.com/davrodpin/mole/storage 2.301s
ok github.com/davrodpin/mole/tunnel 12.091s
ok github.com/davrodpin/mole/cli 1.447s
? github.com/davrodpin/mole/cmd/mole [no test files]
ok github.com/davrodpin/mole/storage 2.777s
ok github.com/davrodpin/mole/tunnel 13.391s
ok github.com/davrodpin/mole/cli 1.321s
? github.com/davrodpin/mole/cmd/mole [no test files]
ok github.com/davrodpin/mole/storage 2.697s
ok github.com/davrodpin/mole/tunnel 12.364s
What is the purpose of these two lines? The connection objects are being simultaneously written to each other. How is this safe?
The purpose is to create a bidirectional channel between the client and the remote endpoint by keep sending the data from the client to the ssh server to the remote endpoint and vice-versa.
Since io.Copy will only stop copying if either the source or the destination EOF, I can keep two goroutines, one for each direction.
client <-- channel.conn --> ssh server <-- remoteConn --> endpoint
I don't see why it wouldn't be safe but please let me know your thoughts.
:: go version go1.12.6 linux/amd64
Could you run $ ulimit -l on your system and see what you get?
Since io.Copy will only stop copying if either the source or the destination EOF, I can keep two goroutines, one for each direction.
Ah, I understand. It's safe because what you write to X is not the same data you read from X. My local tests are still playing up but I'm still investigating so I'll let you know.
Could you run
$ ulimit -lon your system and see what you get?
Same result with go 1.12.6 and ulimit -l.
I will try on a linux machine later today.
$ go version
go version go1.12.6 darwin/amd64
$ ulimit -l
unlimited
$ go test -race -count=100 ./...
ok github.com/davrodpin/mole/cli 1.304s
? github.com/davrodpin/mole/cmd/mole [no test files]
ok github.com/davrodpin/mole/storage 2.545s
ok github.com/davrodpin/mole/tunnel 11.374s
$
@awnumar, I couldn't reproduce the issue above on linux (see below).
But I find issues with two tests, TestHandlePassword and TestUpdatePassphrase, from tunnel/key_test.go when the -count=100 flag is passed (see bellow)
All tests but TestHandlePassword and TestUpdatePassphrase
$ uname -a
Linux xyz 4.4.0-151-generic #178-Ubuntu SMP Tue Jun 11 08:30:22 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
$ ulimit -l
64
$ go version
go version go1.12.6 linux/amd64
$ go test -race -count=100 ./...
ok github.com/davrodpin/mole/cli 1.424s
? github.com/davrodpin/mole/cmd/mole [no test files]
ok github.com/davrodpin/mole/storage 2.326s
ok github.com/davrodpin/mole/tunnel 21.818s
TestHandlePassword (similar result from TestUpdatePassphrase execution)
$ go test -race -count=100 ./... -run TestHandlePassword
ok github.com/davrodpin/mole/cli 1.053s [no tests to run]
? github.com/davrodpin/mole/cmd/mole [no test files]
ok github.com/davrodpin/mole/storage 1.061s [no tests to run]
--- FAIL: TestHandlePassword (0.00s)
panic: <memguard::memcall::Lock> could not acquire lock on 0x7fbd9ab30000, limit reached? [Err: cannot allocate memory] [recovered]
panic: <memguard::memcall::Lock> could not acquire lock on 0x7fbd9ab30000, limit reached? [Err: cannot allocate memory]
goroutine 39 [running]:
testing.tRunner.func1(0xc0002d4500)
/users/mole/go/src/testing/testing.go:830 +0x69d
panic(0x951ea0, 0xc0002be540)
/users/mole/go/src/runtime/panic.go:522 +0x1b5
github.com/awnumar/memguard/core.Panic(0x951ea0, 0xc0002be540)
/users/mole/go/pkg/mod/github.com/awnumar/[email protected]/core/exit.go:73 +0x28f
github.com/awnumar/memguard/core.NewBuffer(0x4, 0x8, 0xc0002c02a8, 0xc0001aac98)
/users/mole/go/pkg/mod/github.com/awnumar/[email protected]/core/buffer.go:75 +0x776
github.com/awnumar/memguard.NewBuffer(0x4, 0xc0002c02a8)
/users/mole/go/pkg/mod/github.com/awnumar/[email protected]/buffer.go:44 +0x3d
github.com/awnumar/memguard.NewBufferFromBytes(0xc0002c02a8, 0x4, 0x8, 0x4)
/users/mole/go/pkg/mod/github.com/awnumar/[email protected]/buffer.go:60 +0x3d
github.com/davrodpin/mole/tunnel.(*PemKey).updatePassphrase(0xc000303eb0, 0xc0002c02a8, 0x4, 0x8)
/users/mole/mole/tunnel/key.go:116 +0xc1
github.com/davrodpin/mole/tunnel.(*PemKey).HandlePassphrase(0xc0001aaeb0, 0xc0001aae80, 0xc0002f3800, 0x9f2)
/users/mole/mole/tunnel/key.go:101 +0x327
github.com/davrodpin/mole/tunnel.TestHandlePassword(0xc0002d4500)
/users/mole/mole/tunnel/key_test.go:75 +0x1a9
testing.tRunner(0xc0002d4500, 0x9e12b8)
/users/mole/go/src/testing/testing.go:865 +0x164
created by testing.(*T).Run
/users/mole/go/src/testing/testing.go:916 +0x65b
FAIL github.com/davrodpin/mole/tunnel 0.079s
@davrodpin I also caught this issue. I'm in the process of designing a patch for it.
LockedBuffer objects lock pages into memory, preventing the kernel from swapping them out to disk. On some systems there is a per-process limit. On yours it seems to be 64 kilobytes.
My idea is to load the pem key from file and decrypt it with the passphrase into the decrypted form immediately, and store the raw decrypted key in a LockedBuffer object which can then be sealed into an Enclave object. It's possible to store as many Enclave objects as you like.
The other solution is to destroy PemKey objects that are no longer needed but this is in my opinion less clean and could result in leaks still.
This is probably unrelated to the Goroutine leak issue however so perhaps open a new issue to track it?
Are these PemKey objects used only to generate ssh.Signers?
My idea is to load the pem key from file and decrypt it with the passphrase into the decrypted form immediately, and store the raw decrypted key in a LockedBuffer object which can then be sealed into an Enclave object. It's possible to store as many Enclave objects as you like.
Not sure how effective would be to keep the key safe since it will be copied to the signer object and we won't be able to protect that part of the memory. To keep all secure, we might need to keep the passphrase in a way it could be reused (I am currently working on a feature that will allow tunnel reconnection) then resolve and load the pemkey and create the signer object in one shot. Then we could keep the signer object sealed until it is used and then destroy it.
@davrodpin Yeah thought that might be the case. I'm looking into trying to initialise the Signer object inside a guarded memory region but for now all I've done is store the decrypted key in an Enclave object.
It's not yet finished and is broken with a segmentation fault somewhere.
Tree: https://github.com/awnumar/mole/tree/enclave Commit: https://github.com/awnumar/mole/commit/aac96dc6a3f476c2228c4cc20cf1dc7928842e4f