go-marathon icon indicating copy to clipboard operation
go-marathon copied to clipboard

[WIP] Add Stop() method to shut down cluster.

Open timoreimann opened this issue 6 years ago • 5 comments

This enables to terminate all pending health checks. Apart from making our tests shut down cleanly, it will also give users an opportunity to stop otherwise potentially leaking health-checking goroutines.

timoreimann avatar Aug 28 '17 19:08 timoreimann

Something is still fishy:

=== RUN   TestUnsubscribe
--- PASS: TestUnsubscribe (0.00s)
=== RUN   TestEventStreamEventsReceived
2017/08/28 19:42:43 httptest.Server blocked in Close after 5 seconds, waiting for connections:
  *net.TCPConn 0xc820032088 127.0.0.1:41900 in state active
SIGQUIT: quit
PC=0x463e41 m=0

goroutine 0 [idle]:
runtime.futex(0xd19788, 0x0, 0x0, 0x0, 0x0, 0xd18d30, 0x0, 0x0, 0x410bca, 0xd19788, ...)
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/runtime/sys_linux_amd64.s:302 +0x21
runtime.futexsleep(0xd19788, 0x0, 0xffffffffffffffff)
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/runtime/os1_linux.go:40 +0x53
runtime.notesleep(0xd19788)
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/runtime/lock_futex.go:145 +0xca
runtime.stopm()
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/runtime/proc.go:1535 +0x10b
runtime.findrunnable(0xc820027500, 0x0)
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/runtime/proc.go:1973 +0x808
runtime.schedule()
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/runtime/proc.go:2072 +0x2a6
runtime.goexit0(0xc820001800)
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/runtime/proc.go:2207 +0x1bf
runtime.mcall(0x4577c0)
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/runtime/asm_amd64.s:233 +0x5b

goroutine 1 [chan receive, 9 minutes]:
testing.RunTests(0xb5fa28, 0xd16a20, 0x76, 0x76, 0xd14c01)
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/testing/testing.go:583 +0xb39
testing.(*M).Run(0xc820040f10, 0xc8200137f0)
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/testing/testing.go:515 +0x11e
main.main()
	github.com/gambol99/go-marathon/_test/_testmain.go:288 +0x211

goroutine 17 [syscall, 9 minutes, locked to thread]:
runtime.goexit()
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/runtime/asm_amd64.s:1998 +0x1

goroutine 385 [select]:
github.com/gambol99/go-marathon.(*cluster).healthCheckNode(0xc820404000, 0xc8203feca0)
	/home/travis/gopath/src/github.com/gambol99/go-marathon/cluster.go:166 +0x33f
github.com/gambol99/go-marathon.(*cluster).markDown.func1(0xc820404000, 0xc8200330b0)
	/home/travis/gopath/src/github.com/gambol99/go-marathon/cluster.go:152 +0x4a
created by github.com/gambol99/go-marathon.(*cluster).markDown
	/home/travis/gopath/src/github.com/gambol99/go-marathon/cluster.go:154 +0x278

goroutine 548 [chan send, 9 minutes]:
github.com/donovanhide/eventsource.(*Server).Handler.func1(0x2b492f53be58, 0xc8202476c0, 0xc8202f87e0)
	/home/travis/gopath/src/github.com/donovanhide/eventsource/server.go:87 +0x44f
github.com/gambol99/go-marathon.authMiddleware.func1(0x2b492f53be58, 0xc8202476c0, 0xc8202f87e0)
	/home/travis/gopath/src/github.com/gambol99/go-marathon/testing_utils_test.go:254 +0x251
net/http.HandlerFunc.ServeHTTP(0xc820409260, 0x2b492f53be58, 0xc8202476c0, 0xc8202f87e0)
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/net/http/server.go:1618 +0x48
net/http.(*ServeMux).ServeHTTP(0xc820409230, 0x2b492f53be58, 0xc8202476c0, 0xc8202f87e0)
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/net/http/server.go:1910 +0x213
net/http.serverHandler.ServeHTTP(0xc82030c680, 0x2b492f53be58, 0xc8202476c0, 0xc8202f87e0)
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/net/http/server.go:2081 +0x207
net/http.(*conn).serve(0xc82030c980)
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/net/http/server.go:1472 +0x1566
created by net/http.(*Server).Serve
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/net/http/server.go:2137 +0x4d2

goroutine 384 [select]:
github.com/gambol99/go-marathon.(*cluster).healthCheckNode(0xc820404000, 0xc8203fec20)
	/home/travis/gopath/src/github.com/gambol99/go-marathon/cluster.go:166 +0x33f
github.com/gambol99/go-marathon.(*cluster).markDown.func1(0xc820404000, 0xc8200330a8)
	/home/travis/gopath/src/github.com/gambol99/go-marathon/cluster.go:152 +0x4a
created by github.com/gambol99/go-marathon.(*cluster).markDown
	/home/travis/gopath/src/github.com/gambol99/go-marathon/cluster.go:154 +0x278

goroutine 527 [semacquire, 9 minutes]:
sync.runtime_Semacquire(0xc820405d1c)
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/runtime/sema.go:47 +0x26
sync.(*WaitGroup).Wait(0xc820405d10)
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/sync/waitgroup.go:127 +0x118
net/http/httptest.(*Server).Close(0xc820405ce0)
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/net/http/httptest/server.go:189 +0x357
github.com/gambol99/go-marathon.(*fakeServer).Close(0xc820405e10)
	/home/travis/gopath/src/github.com/gambol99/go-marathon/testing_utils_test.go:318 +0x6b
github.com/gambol99/go-marathon.(*endpoint).CloseServer(0xc820405e00)
	/home/travis/gopath/src/github.com/gambol99/go-marathon/testing_utils_test.go:327 +0x3b
github.com/gambol99/go-marathon.(*endpoint).Close(0xc820405e00)
	/home/travis/gopath/src/github.com/gambol99/go-marathon/testing_utils_test.go:323 +0x6c
github.com/gambol99/go-marathon.TestEventStreamEventsReceived(0xc8204105a0)
	/home/travis/gopath/src/github.com/gambol99/go-marathon/subscription_test.go:359 +0x11bb
testing.tRunner(0xc8204105a0, 0xd172f0)
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/testing/testing.go:473 +0xdd
created by testing.RunTests
	/home/travis/.gimme/versions/go1.6.linux.amd64/src/testing/testing.go:582 +0xae3

goroutine 546 [chan receive]:
github.com/gambol99/go-marathon.(*marathonClient).registerSSESubscription.func1(0xc8204482d0)
	/home/travis/gopath/src/github.com/gambol99/go-marathon/subscription.go:180 +0x1a1
created by github.com/gambol99/go-marathon.(*marathonClient).registerSSESubscription
	/home/travis/gopath/src/github.com/gambol99/go-marathon/subscription.go:187 +0x73
rax    0xca
rbx    0x0
rcx    0x463e43
rdx    0x0
rdi    0xd19788
rsi    0x0
rbp    0x1
rsp    0x7ffe0a7ff850
r8     0x0
r9     0x0
r10    0x0
r11    0x286
r12    0xc82002cf58
r13    0xd19680
r14    0xc820001800
r15    0xd3cd00
rip    0x463e41
rflags 0x286
cs     0x33
fs     0x0
gs     0x0
*** Test killed with quit: ran too long (10m0s).
FAIL	github.com/gambol99/go-marathon	600.030s
make: *** [test] Error 1

timoreimann avatar Aug 28 '17 19:08 timoreimann

Seems like we're yet missing to terminate SSE subscriptions.

timoreimann avatar Aug 28 '17 20:08 timoreimann

Has there been any further work towards supporting cluster (or client) shutdowns? Asking because I'm about ready to start doing this myself since we're running into goroutine leaks in health checks.

We currently schedule Marathon on arbitrary hosts, so it's possible that a member of the cluster will go away and reappear with a new host and port, which currently requires a new client to be allocated. So, unless I can reconfigure a client's cluster (doesn't seem like the right approach), I'd have to be able to stop the client to avoid leaking health check and other goroutines.

nilium avatar Sep 23 '19 16:09 nilium

@nilium I stopped using Marathon and subsequently go-marathon a while ago. You're more than welcome to continue the effort. I can try my best to review any PR(s) you may submit.

Thank you!

timoreimann avatar Sep 23 '19 22:09 timoreimann

@timoreimann Thanks! I've got some changes that I'm testing out in our systems, so if all goes well, I'll have a pull request once I've had some time to make sure that it's working the way I expect. Just testing it on our own work to avoid too much churn in the PR. No ETA on that right now, but the WIP branch is over at https://github.com/Kochava/go-marathon/tree/cluster-stop for anyone else that wants to poke at it. (May be force pushed over, so please don't pin anything to that fork or branch even if it works for someone.)

nilium avatar Sep 23 '19 23:09 nilium