snowplow-golang-tracker

Occasional panics occurring in the Emitter Transport layer

Open jbeemster opened this issue 4 years ago • 3 comments

Describe the bug: When running the tracker, occasional panics crash the whole application. They occur not in the Tracker itself but in the Emitter transport layer below it; however, it might be possible to handle this panic within the Emitter and recover gracefully.

To Reproduce: This is not easy to reproduce, but in a long-running application I see crashes consistently every 1-2 days.

Expected behavior: The tracker should not panic while performing its core function.

Environment:

  • ECS Fargate on AWS
  • Golang 1.13.8
  • Container: alpine:3.7

Additional context: The stack trace:

1594737074490,fatal error: concurrent map read and map write
1594737074493,goroutine 1385994 [running]:
1594737074493,"runtime.throw(0x10170e9, 0x21)"
1594737074493,	/home/travis/.gimme/versions/go1.13.8.linux.amd64/src/runtime/panic.go:774 +0x72 fp=0xc00096ea80 sp=0xc00096ea50 pc=0x42dc02
1594737074493,"runtime.mapaccess1(0xe630a0, 0xc000396060, 0xc00096eb60, 0x19844c0)"
1594737074493,	/home/travis/.gimme/versions/go1.13.8.linux.amd64/src/runtime/map.go:411 +0x269 fp=0xc00096eac8 sp=0xc00096ea80 pc=0x40da49
1594737074493,"net/http.(*Transport).removeIdleConnLocked(0x1957300, 0xc00116e480, 0x8)"
1594737074493,	/home/travis/.gimme/versions/go1.13.8.linux.amd64/src/net/http/transport.go:983 +0x1ac fp=0xc00096eba8 sp=0xc00096eac8 pc=0x6bcc4c
1594737074493,"net/http.(*Transport).removeIdleConn(0x1957300, 0xc00116e480, 0xc000080b00)"
1594737074493,	/home/travis/.gimme/versions/go1.13.8.linux.amd64/src/net/http/transport.go:973 +0x80 fp=0xc00096ec08 sp=0xc00096eba8 pc=0x6bca50
1594737074493,"net/http.(*persistConn).readLoop.func1(0xc00116e480, 0xc00096ed88)"
1594737074493,	/home/travis/.gimme/versions/go1.13.8.linux.amd64/src/net/http/transport.go:1880 +0x58 fp=0xc00096ec30 sp=0xc00096ec08 pc=0x6ca168
1594737074493,net/http.(*persistConn).readLoop(0xc00116e480)
1594737074493,	/home/travis/.gimme/versions/go1.13.8.linux.amd64/src/net/http/transport.go:1917 +0x1148 fp=0xc00096efd8 sp=0xc00096ec30 pc=0x6c37b8
1594737074493,runtime.goexit()
1594737074493,	/home/travis/.gimme/versions/go1.13.8.linux.amd64/src/runtime/asm_amd64.s:1357 +0x1 fp=0xc00096efe0 sp=0xc00096efd8 pc=0x45b151
1594737074493,created by net/http.(*Transport).dialConn
1594737074493,	/home/travis/.gimme/versions/go1.13.8.linux.amd64/src/net/http/transport.go:1580 +0xb0d

jbeemster, Jul 18 '20 19:07

Narrowing this down, it seems something in this function is causing the issue: https://github.com/snowplow/snowplow-golang-tracker/blob/master/tracker/emitter.go#L100-L115

Going to see if the issue can be fixed by using a custom transport instead of the default one.

jbeemster, Jul 18 '20 19:07

Also, we are editing the global transport object this way, which feels like something we should avoid. Providing a custom transport rather than using the default one seems likely to be the way to clean this up and stop it from occurring.

Example of a default we could copy: https://github.com/golang/go/blob/go1.13.14/src/net/http/transport.go#L42-L54

jbeemster, Jul 18 '20 19:07

The issue is line 109, which dereferences the pointer and therefore copies the mutex value.

https://github.com/snowplow/snowplow-golang-tracker/blob/88ca5cf2930840e8c7f0c65efbaef326de5fffd2/tracker/emitter.go#L109
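To see why copying a mutex by value can lead to "concurrent map read and map write", here is a minimal sketch using a hypothetical `registry` type as a stand-in for a struct that guards a map with a mutex. It does not reproduce the tracker's code; it only shows that a dereferenced copy gets its own mutex while still sharing the original's map.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical stand-in for a type that guards a map
// with a mutex, as http.Transport does for its idle connections.
type registry struct {
	mu    sync.Mutex
	conns map[string]int
}

// set writes to the map while holding this value's own mutex.
func (r *registry) set(key string, n int) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.conns[key] = n
}

// shallowCopy dereferences the pointer, copying the struct by value.
// The mutex is duplicated, but the map still refers to the same data.
func shallowCopy(r *registry) *registry {
	c := *r // go vet's copylocks check flags this kind of copy
	return &c
}

func main() {
	orig := &registry{conns: map[string]int{}}
	dup := shallowCopy(orig)

	// Holding the original's lock does not block the copy, because the
	// copy locks a different mutex, yet the write lands in the shared map.
	orig.mu.Lock()
	dup.set("host", 1)
	fmt.Println(orig.conns["host"]) // prints 1
	orig.mu.Unlock()
}
```

With two goroutines in place of this single-threaded demo, each side would hold "its" lock while touching the same map, which is the kind of unsynchronized access the Go runtime aborts on.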

I think it would be fixed by

defaultTransport := http.DefaultTransport.(*http.Transport).Clone()

A workaround is to pass a custom HTTP client to InitEmitter().

apaatsio, Feb 06 '24 12:02