firefly-ethconnect icon indicating copy to clipboard operation
firefly-ethconnect copied to clipboard

TestSingleMessageWithReply: Intermittent failure "send on closed channel"

Open peterbroadhurst opened this issue 4 years ago • 2 comments

Intermittent build failure. Occurred in Travis for #65

time="2020-04-20T03:45:18Z" level=debug msg="Kafka consumer loop started"
time="2020-04-20T03:45:18Z" level=info msg="Sent message: {\"headers\":{\"type\":\"TestSingleMessageWithReply\",\"account\":\"0xAA983AD2a0e0eD8ac639277F37be42F2A5d2618c\",\"ctx\":{\"some\":\"data\"}}}"
time="2020-04-20T03:45:18Z" level=info msg="Kafka consumer received message: Partition=5 Offset=500"
time="2020-04-20T03:45:18Z" level=info msg="Message now in-flight: MsgContext[: reqOffset=in-topic:5:500 complete=false received=2020-04-20T03:45:18.473157259Z]"
time="2020-04-20T03:45:18Z" level=debug msg="Kafka producer successes loop started"
time="2020-04-20T03:45:18Z" level=info msg="Dispatched message context to processor: MsgContext[TestSingleMessageWithReply:9b5a99c5-c049-47e8-4f1f-ebc37184088f reqOffset=in-topic:5:500 complete=false received=2020-04-20T03:45:18.473157259Z]"
time="2020-04-20T03:45:18Z" level=info msg="Sending reply: MsgContext[TestSingleMessageWithReply:9b5a99c5-c049-47e8-4f1f-ebc37184088f reqOffset=in-topic:5:500 complete=false received=2020-04-20T03:45:18.473157259Z replied=2020-04-20T03:45:18.473294814Z replyType=TestReply]"
time="2020-04-20T03:45:18Z" level=info msg="Reply sent: MsgContext[TestSingleMessageWithReply:9b5a99c5-c049-47e8-4f1f-ebc37184088f reqOffset=in-topic:5:500 complete=false received=2020-04-20T03:45:18.473157259Z replied=2020-04-20T03:45:18.473294814Z replyType=TestReply]"
time="2020-04-20T03:45:18Z" level=debug msg="Ready=5:500 CanMark=true Infight=1 InflightSamePartition=1 ReadyToAck=1"
time="2020-04-20T03:45:18Z" level=info msg="Marking offset 500:5"
time="2020-04-20T03:45:18Z" level=info msg="Kafka consumer received message: Partition=5 Offset=500"
time="2020-04-20T03:45:18Z" level=info msg="Message now in-flight: MsgContext[: reqOffset=in-topic:5:500 complete=false received=2020-04-20T03:45:18.473496879Z]"
time="2020-04-20T03:45:18Z" level=error msg="Unauthorized: badness - Message={Headers:{CommonHeaders:{ID: MsgType:TestSingleMessageWithReply Account:0xAA983AD2a0e0eD8ac639277F37be42F2A5d2618c Context:map[some:data]}}}"
time="2020-04-20T03:45:18Z" level=info msg="Sending reply: MsgContext[TestSingleMessageWithReply: reqOffset=in-topic:5:500 complete=false received=2020-04-20T03:45:18.473496879Z replied=2020-04-20T03:45:18.473552483Z replyType=Error]"
panic: send on closed channel
goroutine 38 [running]:
github.com/kaleido-io/ethconnect/internal/kldkafka.(*msgContext).Reply(0xc000108900, 0xcc39a0, 0xc0000d3cc0)
	/home/travis/gopath/src/github.com/kaleido-io/ethconnect/internal/kldkafka/kafkabridge.go:280 +0x52d
github.com/kaleido-io/ethconnect/internal/kldkafka.(*KafkaBridge).ConsumerMessagesLoop(0xc000182fc0, 0xccc020, 0xc0001d27c0, 0xccc060, 0xc0001d6930, 0xc000029950)
	/home/travis/gopath/src/github.com/kaleido-io/ethconnect/internal/kldkafka/kafkabridge.go:348 +0x435
created by github.com/kaleido-io/ethconnect/internal/kldkafka.setupMocks
	/home/travis/gopath/src/github.com/kaleido-io/ethconnect/internal/kldkafka/kafkabridge_test.go:205 +0x24a
FAIL	github.com/kaleido-io/ethconnect/internal/kldkafka	0.048s
?   	github.com/kaleido-io/ethconnect/internal/kldkafka/mock_sarama	[no test files]
ok  	github.com/kaleido-io/ethconnect/internal/kldkvstore	0.019s	coverage: 100.0% of statements
ok  	github.com/kaleido-io/ethconnect/internal/kldmessages	0.032s	coverage: 100.0% of statements
ok  	github.com/kaleido-io/ethconnect/internal/kldopenapi	0.080s	coverage: 100.0% of statements
ok  	github.com/kaleido-io/ethconnect/internal/kldrest	1.721s	coverage: 98.2% of statements
ok  	github.com/kaleido-io/ethconnect/internal/kldtx	1.144s	coverage: 99.4% of statements
ok  	github.com/kaleido-io/ethconnect/internal/kldutils	0.008s	coverage: 98.6% of statements
?   	github.com/kaleido-io/ethconnect/pkg/kldplugins	[no test files]
FAIL

peterbroadhurst avatar Apr 20 '20 13:04 peterbroadhurst

So there's a timing here when the Message already in-flight error message does not come out, and the duplicate actually gets processed - because the in-flight is flushed before the duplicate comes in.

peterbroadhurst avatar Apr 20 '20 13:04 peterbroadhurst

The problem was we were relying on async timing to validate duplicate detection. So in https://github.com/kaleido-io/ethconnect/pull/65 I've added a separate synchronous test for duplicate detection in the Kafka bridge, and pulled it out of the original test that had a separate primary purpose.

peterbroadhurst avatar Apr 20 '20 14:04 peterbroadhurst