sdk-go icon indicating copy to clipboard operation
sdk-go copied to clipboard

Panic on Kafka IP change

Open canni opened this issue 11 months ago • 7 comments

We're experiencing a panic due to send on a closed channel here: https://github.com/cloudevents/sdk-go/blob/release-v2.15.2/protocol/kafka_sarama/v2/receiver.go#L80

I've traced this to the fact that the incoming channel is closed in SDK managed goroutine here: https://github.com/cloudevents/sdk-go/blob/release-v2.15.2/protocol/kafka_sarama/v2/receiver.go#L182

But the ConsumeClaim is running in sarama's internal goroutine, and it's session context is unrelated. This creates situaction where closing the receiver prematurely closes the incoming channel used by ConsumeClaim

This panic is crashing the whole go process, as wrapping StartReceiver with recover() does not cover that goroutine.

canni avatar Mar 22 '24 10:03 canni

Hi @canni thx for filing: quick question -> are you running Confluent Kafka or another environment? We have a PR ready for Confluent Kafka support where we tried to make sure to avoid races when closing. Need to look closer into the sarama implementation or do you want to file a PR?

embano1 avatar Mar 22 '24 13:03 embano1

@yanmxa FYI, that's why I was so picky during the PR review :)

embano1 avatar Mar 22 '24 13:03 embano1

Yes this is when we connect to cluster using confluent docker containers

canni avatar Mar 22 '24 14:03 canni

Would you be open to switch and try our upcoming kafka_confluent binding?

embano1 avatar Mar 24 '24 19:03 embano1

I don't know if we can find time to get around to testing that before Easter break, I'll report after the trial run

canni avatar Mar 26 '24 09:03 canni

No worries, #988 merge still pending (though close). Then we need to cut a new release so you can easily switch. Appreciate the support.

embano1 avatar Mar 26 '24 10:03 embano1

@canni https://github.com/cloudevents/sdk-go/pull/988 is merged

embano1 avatar Apr 14 '24 13:04 embano1