kafka-go
kafka-go copied to clipboard
Stuck Write Message when there is a issue "multiple Read calls return no data or error"
I have a strange issue Reproduce Steps:
- I call w.WriteMessages(ctx, msgs...)
- I have an error "multiple Read calls return no data or error"
- I call w.WriteMessages(ctx, msgs...) and it is stuck here
Here is my code:
kafka.NewWriter(kafka.WriterConfig{
Brokers: brokers,
Topic: topic,
Balancer: &kafka.Hash{},
Dialer: &kafka.Dialer{
Timeout: 10 * time.Second,
DualStack: true,
TLS: tlsConfig,
},
})
Sometimes I see this error on Kafka:
[2020-04-09 14:05:40,353] WARN Failed to send SSL Close message (org.apache.kafka.common.network.SslTransportLayer)
java.io.IOException: Broken pipe
at java.base/sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at java.base/sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at java.base/sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:113)
at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:79)
at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:50)
at java.base/sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:466)
at org.apache.kafka.common.network.SslTransportLayer.flush(SslTransportLayer.java:210)
at org.apache.kafka.common.network.SslTransportLayer.close(SslTransportLayer.java:173)
at org.apache.kafka.common.utils.Utils.closeAll(Utils.java:718)
at org.apache.kafka.common.network.KafkaChannel.close(KafkaChannel.java:61)
at org.apache.kafka.common.network.Selector.doClose(Selector.java:748)
at org.apache.kafka.common.network.Selector.close(Selector.java:736)
at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:534)
at org.apache.kafka.common.network.Selector.poll(Selector.java:424)
at kafka.network.Processor.poll(SocketServer.scala:665)
at kafka.network.Processor.run(SocketServer.scala:582)
at java.base/java.lang.Thread.run(Thread.java:834)
Do you have any ideas?
I think it relate to use SSL and send a lot of messages at the same time. It works fine when I only push 5 messages with the method "w.WriteMessages"
What version are you on?
I ran into a recent problem where any error on writing caused the driver to retry all messages. However the driver also wrote all errors to a dedicated error channel. Since the channel was read by the main routine, the error was immediately returned, however the go routine retrying the write request on the next error was stuck waiting on the channel to be read (which would never happen, because the reading routine had returned). This caused the writing to hang forever.
The same thing just happened to me, I believe. I had a network fault where the leader data for Kafka was stale. Then I saw this error, and my writer thread seemed to hang forever.
2020-08-25 14:47:37.713482500 ERROR: writer.go:745: error writing messages to TopicRedacted (partition 0): [6] Not Leader For Partition: the client attempted to send messages to a replica that is not the leader for some partition, the client's metadata are likely out of date
2020-08-25 14:47:37.814596500 ERROR: writer.go:745: error writing messages to TopicRedacted (partition 0): multiple Read calls return no data or error
2020-08-25 14:47:38.214899500 ERROR: writer.go:745: error writing messages to TopicRedacted (partition 0): multiple Read calls return no data or error
Presumably I've encountered the same error as you have. I haven't been able to dig down into it just yet, but it looks like the same problem.
Edit: It was the same problem. Got fixed by updating.
Hello, thank you all for reporting.
Would you be able to mention which version of kafka-go you're experiencing the issue with?
I also encountered this problem. The version of kafka-go is 0.4.17, the version of kafka is 2.13-2.8.0. I triggered the problem when writing plenty of data. My kafka doesn't config TLS.
@pandaadb do you have more details to share about this issue? Is the channel you are referring to managed within kafka-go?
I ran into a recent problem where any error on writing caused the driver to retry all messages. However the driver also wrote all errors to a dedicated error channel. Since the channel was read by the main routine, the error was immediately returned, however the go routine retrying the write request on the next error was stuck waiting on the channel to be read (which would never happen, because the reading routine had returned). This caused the writing to hang forever.
Hard to say if it is the same issue, but I get multiple Read calls return no data or error from batch.Close when closing a kafka.Batch with unread messages.
This is a recurring issue for me, though I am not able to reproduce it manually. It seems to always be the came cadence. Starts out with:
2022-01-29 16:47:30.393884500 ERROR: writer.go:745: error writing messages to TOPIC (partition N): [6] Not Leader For Partition: the client attempted to send messages to a replica that is not the leader for some partition, the client's metadata are likely out of date
Followed by:
2022-01-29 16:47:30.496844500 ERROR: writer.go:745: error writing messages to TOPIC (partition N): multiple Read calls return no data or error
2022-01-29 16:47:30.897083500 ERROR: writer.go:745: error writing messages to TOPIC (partition N): multiple Read calls return no data or error
There seems to have been a network fault and one of the zookeeper nodes lost connectivity to the cluster, to reconnect seconds later. This triggered a re-election and for some reason the client seems to be unaware of it and tries to write to a follower. Upon failing, it just gives up.
When I try to recreate this, the client always seems to just be aware of all of this and just recover. I can't figure out what's different.
Am I missing some obvious thing where I should be performing some manual action?
I see I wrote above (August 25th) that it got fixed by updating, but apparently I just wasn't able to recreate it.
Network connectivity issues can manifest in complex ways, I think the work in https://github.com/segmentio/kafka-go/pull/846 will help better surface the conditions that triggered those errors.
i also got this same issue- [6] Not Leader For Partition: the client attempted to send messages to a replica that is not the leader for some partition, the client's metadata are likely out of date },
github.com/segmentio/kafka-go v0.4.38
There is no load in the kafka consumer OR publisher as such.