CPU 100% in `asyncProducer.newBrokerProducer.func2`
Versions:
| Sarama | Kafka | Go |
|---|---|---|
| 1.33.0 | 0.10.2.1 | 1.16.3 |
Problem
CPU usage spikes to 100% when closing the broker. Here is the CPU profile:
```
(pprof) top100
Showing nodes accounting for 150.58s, 99.69% of 151.05s total
Dropped 3 nodes (cum <= 0.76s)
      flat  flat%   sum%        cum   cum%
    64.28s 42.56% 42.56%    144.92s 95.94%  runtime.selectgo
    23.17s 15.34% 57.89%     23.17s 15.34%  runtime.lock2
    20.25s 13.41% 71.30%     20.25s 13.41%  runtime.unlock2
    14.32s  9.48% 80.78%     37.83s 25.04%  runtime.sellock
    13.52s  8.95% 89.73%     33.78s 22.36%  runtime.selunlock
     4.37s  2.89% 92.62%    151.05s   100%  github.com/Shopify/sarama.(*asyncProducer).newBrokerProducer.func2
     3.87s  2.56% 95.19%      3.87s  2.56%  runtime.fastrand (inline)
     1.91s  1.26% 96.45%      5.78s  3.83%  runtime.fastrandn
     1.69s  1.12% 97.57%      1.69s  1.12%  github.com/eapache/queue.(*Queue).Peek
     1.69s  1.12% 98.69%      2.85s  1.89%  runtime.typedmemclr
     1.16s  0.77% 99.46%      1.16s  0.77%  runtime.memclrNoHeapPointers
     0.24s  0.16% 99.62%     23.41s 15.50%  runtime.lockWithRank (inline)
     0.10s 0.066% 99.68%     23.51s 15.56%  runtime.lock
     0.01s 0.0066% 99.69%    20.26s 13.41%  runtime.unlock (inline)
         0     0% 99.69%    151.05s   100%  github.com/Shopify/sarama.withRecover
         0     0% 99.69%     20.25s 13.41%  runtime.unlockWithRank (inline)
(pprof)
```
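(For anyone trying to reproduce this: the report does not say how the profile was collected. One common way to get output in this format, assuming the producing process exposes the standard `net/http/pprof` handlers, is sketched below; the `localhost:6060` address is only an example.)

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on http.DefaultServeMux
)

func main() {
	// Expose the pprof endpoints on a side port; the address is just an example.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// ... construct the sarama AsyncProducer and run the application here ...
	select {}
}
```

A 30-second CPU profile can then be pulled with `go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30`, and `top` / `list newBrokerProducer.func2` at the interactive prompt produce output in the same format as the listings above and below.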
```
(pprof) list newBrokerProducer.func2
Total: 151.05s
ROUTINE ======================== github.com/Shopify/sarama.(*asyncProducer).newBrokerProducer.func2 in /data/root/go/pkg/mod/github.com/!shopify/[email protected]/async_producer.go
     4.37s    151.05s (flat, cum)   100% of Total
         .          .    738:    // In order to avoid a deadlock when closing the broker on network or malformed response error
         .          .    739:    // we use an intermediate channel to buffer and send pending responses in order
         .          .    740:    // This is because the AsyncProduce callback inside the bridge is invoked from the broker
         .          .    741:    // responseReceiver goroutine and closing the broker requires such goroutine to be finished
         .          .    742:    go withRecover(func() {
         .       60ms    743:        buf := queue.New()
     500ms      500ms    744:        for {
      80ms       90ms    745:            if buf.Length() == 0 {
         .          .    746:                res, ok := <-pending
         .          .    747:                if !ok {
         .          .    748:                    // We are done forwarding the last pending response
         .          .    749:                    close(responses)
         .          .    750:                    return
         .          .    751:                }
         .          .    752:                buf.Add(res)
         .          .    753:            }
         .          .    754:            // Send the head pending response or buffer another one
         .          .    755:            // so that we never block the callback
     550ms      2.24s    756:            headRes := buf.Peek().(*brokerProducerResponse)
     2.03s    146.95s    757:            select {
     740ms      740ms    758:            case res, ok := <-pending:
      70ms       70ms    759:                if !ok {
         .          .    760:                    continue
         .          .    761:                }
         .          .    762:                buf.Add(res)
         .          .    763:                continue
     400ms      400ms    764:            case responses <- headRes:
         .          .    765:                buf.Remove()
         .          .    766:                continue
         .          .    767:            }
         .          .    768:        }
         .          .    769:    })
```
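Based on Go channel semantics, the hot loop is consistent with `pending` having been closed while the buffer is still non-empty: a receive from a closed channel never blocks, so `case res, ok := <-pending` is always ready, returns `!ok`, and the `continue` immediately re-enters the `select`. Whenever `responses <- headRes` is not ready (for example the reader is slow or blocked during broker shutdown), the goroutine spins through the `select` at full speed, which matches the `runtime.selectgo`, `runtime.lock2` and `runtime.fastrand` samples in the profile above. Below is a minimal sketch of one possible mitigation, using the nil-channel idiom to disable the closed case; it reuses the identifiers from the listing above and is an illustration only, not necessarily how the upstream code was or will be fixed.

```go
// Sketch only: same structure as the listing above, but once `pending` is
// closed we stop selecting on it, because a nil channel blocks forever in a
// select and therefore cannot cause a busy loop.
go withRecover(func() {
	pending := pending // shadow the captured channel so nil-ing it does not affect the sender side
	buf := queue.New()
	for {
		if buf.Length() == 0 {
			res, ok := <-pending
			if !ok {
				// Nothing buffered and no more pending responses: we are done.
				close(responses)
				return
			}
			buf.Add(res)
		}
		// Send the head pending response or buffer another one,
		// so that we never block the callback.
		headRes := buf.Peek().(*brokerProducerResponse)
		select {
		case res, ok := <-pending:
			if !ok {
				// `pending` is closed; without this, the closed channel is
				// always ready and the loop spins whenever `responses` is
				// not ready to receive.
				pending = nil
				continue
			}
			buf.Add(res)
		case responses <- headRes:
			buf.Remove()
			if pending == nil && buf.Length() == 0 {
				// `pending` already closed and the buffer is drained.
				close(responses)
				return
			}
		}
	}
})
```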
References
- code: https://github.com/Shopify/sarama/blob/23c42869222b1467841a9b6318a7ceb159c6a6fd/async_producer.go#L757
- PR: #2133
@slaunay hi, could you please take a look at this issue?
@MoilkNepho did you resolve this issue with newer versions or is it something you still see occurring?
I was doing some testing with Sarama v1.38.1 today, trying to debug a deadlock in the AsyncProducer code that happens during a broker rolling restart (it has been happening ever since the v1.31.x release; we are not yet sure whether it is in our code or in Sarama), and I noticed the exact same behaviour from time to time: 100% CPU usage in that function.
So I would say it is still occurring, yes.
Thank you for taking the time to raise this issue. However, it has not had any activity on it in the past 90 days and will be closed in 30 days if no updates occur. Please check if the main branch has already resolved the issue since it was raised. If you believe the issue is still valid and you would like input from the maintainers then please comment to ask for it to be reviewed.