
CPU 100% in `asyncProducer.newBrokerProducer.func2`

Open · MoilkNepho opened this issue 3 years ago

Version:

  • Sarama: 1.33.0
  • Kafka: 0.10.2.1
  • Go: 1.16.3

Problem

CPU usage spikes to 100% when closing a broker. Here is the CPU profile:

(pprof) top100                      
Showing nodes accounting for 150.58s, 99.69% of 151.05s total
Dropped 3 nodes (cum <= 0.76s)
      flat  flat%   sum%        cum   cum%
    64.28s 42.56% 42.56%    144.92s 95.94%  runtime.selectgo
    23.17s 15.34% 57.89%     23.17s 15.34%  runtime.lock2
    20.25s 13.41% 71.30%     20.25s 13.41%  runtime.unlock2
    14.32s  9.48% 80.78%     37.83s 25.04%  runtime.sellock
    13.52s  8.95% 89.73%     33.78s 22.36%  runtime.selunlock
     4.37s  2.89% 92.62%    151.05s   100%  github.com/Shopify/sarama.(*asyncProducer).newBrokerProducer.func2
     3.87s  2.56% 95.19%      3.87s  2.56%  runtime.fastrand (inline)
     1.91s  1.26% 96.45%      5.78s  3.83%  runtime.fastrandn
     1.69s  1.12% 97.57%      1.69s  1.12%  github.com/eapache/queue.(*Queue).Peek
     1.69s  1.12% 98.69%      2.85s  1.89%  runtime.typedmemclr
     1.16s  0.77% 99.46%      1.16s  0.77%  runtime.memclrNoHeapPointers
     0.24s  0.16% 99.62%     23.41s 15.50%  runtime.lockWithRank (inline)
     0.10s 0.066% 99.68%     23.51s 15.56%  runtime.lock
     0.01s 0.0066% 99.69%     20.26s 13.41%  runtime.unlock (inline)
         0     0% 99.69%    151.05s   100%  github.com/Shopify/sarama.withRecover
         0     0% 99.69%     20.25s 13.41%  runtime.unlockWithRank (inline)
(pprof) 
(pprof) list newBrokerProducer.func2
Total: 151.05s
ROUTINE ======================== github.com/Shopify/sarama.(*asyncProducer).newBrokerProducer.func2 in /data/root/go/pkg/mod/github.com/!shopify/[email protected]/async_producer.go
     4.37s    151.05s (flat, cum)   100% of Total
         .          .    738:   // In order to avoid a deadlock when closing the broker on network or malformed response error
         .          .    739:   // we use an intermediate channel to buffer and send pending responses in order
         .          .    740:   // This is because the AsyncProduce callback inside the bridge is invoked from the broker
         .          .    741:   // responseReceiver goroutine and closing the broker requires such goroutine to be finished
         .          .    742:   go withRecover(func() {
         .       60ms    743:           buf := queue.New()
     500ms      500ms    744:           for {
      80ms       90ms    745:                   if buf.Length() == 0 {
         .          .    746:                           res, ok := <-pending
         .          .    747:                           if !ok {
         .          .    748:                                   // We are done forwarding the last pending response
         .          .    749:                                   close(responses)
         .          .    750:                                   return
         .          .    751:                           }
         .          .    752:                           buf.Add(res)
         .          .    753:                   }
         .          .    754:                   // Send the head pending response or buffer another one
         .          .    755:                   // so that we never block the callback
     550ms      2.24s    756:                   headRes := buf.Peek().(*brokerProducerResponse)
     2.03s    146.95s    757:                   select {
     740ms      740ms    758:                   case res, ok := <-pending:
      70ms       70ms    759:                           if !ok {
         .          .    760:                                   continue
         .          .    761:                           }
         .          .    762:                           buf.Add(res)
         .          .    763:                           continue
     400ms      400ms    764:                   case responses <- headRes:
         .          .    765:                           buf.Remove()
         .          .    766:                           continue
         .          .    767:                   }
         .          .    768:           }
         .          .    769:   })
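
For context on why that select can spin: in Go, a receive from a closed channel is always immediately ready. So if `pending` is closed while `buf` still holds responses and nothing is currently draining `responses`, the select at line 757 always has a ready case, never parks the goroutine, and burns a full core inside `runtime.selectgo`, which is consistent with the profile above. The following is a minimal standalone sketch of that pattern (the channels and types are hypothetical stand-ins, not Sarama code):

package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// Minimal sketch (stand-in channels, not Sarama code): once `pending` is
// closed while the buffer still holds an item and `responses` has no
// reader, the select always has a ready case (a receive from a closed
// channel never blocks), so the loop spins instead of parking.
func main() {
	pending := make(chan int, 1)
	responses := make(chan int) // no reader, mimicking a broker that stopped consuming during close

	pending <- 42 // one buffered "response"
	close(pending)

	var spins int64

	go func() {
		var buf []int
		for {
			if len(buf) == 0 {
				res, ok := <-pending
				if !ok {
					close(responses)
					return
				}
				buf = append(buf, res)
			}
			head := buf[0]
			select {
			case res, ok := <-pending:
				if !ok {
					// pending is closed: this case is always ready,
					// so we come straight back here without blocking.
					atomic.AddInt64(&spins, 1)
					continue
				}
				buf = append(buf, res)
			case responses <- head:
				buf = buf[1:]
			}
		}
	}()

	time.Sleep(100 * time.Millisecond)
	fmt.Printf("select iterations in 100ms: %d\n", atomic.LoadInt64(&spins))
}

If the diagnosis is right, the printed iteration count after 100ms will be very large, matching `runtime.selectgo` dominating the flat/cum columns in the pprof output above.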

Reference

  • code: https://github.com/Shopify/sarama/blob/23c42869222b1467841a9b6318a7ceb159c6a6fd/async_producer.go#L757
  • PR: #2133
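
For completeness, a common Go pattern for stopping this kind of spin (shown only as an illustrative sketch under the assumption above, not necessarily the change made in the referenced PR) is to set the channel variable to nil once it is observed closed: receiving from a nil channel blocks forever, so that select case effectively drops out and the goroutine parks until `responses` can accept the head of the buffer.

package main

import (
	"fmt"
	"time"
)

// Illustrative sketch of the nil-channel pattern (stand-in types and
// channel names, not Sarama's actual fix). After `pending` is observed
// closed we set it to nil; receiving from a nil channel blocks forever,
// so the select can then only proceed by delivering to `responses`.
func forward(pending <-chan int, responses chan<- int) {
	var buf []int
	for {
		if len(buf) == 0 {
			if pending == nil {
				// pending closed and buffer drained: nothing left to forward.
				close(responses)
				return
			}
			res, ok := <-pending
			if !ok {
				close(responses)
				return
			}
			buf = append(buf, res)
		}
		head := buf[0]
		select {
		case res, ok := <-pending:
			if !ok {
				pending = nil // disable this case; the select now waits on responses
				continue
			}
			buf = append(buf, res)
		case responses <- head:
			buf = buf[1:]
		}
	}
}

func main() {
	pending := make(chan int, 1)
	responses := make(chan int)

	pending <- 42
	close(pending)

	go forward(pending, responses)

	// A slow reader shows up later; the forwarder stays parked instead of spinning.
	time.Sleep(50 * time.Millisecond)
	for res := range responses {
		fmt.Println("forwarded:", res)
	}
}

The extra `pending == nil` check at the top of the loop preserves the original shutdown behaviour: once the buffer drains after `pending` has been closed, `responses` is closed and the goroutine exits.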

MoilkNepho · Jun 10 '22 05:06

@slaunay hi, can you please take a look at this issue?

MoilkNepho · Jun 10 '22 06:06

@MoilkNepho did you resolve this issue with newer versions or is it something you still see occurring?

dnwe · Aug 17 '23 20:08

I was doing some testing of Sarama v1.38.1 today, trying to debug a deadlock in the AsyncProducer code that happens during a broker rolling restart (present ever since the v1.31.x release; for now I'm unsure whether it's in our code or in Sarama), and I noticed the exact same behaviour from time to time: 100% CPU usage in that function.

So I would say it's still occurring yes.

dethi · Aug 18 '23 00:08

Thank you for taking the time to raise this issue. However, it has not had any activity on it in the past 90 days and will be closed in 30 days if no updates occur. Please check if the main branch has already resolved the issue since it was raised. If you believe the issue is still valid and you would like input from the maintainers then please comment to ask for it to be reviewed.

github-actions[bot] · Nov 16 '23 08:11