k8s-worker-pod-autoscaler

WPA Beanstalk is crashing sometimes with nil pointer error.

Open alok87 opened this issue 4 years ago • 3 comments

WPA sometimes crashes and restarts with a nil pointer error:

I0702 11:51:51.353487       1 controller.go:484] dev-call_otp_updater-crstg1 minWorkers=0, maxDisruptableWorkers=0
I0702 11:51:51.353505       1 controller.go:356] dev-call_otp_updater-crstg1: messages: 0, idle: 0, desired: 0
I0702 11:51:51.353513       1 controller.go:593] crstg1/25-29871-otpcallupdater-praccomm-api: WPA status is already up to date
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1202436]

goroutine 45213 [running]:
github.com/beanstalkd/go-beanstalk.(*Conn).cmd(0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x16c2531, 0xa, 0xc00d553a38, 0x1, ...)
        /src/vendor/github.com/beanstalkd/go-beanstalk/conn.go:76 +0x66
github.com/beanstalkd/go-beanstalk.(*Tube).Stats(0xc00d553a90, 0x34, 0xc00085c1de, 0x16)
        /src/vendor/github.com/beanstalkd/go-beanstalk/tube.go:93 +0xcf
github.com/practo/k8s-worker-pod-autoscaler/pkg/queue.(*beanstalkClient).executeGetStats(0xc00734c440, 0xc00d553b98, 0x139935a, 0x1541140, 0x15dc740)
        /src/pkg/queue/beanstalk.go:123 +0x90
github.com/practo/k8s-worker-pod-autoscaler/pkg/queue.(*beanstalkClient).getStats(0xc00734c440, 0xc000a2ec00, 0x34, 0x1a0bec0, 0xc00734c440)
        /src/pkg/queue/beanstalk.go:144 +0x2f
github.com/practo/k8s-worker-pod-autoscaler/pkg/queue.(*Beanstalk).getMessages(0xc00017e090, 0xc000a2ec00, 0x34, 0x446235, 0xa1bc22, 0xc00e35a450)
        /src/pkg/queue/beanstalk.go:262 +0x8a
github.com/practo/k8s-worker-pod-autoscaler/pkg/queue.(*Beanstalk).poll(0xc00017e090, 0xc0014c3b00, 0x26, 0xc000a2ec1e, 0x16, 0xc00b599a70, 0x6, 0xc000a2ec00, 0x34, 0xc000a2ec0c, ...)
        /src/pkg/queue/beanstalk.go:351 +0x75
github.com/practo/k8s-worker-pod-autoscaler/pkg/queue.(*Poller).runPollThread(0xc00017e0f0, 0xc0014c3b00, 0x26)
        /src/pkg/queue/poller.go:46 +0x88
created by github.com/practo/k8s-worker-pod-autoscaler/pkg/queue.(*Poller).Run
        /src/pkg/queue/poller.go:93 +0x209

alok87 Jul 02 '20 12:07

The same thing is happening with SQS, except it's causing the pod to go into CrashLoopBackOff for us:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x137304a]
goroutine 79 [running]:
github.com/aws/aws-sdk-go/service/cloudwatch.(*CloudWatch).newRequest(0x0, 0xc001550080, 0x15f22c0, 0xc001550040, 0x1582580, 0xc0015500c0, 0x0)
        /src/vendor/github.com/aws/aws-sdk-go/service/cloudwatch/service.go:90 +0x3a
github.com/aws/aws-sdk-go/service/cloudwatch.(*CloudWatch).GetMetricDataRequest(0x0, 0xc001550040, 0x1491d20, 0xc0003f5b01)
        /src/vendor/github.com/aws/aws-sdk-go/service/cloudwatch/api.go:1527 +0x1d4
github.com/aws/aws-sdk-go/service/cloudwatch.(*CloudWatch).GetMetricData(0x0, 0xc001550040, 0xc00003e37c, 0x9, 0x24dcf60)
        /src/vendor/github.com/aws/aws-sdk-go/service/cloudwatch/api.go:1589 +0x35
github.com/practo/k8s-worker-pod-autoscaler/pkg/queue.(*SQS).getAverageNumberOfMessagesSent(0xc000168360, 0xc00003e370, 0x50, 0x0, 0x0, 0xc000000000)
        /src/pkg/queue/sqs.go:340 +0x4d6
github.com/practo/k8s-worker-pod-autoscaler/pkg/queue.(*SQS).cachedNumberOfSentMessages(0xc000168360, 0xc00003e370, 0x50, 0x0, 0x1, 0xc000215d08)
        /src/pkg/queue/sqs.go:260 +0x13f
github.com/practo/k8s-worker-pod-autoscaler/pkg/queue.(*SQS).poll(0xc000168360, 0xc000a6c180, 0x40, 0xc00003e3a1, 0x1f, 0xc00020a040, 0x20, 0xc00003e370, 0x50, 0xc00003e378, ...)
        /src/pkg/queue/sqs.go:434 +0xfd1
github.com/practo/k8s-worker-pod-autoscaler/pkg/queue.(*Poller).runPollThread(0xc000606600, 0xc000a6c180, 0x40)
        /src/pkg/queue/poller.go:46 +0x88
created by github.com/practo/k8s-worker-pod-autoscaler/pkg/queue.(*Poller).Run
        /src/pkg/queue/poller.go:93 +0x209
stream closed

matkam Aug 28 '20 18:08

I added some debug lines, and it turns out the CloudWatch (CW) client returned by getCWClient(...) is nil when this happens! The problem was that my queueUri incorrectly referred to a region not included in the --aws-regions flag.
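
For illustration, here is a minimal sketch of the kind of guard that would turn this panic into an ordinary error. It is not the actual WPA code: cwClient, clientPool, clientFor and regionFromQueueURI are hypothetical names, cwClient stands in for the real *cloudwatch.CloudWatch, and the sketch assumes the queue URI has the usual https://sqs.<region>.amazonaws.com/<account>/<queue> shape.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// cwClient is a hypothetical stand-in for *cloudwatch.CloudWatch.
type cwClient struct {
	region string
}

// clientPool holds one client per configured region
// (assumed here to come from the --aws-regions flag).
type clientPool struct {
	clients map[string]*cwClient
}

func newClientPool(regions []string) *clientPool {
	p := &clientPool{clients: make(map[string]*cwClient, len(regions))}
	for _, r := range regions {
		p.clients[r] = &cwClient{region: r}
	}
	return p
}

// regionFromQueueURI extracts the region from a URI of the assumed form
// https://sqs.<region>.amazonaws.com/<account>/<queue>.
func regionFromQueueURI(queueURI string) (string, error) {
	u, err := url.Parse(queueURI)
	if err != nil {
		return "", fmt.Errorf("parse queue URI %q: %w", queueURI, err)
	}
	parts := strings.Split(u.Hostname(), ".")
	if len(parts) < 3 || parts[0] != "sqs" {
		return "", fmt.Errorf("unexpected queue URI host %q", u.Hostname())
	}
	return parts[1], nil
}

// clientFor returns an error instead of a nil client when the queue's
// region was not configured, so the caller can log and skip the poll
// rather than dereference a nil pointer.
func (p *clientPool) clientFor(queueURI string) (*cwClient, error) {
	region, err := regionFromQueueURI(queueURI)
	if err != nil {
		return nil, err
	}
	c, ok := p.clients[region]
	if !ok {
		return nil, fmt.Errorf("region %q from queue URI %q is not in the configured regions", region, queueURI)
	}
	return c, nil
}

func main() {
	pool := newClientPool([]string{"ap-south-1"})
	// The queue lives in a region that was never configured.
	if _, err := pool.clientFor("https://sqs.us-east-1.amazonaws.com/123456789012/jobs"); err != nil {
		fmt.Println("skipping poll:", err)
	}
}
```

With a lookup like this, a misconfigured queueUri shows up as a logged error on each poll instead of a CrashLoopBackOff.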

matkam Aug 28 '20 18:08

Hey, thanks. I will check this soon and close.

alok87 Aug 29 '20 14:08