java-dogstatsd-client
Errors in NonBlockingStatsDClient.QueueConsumer are not recoverable
QueueConsumer does not recover from java.lang.Error instances, and there's no API to re-schedule another QueueConsumer. As a result, the message queue fills up and no metrics are emitted.
I had an application instance where an OutOfMemoryError was thrown in QueueConsumer. Here's the stack trace of the thread that was supposed to run QueueConsumer:
StatsD-pool-1-thread-1 tid=23 [WAITING] [DAEMON]
sun.misc.Unsafe.park(boolean, long) Unsafe.java
java.util.concurrent.locks.LockSupport.park(Object) LockSupport.java:175
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await() AbstractQueuedSynchronizer.java:2039
java.util.concurrent.LinkedBlockingQueue.take() LinkedBlockingQueue.java:442
java.util.concurrent.ThreadPoolExecutor.getTask() ThreadPoolExecutor.java:1067
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1127
java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:617
java.lang.Thread.run() Thread.java:745
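The trace shows the worker parked in ThreadPoolExecutor.getTask(), i.e. idle and waiting for a new task, so the consumer task itself is gone. A minimal sketch of how that can happen, assuming the consumer is handed to a single-thread executor with submit() (an assumption about the client, not taken from its source): the Error is captured in the returned Future and the worker thread goes back to waiting. SilentConsumerDeath and FailingConsumer are hypothetical names used only for illustration.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SilentConsumerDeath {
    /** Hypothetical stand-in for QueueConsumer that fails on its first iteration. */
    static class FailingConsumer implements Runnable {
        @Override
        public void run() {
            // Simulated failure; a real OutOfMemoryError would come from an allocation.
            throw new OutOfMemoryError("simulated");
        }
    }

    /** Returns the Throwable that ended the consumer task, or null if it finished normally. */
    static Throwable runFailingConsumer() throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<?> result = executor.submit(new FailingConsumer());
            try {
                // Nothing is logged or rethrown unless somebody calls get() on the Future.
                result.get();
                return null;
            } catch (ExecutionException e) {
                // The Error thrown inside run() is stored as the cause.
                return e.getCause();
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runFailingConsumer());
    }
}
```

If nothing ever inspects the Future, the failure stays invisible: the pool thread remains alive, but nothing drains the queue anymore.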
Here are some things I think we could do to mitigate that:
- Minimize the number of allocations in QueueConsumer#run. In particular, packet encoding could be performed in the client threads.
- Add an API for re-scheduling the failed QueueConsumer.
- Handle OutOfMemoryError (are there other recoverable errors?) in QueueConsumer.
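The last option could look roughly like the sketch below: a consumer loop that catches OutOfMemoryError per message, drops that message, and keeps draining the queue. This is not the library's code; ResilientConsumer, Sender, stop() and demo() are hypothetical names, and the sketch assumes the failed allocation was local to handling one message, so that continuing is reasonable.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class ResilientConsumer implements Runnable {
    /** Hypothetical stand-in for the code that encodes and sends one message. */
    public interface Sender {
        void send(String message);
    }

    private final BlockingQueue<String> queue;
    private final Sender sender;
    private volatile boolean shutdown;

    public ResilientConsumer(BlockingQueue<String> queue, Sender sender) {
        this.queue = queue;
        this.sender = sender;
    }

    public void stop() {
        shutdown = true;
    }

    @Override
    public void run() {
        while (!shutdown) {
            try {
                String message = queue.poll(100, TimeUnit.MILLISECONDS);
                if (message != null) {
                    sender.send(message);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            } catch (OutOfMemoryError e) {
                // Drop the current message and keep consuming.
                // Deliberately no allocation here (no logging, no string building).
            } catch (Exception e) {
                // A real client would hand this to its error handler.
            }
        }
    }

    /** Feeds three messages; sending the first one fails with a simulated OutOfMemoryError. */
    static List<String> demo() throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.add("first");
        queue.add("second");
        queue.add("third");
        List<String> delivered = new CopyOnWriteArrayList<>();
        ResilientConsumer consumer = new ResilientConsumer(queue, message -> {
            if (message.equals("first")) {
                throw new OutOfMemoryError("simulated");
            }
            delivered.add(message);
        });
        Thread thread = new Thread(consumer, "consumer");
        thread.start();
        while (!queue.isEmpty()) {
            Thread.sleep(10);
        }
        consumer.stop();
        thread.join(); // the in-flight send finishes before the loop sees the stop flag
        return delivered;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // prints [second, third]
    }
}
```

Catching only OutOfMemoryError rather than every Error is a deliberate choice in this sketch: errors such as LinkageError usually indicate a broken deployment, and retrying would only hide them.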
I ran into a very similar problem with com.timgroup.statsd.NonBlockingStatsDClient.StatsDSender. Just catching OutOfMemoryError in the run method would have handled my case.