Queue full error when trying to send data to Cyanite
Hello, we are trying Cyanite for the first time in our testing environment, with two Cyanite + Graphite API instances (16G RAM, 8 cores, SSD) and a three-node Cassandra cluster (64G RAM, 8 cores, SSD), but it seems it cannot handle the load. We see a lot of "Queue full" errors in the Cyanite log as soon as we start our load test. We are using the graphite-stresser tool to load test the environment:
java8 -jar build/libs/graphite-stresser-0.1.jar loadbalancer 2003 100 975 10 true
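(For context: reading the graphite-stresser arguments as host, port, number of hosts, timers per host, flush interval in seconds and debug flag, and assuming each timer emits 15 metrics, which matches the numbers further down in this thread, this invocation should generate roughly 100 * 975 * 15 = 1,462,500 metrics per 10-second flush, i.e. about 146,000 metrics per second.)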
We have set Cyanite's heap size to 12G:
java -Xms12g -Xmx12g -jar cyanite-0.5.1-standalone-fix.jar --path cyanite.yaml
The cyanite.yaml file also contains the following ingest and write queue settings:
queues:
  defaults:
    ingestq:
      pool-size: 100
      queue-capacity: 2000000
    writeq:
      pool-size: 100
      queue-capacity: 2000000
It barely gets into the load test before the following errors start appearing continuously. Note that even after we stop the load test, the log keeps producing these errors non-stop:
WARN [2017-12-08 09:46:13,337] nioEventLoopGroup-2-15 - io.netty.channel.DefaultChannelPipeline An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
java.lang.IllegalStateException: Queue full
at java.util.AbstractQueue.add(AbstractQueue.java:98)
at io.cyanite.engine.queue.EngineQueue.engine_event_BANG_(queue.clj:44)
at io.cyanite.engine.Engine.enqueue_BANG_(engine.clj:108)
at io.cyanite.input.carbon$pipeline$fn__16369.invoke(carbon.clj:40)
at io.cyanite.input.carbon.proxy$io.netty.channel.ChannelInboundHandlerAdapter$ff19274a.channelRead(Unknown Source)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:280)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:396)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:248)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:129)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:642)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:565)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:479)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:441)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
at java.lang.Thread.run(Thread.java:745)
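As far as we can tell, the exception itself is just back-pressure: Cyanite's ingest queue is bounded, and AbstractQueue.add() throws IllegalStateException("Queue full") when the queue rejects an element because the consumer side (the write path towards Cassandra) cannot drain it fast enough. A minimal standalone Java sketch of that behaviour (not Cyanite code, just java.util.concurrent):

import java.util.concurrent.ArrayBlockingQueue;

public class QueueFullDemo {
    public static void main(String[] args) {
        // A bounded queue standing in for Cyanite's ingest queue (capacity 2 for the demo).
        ArrayBlockingQueue<String> queue = new ArrayBlockingQueue<>(2);
        queue.add("metric-1");
        queue.add("metric-2");

        // offer() reports a full queue by returning false ...
        System.out.println(queue.offer("metric-3")); // prints: false

        // ... while add() throws the exception seen in the log above.
        queue.add("metric-3"); // java.lang.IllegalStateException: Queue full
    }
}

Increasing queue-capacity only buys headroom: if the sustained ingest rate stays above what the write pool can push to Cassandra, the queue will eventually fill up again.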
Do you have any ideas how to fix this? @pyr @ifesdjeen. I see that this issue was addressed here previously, but has it been fixed completely? Note that we have built Cyanite up to this commit, because the latest version does not return multiple metrics in a single query; I have submitted an issue for that bug as well.
We have tried load testing with the following data (hosts * timers * 15 metrics per timer, at a 10-second interval):

10 * 64 * 15   = 9,600 per 10 seconds   -> OK
10 * 975 * 15  = 146,250 per 10 seconds -> OK
10 * 1956 * 15 = 293,400 per 10 seconds -> FAILED
This was done with two Cyanite nodes (16G RAM, 8 cores, SSD). The Cyanite daemon fails while the machines still have plenty of free resources.
@dancb10 you can increase the queue capacity.
What's your ingestion rate? How many events per second does Cyanite get approximately?
The numbers are in my previous comment: there are 10 instances, each sending 1956 * 15 metrics every 10 seconds, so 293,400 metrics in total every 10 seconds, which is about 29,340 per second. Note that we are using pretty powerful instances, and we have also split writes and reads across separate instances: two instances that write and two that read, each pair behind its own ELB. The instances are c3.2xlarge (8 vCPUs, 15 GB RAM, 2 x 80 GB SSD). We are using a 2 million queue size, so @ifesdjeen are you saying to increase this number? I will try with 20 million.
We are using 2 million queue size so @ifesdjeen you are saying to increase this number?
Hm, no, actually 2M should usually be OK...