Michael Martinez

13 comments by Michael Martinez

I should add that I'm running carbon-c-relay thus: `-q 400000 -B 4096 -U 16777216`

I'm not seeing any "write" errors in the log. My expire is set to 90 seconds. Maybe I'll try increasing it.

I increased it to 100 and this had the immediate effect of dropping a lot more metrics. I reverted it back to 90. Maybe I'll try a lower value.

Our relay cluster (radar112) performs a first-stage aggregation. Its config file:
```
cluster radar122
    fnv1a_ch
        radar122-a.mgmt.xxx.com:1905=a
        radar122-b.mgmt.xxx.com:1905=b
        ...
    ;
match ^agg\.
    send to radar122
    ;
match ^agg\.
    send to blackhole...
```

I increased the expiry time on both clusters.

CPU on our front-end relay hosts (which only run carbon-c-relay) looks like this:
```
Linux 4.9.27-14.33.amzn1.x86_64 (ip-172-17-17-74)   07/20/2017   _x86_64_   (4 CPU)

01:34:44 AM  CPU   %usr   %nice   %sys   %iowait   %irq...
```

Now that I've resized our cluster, I'm coming back to take a look at this again.

I am seeing some (about 20/minute) dropped metrics of the following form:
```
@4000000059e0f5e010d4bd1c [2017-10-13 13:20:22] aggregator: dropping metric too far in the future (1507916110 > 1507915260): agg.brisk.system.cassandra-staging102.cassandra.compaction.system.size_estimates.sum_all.hosts from brisk.system.cassandra-staging102.cassandra.compaction.system.size_estimates.sum.ip-xxx_ec2_internal...
```
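To make the two epoch values in that log message easier to read, here is a small sketch that converts them to UTC and computes how far the metric's timestamp sits past the aggregator's cutoff. The variable names are just for illustration. Note that the bracketed log time (13:20:22) is presumably the host's local time, which is an assumption on my part.

```python
from datetime import datetime, timezone

# Values copied from the "dropping metric too far in the future" log line.
metric_ts = 1507916110  # timestamp carried by the incoming metric
limit_ts = 1507915260   # latest timestamp the aggregator would accept


def to_utc(epoch: int) -> datetime:
    """Convert a Unix epoch (seconds) to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc)


# How many seconds the metric is ahead of the accepted window.
overshoot = metric_ts - limit_ts

print("metric:", to_utc(metric_ts).isoformat())  # 2017-10-13T17:35:10+00:00
print("limit: ", to_utc(limit_ts).isoformat())   # 2017-10-13T17:21:00+00:00
print("overshoot (s):", overshoot)               # 850
```

So the metric was stamped a little over 14 minutes past the cutoff. That is why comparing clocks between the sending clients and the aggregator hosts is a reasonable first check.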

Thanks for keeping this thread alive. I still need to keep looking into it more on my end, but I haven't had time recently. We do a large number of...

@grobian I made sure the Linux system time on our graphite clients is in sync with the aggregator hosts.