redis causes significant CPU load while idling
On my development machine, I was surprised to discover that over the long term, the CPU usage breakdown of the various services looked as follows:
- adhocracy_worker: 0.3%
- lamson: 6.6%
- memcached: 0.3%
- solr: 7.9%
- redis: 84.9%
This doesn't seem right, does it?
strace-ing redis showed:
epoll_wait(3, {}, 2016, 10) = 0
gettimeofday({1365085250, 837352}, NULL) = 0
gettimeofday({1365085250, 837451}, NULL) = 0
gettimeofday({1365085250, 837500}, NULL) = 0
gettimeofday({1365085250, 837556}, NULL) = 0
gettimeofday({1365085250, 837830}, NULL) = 0
gettimeofday({1365085250, 837880}, NULL) = 0
gettimeofday({1365085250, 837928}, NULL) = 0
gettimeofday({1365085250, 837976}, NULL) = 0
gettimeofday({1365085250, 838023}, NULL) = 0
gettimeofday({1365085250, 838170}, NULL) = 0
gettimeofday({1365085250, 838220}, NULL) = 0
gettimeofday({1365085250, 838268}, NULL) = 0
gettimeofday({1365085250, 838315}, NULL) = 0
gettimeofday({1365085250, 838363}, NULL) = 0
gettimeofday({1365085250, 838410}, NULL) = 0
gettimeofday({1365085250, 838458}, NULL) = 0
gettimeofday({1365085250, 838506}, NULL) = 0
gettimeofday({1365085250, 838553}, NULL) = 0
gettimeofday({1365085250, 838601}, NULL) = 0
gettimeofday({1365085250, 838648}, NULL) = 0
gettimeofday({1365085250, 838696}, NULL) = 0
gettimeofday({1365085250, 838782}, NULL) = 0
gettimeofday({1365085250, 838869}, NULL) = 0
gettimeofday({1365085250, 838919}, NULL) = 0
epoll_wait(3, {}, 2016, 10) = 0
Quite obviously, this is insane - the redis server is completely idle, yet it performs roughly 2000 syscalls per second! The 10 ms epoll_wait timeout alone accounts for about 100 wakeups per second, and each wakeup issues some two dozen gettimeofday calls on top of that.
Attaching a debugger showed where this madness is committed:
#0 0x00007ff138cc1163 in epoll_wait () from /lib/libc.so.6
#1 0x000000000041760e in aeApiPoll (eventLoop=0x7ff138055150, flags=<value optimized out>) at ae_epoll.c:107
#2 aeProcessEvents (eventLoop=0x7ff138055150, flags=<value optimized out>) at ae.c:370
#3 0x00000000004179ab in aeMain (eventLoop=0x7ff138055150) at ae.c:425
#4 0x000000000041de9f in main (argc=2, argv=0x7fff8a794978) at redis.c:2625
With a current build of redis, the useless calls to gettimeofday vanished, but the incessant epoll_wait calls are still there.
We should check under what conditions the low timeout happens, how we can avoid it, and either reconfigure redis, patch redis, or move to an alternative solution.
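One knob worth checking as part of that: newer redis versions expose the frequency of the background serverCron task as a configuration directive (assumption: the `hz` directive exists in the redis version being deployed; which release introduced it should be verified before relying on it). Lowering it directly reduces the idle wakeup rate:

```
# redis.conf sketch (assumes a redis build that supports the hz directive)
# hz controls how often background tasks run; fewer ticks mean fewer idle
# wakeups, at the cost of less timely key expiry and client timeout handling.
hz 10
```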
> This doesn't seem right, does it?
Nope, definitely no good. However, this doesn't occur on any of our servers (Debian squeeze, x86_64), nor on my development machine (Fedora 18, x86_64).
> With a current build of redis, the useless calls to gettimeofday vanished, but the incessant epoll_wait calls are still there.
The release notes for 2.6.11 state:
> [IMPROVED] Use a lot less CPU when idle, even with many configured DBs.
I have now upgraded to the latest version in 89dbd32a3914031412ca5732721cce30f632caed.
> We should check under what conditions the low timeout happens, how we can avoid it, and either reconfigure redis, patch redis, or move to an alternative solution.
Sure, feel free to investigate... As said before, I can't reproduce this right now. I vaguely remember having seen this before, but assumed it was misuse on my side.
Looks like a dup of https://github.com/antirez/redis/issues/2552