
redis causes significant CPU load while idling

Open phihag opened this issue 11 years ago • 2 comments

On my development machine, I was curious to discover that in the long term, the CPU usage breakdown of the various services looked as follows:

  • adhocracy_worker: 0.3%
  • lamson: 6.6%
  • memcached: 0.3%
  • solr: 7.9%
  • redis: 84.9%

This doesn't seem right, does it?

strace-ing redis showed:

epoll_wait(3, {}, 2016, 10)             = 0
gettimeofday({1365085250, 837352}, NULL) = 0
gettimeofday({1365085250, 837451}, NULL) = 0
gettimeofday({1365085250, 837500}, NULL) = 0
gettimeofday({1365085250, 837556}, NULL) = 0
gettimeofday({1365085250, 837830}, NULL) = 0
gettimeofday({1365085250, 837880}, NULL) = 0
gettimeofday({1365085250, 837928}, NULL) = 0
gettimeofday({1365085250, 837976}, NULL) = 0
gettimeofday({1365085250, 838023}, NULL) = 0
gettimeofday({1365085250, 838170}, NULL) = 0
gettimeofday({1365085250, 838220}, NULL) = 0
gettimeofday({1365085250, 838268}, NULL) = 0
gettimeofday({1365085250, 838315}, NULL) = 0
gettimeofday({1365085250, 838363}, NULL) = 0
gettimeofday({1365085250, 838410}, NULL) = 0
gettimeofday({1365085250, 838458}, NULL) = 0
gettimeofday({1365085250, 838506}, NULL) = 0
gettimeofday({1365085250, 838553}, NULL) = 0
gettimeofday({1365085250, 838601}, NULL) = 0
gettimeofday({1365085250, 838648}, NULL) = 0
gettimeofday({1365085250, 838696}, NULL) = 0
gettimeofday({1365085250, 838782}, NULL) = 0
gettimeofday({1365085250, 838869}, NULL) = 0
gettimeofday({1365085250, 838919}, NULL) = 0
epoll_wait(3, {}, 2016, 10)             = 0

Quite obviously, this is insane: the redis server is completely idle, yet it makes on the order of 2000 syscalls per second! The epoll_wait timeout of 10 ms means roughly 100 wakeups per second, and each wakeup in the trace above is accompanied by a couple of dozen gettimeofday calls on top of the epoll_wait itself.
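
As a cross-check, strace's summary mode gives the syscall rate directly (hypothetical invocation, assuming the process is named redis-server; interrupt it with Ctrl-C after a few seconds to print the per-syscall counts):

strace -c -p "$(pidof redis-server)"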

Attaching a debugger showed where this madness is committed:

#0  0x00007ff138cc1163 in epoll_wait () from /lib/libc.so.6
#1  0x000000000041760e in aeApiPoll (eventLoop=0x7ff138055150, flags=<value optimized out>) at ae_epoll.c:107
#2  aeProcessEvents (eventLoop=0x7ff138055150, flags=<value optimized out>) at ae.c:370
#3  0x00000000004179ab in aeMain (eventLoop=0x7ff138055150) at ae.c:425
#4  0x000000000041de9f in main (argc=2, argv=0x7fff8a794978) at redis.c:2625
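
To illustrate the mechanism, here is a minimal standalone sketch (my own, not redis source): an event loop that, like aeMain/aeProcessEvents in the backtrace above, polls with a fixed 10 ms timeout wakes up about 100 times per second even though nothing is registered on the epoll instance.

#include <stdio.h>
#include <sys/epoll.h>

int main(void) {
    /* empty interest list: nothing will ever become ready */
    int epfd = epoll_create1(0);
    if (epfd < 0) { perror("epoll_create1"); return 1; }

    struct epoll_event events[16];
    long wakeups = 0;

    for (;;) {
        /* 10 ms timeout, as in the strace output: returns 0 on every
         * iteration while idle, i.e. ~100 wakeups per second */
        int n = epoll_wait(epfd, events, 16, 10);
        if (n == 0 && ++wakeups % 100 == 0)
            printf("%ld idle wakeups so far\n", wakeups);
    }
}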

With a current build of redis, the useless calls to gettimeofday vanished, but the incessant epoll_wait calls are still there.

We should check under what conditions the low timeout happens, how we can avoid it, and either reconfigure redis, patch redis, or move to an alternative solution.
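
One candidate knob, assuming the short timeout turns out to come from redis's periodic background job (serverCron): redis 2.8 and later expose an hz directive in redis.conf that controls how often that job, and therefore the idle wakeup, fires. This is only a sketch of a possible reconfiguration, not something verified against the version we currently run:

# redis.conf (redis 2.8+): frequency of the background job.
# Lower values mean fewer idle wakeups, at the cost of less timely
# expired-key collection and client timeout handling.
hz 1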

phihag, Apr 04 '13 14:04

This doesn't seem right, does it?

Nope, definitely no good. However, this doesn't occur on any of our servers (Debian squeeze, x86_64), nor on my development machine (Fedora 18, x86_64).

With a current build of redis, the useless calls to gettimeofday vanished, but the incessant epoll_wait calls are still there.

The release notes for 2.6.11 state:

[IMPROVED] Use a lot less CPU when idle, even with many configured DBs.

I have now upgraded to the latest version in commit 89dbd32a3914031412ca5732721cce30f632caed.

We should check under what conditions the low timeout happens, how we can avoid it, and either reconfigure redis, patch redis, or move to an alternative solution.

Sure, feel free to investigate... As said before, I can't reproduce this right now. I vaguely remember having seen this before, but assumed it was misuse on my side.

nidico, Apr 04 '13 22:04

Looks like a dup of https://github.com/antirez/redis/issues/2552

coppit, Apr 26 '17 14:04