carbon-c-relay
carbon-c-relay copied to clipboard
Performance Issue: high sys CPU usage on the nanosleep syscall
Hi @grobian , We have notice that our carbon-c-relay (v2.6 ´-2017-01-26-) on my RHEL6.6 is consuming lot of user and system cpu , read (~60%) and nanosleep (~30%). Read is spending user-cpu and nanosleep as sys-cpu .
It seems like nanosleep has a strange behaviour (https://stackoverflow.com/questions/1125297/nanosleep-high-cpu-usage, https://stackoverflow.com/questions/16726872/cpu-high-usage-of-the-usleep-on-cent-os-6-3)
Perhaps could be good if these "sleep" times could be removed allowing threads block until data available, in the meanwhile cpu time would be used by other processes/threads.
#.root:~ > top
top - 15:36:51 up 46 days, 11:00, 2 users, load average: 9.95, 8.72, 8.14
Tasks: 286 total, 1 running, 285 sleeping, 0 stopped, 0 zombie
Cpu(s): 42.7%us, 30.1%sy, 0.0%ni, 25.8%id, 0.0%wa, 0.1%hi, 1.3%si, 0.0%st
Mem: 12190152k total, 7435500k used, 4754652k free, 696264k buffers
Swap: 2031612k total, 0k used, 2031612k free, 4446144k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12258 carbon-c 20 0 3046m 197m 896 S 317.7 1.7 31631:29 carbon-c-relay
48583 apache 20 0 1141m 125m 5500 S 59.4 1.1 0:29.97 httpd.worker
50521 apache 20 0 629m 64m 4924 S 19.2 0.5 0:00.58 httpd.worker
2237 carbon 20 0 308m 21m 1728 S 10.9 0.2 4355:28 carbon-cache.py
49460 apache 20 0 822m 87m 6424 S 5.3 0.7 0:09.95 httpd.worker
#.root:~ > strace -f -c -p 12258
Process 12258 attached with 43 threads - interrupt to quit
^CProcess 12258 detached
Process 12260 detached
Process 12261 detached
Process 12262 detached
Process 12263 detached
Process 12264 detached
Process 12265 detached
Process 12266 detached
Process 12267 detached
Process 12268 detached
Process 12269 detached
Process 12270 detached
Process 12271 detached
Process 12272 detached
Process 12273 detached
Process 12274 detached
Process 12275 detached
Process 12276 detached
Process 12277 detached
Process 12278 detached
Process 12279 detached
Process 12280 detached
Process 12281 detached
Process 12282 detached
Process 12283 detached
Process 12284 detached
Process 12285 detached
Process 12286 detached
Process 12287 detached
Process 12288 detached
Process 12289 detached
Process 12290 detached
Process 12291 detached
Process 12292 detached
Process 12293 detached
Process 12294 detached
Process 12295 detached
Process 12296 detached
Process 12297 detached
Process 12298 detached
Process 12299 detached
Process 12300 detached
Process 12301 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
59.02 1861.055146 301 6177866 5997479 read
27.41 864.251631 174069 4965 nanosleep
9.39 295.981045 318 929635 write
1.48 46.592231 314 148396 mprotect
0.64 20.276546 441 46002 poll
0.48 15.035243 327 45942 close
0.44 13.768712 299 46001 accept
0.42 13.320914 290 46006 fcntl
0.41 12.951691 282 46000 getpeername
0.24 7.700703 321 24014 6211 futex
0.05 1.515535 94721 16 restart_syscall
0.01 0.398242 306 1300 rt_sigprocmask
0.01 0.196499 302 650 rt_sigaction
0.00 0.001629 326 5 socket
0.00 0.001397 349 4 setsockopt
0.00 0.001021 511 2 getsockname
0.00 0.000936 234 4 recvmsg
0.00 0.000767 384 2 open
0.00 0.000696 232 3 3 connect
0.00 0.000689 345 2 fstat
0.00 0.000688 344 2 munmap
0.00 0.000000 0 2 mmap
0.00 0.000000 0 2 sendto
0.00 0.000000 0 2 bind
------ ----------- ----------- --------- --------- ----------------
100.00 3153.051961 7516823 6003693 total
It seems to suggest the sleeps should be longer. The sleeps are a vital part in avoiding contention/thundering hurd behaviour, they are not used to synchronise or something.
Because poll is now used to see if there's something to read, the read() cost should go down. A semaphore is now used with a timeout, this is where the sleeptime should go now.