High cpu_system with many open connections
Reincarnation of https://github.com/grobian/carbon-c-relay/issues/216
On heavy-load relays (15 000 persistent connections, 25 million metrics/minute) we have some problems: timeouts when establishing new connections and very high CPU usage.
The current dispatcher code has some problems:
- it reads from all sockets (without poll)
- it does a full scan of the connections table whenever a new connection is established
- locks are held for a long time when the connections table is resized (the memory buffers need a realloc)

I did some refactoring of the carbon-c-relay dispatcher code (switched the dispatcher to libevent and refactored the connections table). This reduces CPU and memory usage, so we can process more connections on the same hardware.
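For context, below is a minimal sketch of what an event-driven dispatcher looks like with libevent. This is not the actual code from my branch, only an illustration of the general shape: the kernel reports which sockets are readable instead of the dispatcher scanning every connection (the port 2003 and the callbacks are placeholders):

```c
/* Minimal libevent dispatcher sketch (illustration only, not the patch):
 * each accepted connection gets a bufferevent, and read_cb fires only for
 * sockets that actually have data, so idle connections cost nothing. */
#include <event2/event.h>
#include <event2/listener.h>
#include <event2/bufferevent.h>
#include <netinet/in.h>
#include <string.h>
#include <stdio.h>

static void read_cb(struct bufferevent *bev, void *ctx)
{
    char buf[4096];
    size_t n;
    /* a real relay would parse metric lines here */
    while ((n = bufferevent_read(bev, buf, sizeof(buf))) > 0)
        fwrite(buf, 1, n, stdout);
}

static void event_cb(struct bufferevent *bev, short events, void *ctx)
{
    if (events & (BEV_EVENT_EOF | BEV_EVENT_ERROR))
        bufferevent_free(bev);  /* drop the connection on EOF/error */
}

static void accept_cb(struct evconnlistener *lst, evutil_socket_t fd,
                      struct sockaddr *addr, int socklen, void *ctx)
{
    struct event_base *base = ctx;
    struct bufferevent *bev =
        bufferevent_socket_new(base, fd, BEV_OPT_CLOSE_ON_FREE);
    bufferevent_setcb(bev, read_cb, NULL, event_cb, NULL);
    bufferevent_enable(bev, EV_READ);
}

int main(void)
{
    struct event_base *base = event_base_new();
    struct sockaddr_in sin;
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_port = htons(2003);  /* placeholder: carbon plaintext port */
    struct evconnlistener *lst = evconnlistener_new_bind(
        base, accept_cb, base,
        LEV_OPT_REUSEABLE | LEV_OPT_CLOSE_ON_FREE, -1,
        (struct sockaddr *)&sin, sizeof(sin));
    event_base_dispatch(base);   /* driven by epoll/kqueue under the hood */
    evconnlistener_free(lst);
    event_base_free(base);
    return 0;
}
```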
Some tests from my workstation (2000 connections, random delay 50-500 ms):

- master: TCP.CONNECT.OK 14807 (242/s), TCP.CONNECT.TIMEOUT 498811 (8177/s), TCP.SEND.OK 678226 (11118/s), TCP.SEND.RESET 7772 (127/s)
- libevent branch (https://github.com/msaf1980/carbon-c-relay/tree/libevent_pthread): TCP.CONNECT.OK 21049 (350/s), TCP.CONNECT.TIMEOUT 4261 (71/s), TCP.SEND.OK 4066794 (67779/s), TCP.SEND.RESET 1049 (17/s)

Some profiling output:
This is with libevent, right?
Yes, with libevent, with pthread locks enabled. It's simpler than communicating via pipes or sockets.
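For reference, enabling the pthread locks in libevent is essentially a one-time call before creating the event base (a minimal sketch, not the branch's actual code; it needs linking with -levent and -levent_pthreads):

```c
/* Sketch: turn on libevent's pthread-based locking so several threads
 * can safely touch the same event_base and its bufferevents. */
#include <event2/event.h>
#include <event2/thread.h>

int main(void)
{
    /* must be called before any event_base is created */
    if (evthread_use_pthreads() != 0)
        return 1;

    struct event_base *base = event_base_new();
    /* ... register events, start worker threads, run the loop ... */
    event_base_free(base);
    return 0;
}
```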
Hi, we plan to use carbon-c-relay. @grobian, is there any plan to merge the libevent patch soon?
@msaf1980 Can we just build a package from your work and use it?
I am a bit hesitant to take the patch(es) because I don't understand them. Mostly don't get why it performs better. That is, I see there is more than just libevent. Since I don't have the time to bring this all in, perhaps it's not a bad idea to let someone else take over and steer the direction for the relay?
what tool/utility produces measurements like this? Thanks
I use a simple stress test from https://github.com/msaf1980/carbontest It's not ideal, but it works for me.
But the main target is not raw performance; we need to reduce CPU usage and stably handle more than 10000 connections per server.
A poll-like model (libevent etc.) is effective with many connections (5000 and more), when not all connections are active at the same time. That is the real working mode for carbon-c-relay. Direct reads from all connections produce high CPU usage and are not stable at 10000 or more connections.
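To make the CPU point concrete, this is roughly what the read-everything pattern looks like (an illustration only, not carbon-c-relay code): with mostly idle connections, every pass spends one syscall per connection just to learn there is nothing to read, which shows up as cpu_system.

```c
/* Anti-pattern sketch: reading every non-blocking socket in a loop.
 * Each idle connection still costs a read() that returns EAGAIN. */
#include <errno.h>
#include <unistd.h>

void drain_all(int *fds, int nfds, void (*handle)(int fd, char *buf, ssize_t n))
{
    char buf[4096];
    for (;;) {                                       /* dispatcher loop */
        for (int i = 0; i < nfds; i++) {
            ssize_t n = read(fds[i], buf, sizeof(buf));
            if (n > 0)
                handle(fds[i], buf, n);              /* real work */
            else if (n < 0 && errno == EAGAIN)
                continue;                            /* idle: wasted syscall */
        }
        /* a micro-sleep here only trades CPU for latency */
    }
}
```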
In our environment the libevent version has been working well for 2 months without problems.
Yes, you can build a package from my branch and use it. But it's only needed on really busy relays.
No, I don't have plans to merge this. I switched from a micro-sleep spin-lock approach to a semaphore-based approach (basically using notifications), which doesn't require an external dependency (libevent). I need to find time to benchmark the throughput somehow and see if there are obvious blocks. I cannot support the libevent code.
How do you plan to detect the socket state (ready for read, idle, or connection hangup) with a semaphore? The traditional way to do this is to use event-driven pollers like poll, epoll (Linux), or kqueue (BSD). libevent is just a library and uses the platform-specific poller internally, without the need to maintain low-level platform-specific code.
poll() is already used to check which socket has work to do; the semaphore is used to wake up the worker threads.
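A rough sketch of that pattern, as I understand the description (my own illustration, not the relay's actual code): poll() stays responsible for readiness detection, and the worker threads sleep on a semaphore instead of spinning with a micro-sleep.

```c
/* Sketch: poll() detects readiness, a semaphore wakes the workers. */
#include <poll.h>
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

#define QSIZE 1024

static int ready_q[QSIZE];            /* ring buffer of ready descriptors */
static int q_head, q_tail;
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t q_sem;                   /* counts queued descriptors */

static void push_ready(int fd)
{
    pthread_mutex_lock(&q_lock);
    ready_q[q_tail] = fd;
    q_tail = (q_tail + 1) % QSIZE;
    pthread_mutex_unlock(&q_lock);
    sem_post(&q_sem);                 /* wake exactly one sleeping worker */
}

static int pop_ready(void)
{
    sem_wait(&q_sem);                 /* sleep until there is work */
    pthread_mutex_lock(&q_lock);
    int fd = ready_q[q_head];
    q_head = (q_head + 1) % QSIZE;
    pthread_mutex_unlock(&q_lock);
    return fd;
}

static void poll_once(struct pollfd *pfds, int nfds)
{
    if (poll(pfds, nfds, 1000) <= 0)  /* readiness detection stays here */
        return;
    for (int i = 0; i < nfds; i++)
        if (pfds[i].revents & (POLLIN | POLLHUP))
            push_ready(pfds[i].fd);
}

static void *worker(void *arg)
{
    (void)arg;
    char buf[4096];
    for (;;) {
        int fd = pop_ready();
        ssize_t n = read(fd, buf, sizeof(buf));  /* process metrics here */
        (void)n;
    }
    return NULL;
}

int main(void)
{
    sem_init(&q_sem, 0, 0);
    pthread_t tid;
    pthread_create(&tid, NULL, worker, NULL);
    /* in a real dispatcher, pfds holds every accepted client socket */
    struct pollfd pfds[1] = { { .fd = 0, .events = POLLIN } };
    for (;;)
        poll_once(pfds, 1);
}
```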
I might be wrong, but as far as I remember, the problem with the poll syscall under Linux is that it takes O(N) in the number of sockets your application has opened, while epoll() is O(1). On thousands of connections epoll would be much faster (which is kind of proven by the fork by @msaf1980, but at the cost of code complexity and portability if you decide to use it directly).
UPD: https://developpaper.com/in-depth-analysis-of-epoll/ - something like that; there were more detailed articles about it.
Or another article: https://idndx.com/2014/09/01/the-implementation-of-epoll-1/
Basically, it's highly discouraged to use poll, even on a single socket, when you have thousands of open sockets on Linux.
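For illustration, a minimal epoll loop looks like this (a Linux-only sketch, not relay code). The key point is that interest is registered once with epoll_ctl() and each epoll_wait() returns only the descriptors that actually became ready, so the per-wakeup cost doesn't grow with the total number of open sockets:

```c
/* Minimal epoll sketch: register interest once, then only ready fds
 * are returned, independent of how many sockets are registered. */
#include <sys/epoll.h>
#include <unistd.h>
#include <stdio.h>

#define MAX_EVENTS 64

int main(void)
{
    int epfd = epoll_create1(0);
    if (epfd == -1)
        return 1;

    /* stdin stands in for a client socket here; a relay would add
     * every accepted connection exactly once */
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = 0 };
    epoll_ctl(epfd, EPOLL_CTL_ADD, 0, &ev);

    struct epoll_event events[MAX_EVENTS];
    for (;;) {
        int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
        for (int i = 0; i < n; i++) {
            char buf[4096];
            ssize_t r = read(events[i].data.fd, buf, sizeof(buf));
            if (r <= 0)
                epoll_ctl(epfd, EPOLL_CTL_DEL, events[i].data.fd, NULL);
            else
                fwrite(buf, 1, r, stdout);
        }
    }
}
```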
right, could look at using epoll()
Yes, that's right. epoll is Linux-specific; for BSD it's kqueue, for Solaris /dev/poll. If we need to be portable, all of them must be supported. That's not trivial and is more complex than using one library (which uses the platform-specific mechanism internally).
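As a small illustration of that point (a sketch, not relay code): the same libevent program picks the best backend available on the platform (epoll on Linux, kqueue on BSD/macOS, event ports or /dev/poll on Solaris) and can report which one it chose:

```c
/* Sketch: libevent selects the platform poller internally. */
#include <event2/event.h>
#include <stdio.h>

int main(void)
{
    struct event_base *base = event_base_new();
    if (base == NULL)
        return 1;
    /* prints e.g. "epoll" on Linux, "kqueue" on FreeBSD/macOS */
    printf("libevent %s, backend: %s\n",
           event_get_version(), event_base_get_method(base));
    event_base_free(base);
    return 0;
}
```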
this is a good point, I wasn't aware of that