SocketReactor while(1) CPU wasted
I think with 1.12.2 we revert to the old problem of socket reactor goes in "while(1)" if no socket is sending.
So for example I can reproduce the problem simply using telnet to connect to the server and while(1) starts.
That because it seems that
PollSet::SocketModeMap sm = _pollSet.poll(_timeout);
is not waiting for _timeout but it returns immediately with sm.size()==0 so onTimeout() is called. So I'm going to think if do a trySleep inside onTimeout() but I want to check why poll is not waiting for timeout, I'll keep you updated
ps.using windows 11 compiled with VisualStudio 2022 and using SocketReactor basically to handle FTP command (21 port) sockets
The problem is the epoll_wait because event if there's no socket with readable data, it returns 1 immediately with rc=1 and than the if on EPOLLIN tells you no bytes available, any suggestion?
disabling POCO_HAVE_FD_EPOLL all works fine
I am using 1.12 reactor heavily and it works fine, regardless of whether there is traffic or not.
Timeout is a tricky thing for performance - when you have data, you want to spin as fast as possible for max throughput, but when there's no data, you have to slow down or else you hog the cpu.
Please read the documentation carefully, especially the timeout part, it has changed and there are two timeouts. When poll timeout is zero (I'm getting best results with this setting on Linux), only then the reactor timeout gradually ramps up in a no-traffic scenario.
If you still experience problems, please write code we can run and reproduce.
I'm going to dig into the problem and I'll tell you what is going on. Yesterday I found that the problem seems to be the epoll_wait that returns without waiting, I have to figure out why. Anyway I'm using windows OS (windows 11). This problem remind me what happen when we switch to WSAPoll. I'll let you know
OK indeed I found the problem. So in metacode is working like that:
- accepting new socket
- create the handler giving reactor and socket (I do not use acceptor because I have to do something particular before creating handler) The problem is that in my code after creating reactor I called reactor's wakeUp() that is this;
void wakeUp()
{
#ifdef WEPOLL_H_
StreamSocket ss(SocketAddress("127.0.0.1", _port));
#else
uint64_t val = 1;
// This is guaranteed to write into a valid fd,
// or 0 (meaning PollSet is being destroyed).
// Errors are ignored.
write(_eventfd, &val, sizeof(val));
#endif
}
in windows that
StreamSocket ss(SocketAddress("127.0.0.1", _port));
seems to cause me that strange while(1) behavior, so what happen is that the function GetQueuedCompletionStatusEx of wepoll.c do not wait for timeout but exit immediately probably because that StreamSocket ss couse some kind of "problem"
what is the intent of that wakeUp() implementation?
I think probably in the wakeUp implementation a write is missing in windows implementation?
Even using something like this:
void wakeUp()
{
uint64_t val = 1;
#ifdef WEPOLL_H_
try {
StreamSocket ss(SocketAddress("127.0.0.1", _port));
ss.sendBytes(&val, sizeof(val));
}
catch (...) {}
#else
// This is guaranteed to write into a valid fd,
// or 0 (meaning PollSet is being destroyed).
// Errors are ignored.
write(_eventfd, &val, sizeof(val));
#endif
}
wakeUp is not working (cause the same problem to the wepoll), I'm going to disable the wakeUp for now
Moreover I can bump into the same behavior if I connect with telnet to the "socket acceptor" server port, and than connecting with telnet to the "secret" port created with
_eventfd(eventfd(_port, 0)),
that way I can reproduce the while(1) behavior, the timeout is no longer respected and onTimeout callback is triggered millions times
I found a possible memory leak
~PollSetImpl()
{
#ifdef WEPOLL_H_
if (_eventfd >= 0) eventfd(_port, _eventfd); //IF _eventfd=0 we have a possible memory leak
#else
if (_eventfd > 0) close(_eventfd.exchange(0));
#endif
if (_epollfd >= 0) close(_epollfd);
}
What you're seeing is likely because you don't have the fix for #3708
https://github.com/pocoproject/poco/commit/578efadef1e152ba6a1a2d59b149817875f3801d#diff-d9df88d5edc946e44aafe09a4df4524d3fde3761c3f9caa9e893c81b12ad708bR211
I think already have in my code
_pSocket->setBlocking(false); probably this is the default
and I have
if (_pSocket && _pSocket->available())
_pSocket->impl()->receiveBytes(&val, sizeof(val));
the difference is the available, but I don't think makes the difference here so I think we still have a problem in the 1.12.2 code, even the memory leak (that is quite impossible anyway because you should end up with a _eventfd =0 that is not possible)
Anyway I think it's not a good idea to have an "exposed" TCP port open in listen in your software that you can't control
Wepoll works with sockets only, so there's no other wake-up mechanism available on windows. And the whole point of listening sockets is to open ports and receive data - how much more danger than other opened ports does a localhost port, from which a single integer is read and thrown away, present? True, someone may try to flood it with data, but isn't that true for all other opened ports, too? Besides, an intruder has to be on your machine to do that, at which point you have a bigger problem than the opened port.
Thinking about it, the listening socket can be closed after we make the connection. I'll look at it.
ok, even because you are think to an application that has 1 PollSet, and it's not a big deal to have a port open (even if I don't like the idea) always if the random port is not exactly the port you need to use...but the problem is that if for some reason you have to use 10 PollSet you have 10 TCP open port...not a good idea. I prefer to wait for timeout on closing PollSet without waking up nothing
@micheleselea can you check if #3852 fixes the original issue you reported?
As for the ports, I recommend to open a separate issue for that, it's hard to follow when unrelated things are mixed into an issue.
@aleks-f I have to revert my changes because in lib version I'm using in production environment I'm avoiding creating socket at all. But I don't think the problem was the Mutext type, I think the problem in windows implementation of wakeUp that cause inner GetQueuedCompletionStatusEx of wepoll.c will not wait for timeout anymore. This will happen with a wakeUp call or with a telnet connection on _port
@aleks-f The problem is still there. To reproduce on windows system
- Create reactor on port NNN
- Write down port opened in
_eventfd(eventfd(_port, 0)),(call it MMM) - telnet 127.0.0.1 NNN
- telnet 127.0.0.1 MMM
you should see the "while(1)" behavior
The problem is the
GetQueuedCompletionStatusEx
Point 4 it's quite the same if you call wakeUp() on thread
I confirm, the problem occurs in version 1.12.4, when using Windows epool. If the client connection is broken, the server loop in SocketReactor::run(), will never complete, because the PollSet::POLL_READ state is not executed.
I confirm, the problem occurs in version 1.12.4, when using Windows epool. If the client connection is broken, the server loop in SocketReactor::run(), will never complete, because the PollSet::POLL_READ state is not executed.
that is a different problem, fixed here
@aleks-f The problem is still there. To reproduce on windows system
- Create reactor on port NNN
- Write down port opened in
_eventfd(eventfd(_port, 0)),(call it MMM)- telnet 127.0.0.1 NNN
- telnet 127.0.0.1 MMM you should see the "while(1)" behavior The problem is the
GetQueuedCompletionStatusExPoint 4 it's quite the same if you call wakeUp() on thread
@micheleselea why would I want to telnet to that socket, especially since the wakeUp is doing the same thing? Reactor in devel has been doing it for quite some time and I've seen no problems:
https://github.com/pocoproject/poco/blob/47ddacd0048ed404494ba6fb1616291aa18f27dc/Net/src/SocketReactor.cpp#L171
And where exactly is while(1)? There is no such thing anywhere.
If there is indeed a problem, please produce code demonstrating it and we will look into it. I also recommend to check your code - undefined behavior can be a nasty thing.
Anyway, in 1.13 there will be the option to use unix sockets on windows. I do not see the problem with PollSet::wakeUp() in 1.2 branch or devel.
@aleks-f told you, telnet was just a way to fast test the issue. The way you do, at list on windows, the wakeup is the same way you can simulate with a telnet. While(1) is not explicitly there while(1) is a way to tell you that wepoll when you are in that situation (wepoll or telnet) act as do not respect the timeout so like a while(1) Anyway never mind, do what you think it's ok, I all ready fix it in my code