
Performance of WSAPoll vs. select vs. WSAWaitForMultipleEvents on Windows

Open sigiesec opened this issue 8 years ago • 25 comments

I tested the performance of the "old" select-based zmq_poller_poll implementation against the "new" WSAPoll-based zmq_poller_poll implementation on a Windows 7 x64 machine, and found the "old" select-based variant to perform about 10% better in an overall test scenario that uses libzmq fairly deep inside.

Is this as expected? Is WSAPoll known to perform better in some scenarios?

The results I found here speak against that: https://groups.google.com/forum/#!topic/openpgm-dev/9qA1u-aTIKs

In fact, they seem to indicate that using WSAWaitForMultipleEvents would perform even better than select, so maybe it would make sense to add another implementation based on that? Any thoughts?
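
For reference, the two calls being compared boil down to roughly the following (minimal sketch, not the libzmq code; WSAStartup and error handling omitted):

```cpp
#define _WIN32_WINNT 0x0600 //  WSAPoll requires Vista or later
#include <winsock2.h>
#pragma comment(lib, "ws2_32.lib")

//  Wait until the socket becomes readable, using WSAPoll.
int wait_readable_wsapoll (SOCKET s, int timeout_ms)
{
    WSAPOLLFD pfd;
    pfd.fd = s;
    pfd.events = POLLRDNORM;
    pfd.revents = 0;
    //  Returns the number of ready sockets, 0 on timeout, SOCKET_ERROR on error.
    return WSAPoll (&pfd, 1, timeout_ms);
}

//  The same, using select.
int wait_readable_select (SOCKET s, int timeout_ms)
{
    fd_set readfds;
    FD_ZERO (&readfds);
    FD_SET (s, &readfds);
    timeval timeout = {timeout_ms / 1000, (timeout_ms % 1000) * 1000};
    //  The first argument is ignored on Windows.
    return select (0, &readfds, NULL, NULL, &timeout);
}
```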

sigiesec avatar Oct 25 '17 15:10 sigiesec

We have multiple implementations for Linux too, so IMHO it's fine if you want to add a new one

bluca avatar Oct 25 '17 15:10 bluca

I just noticed that there already is an implementation using WSAWaitForMultipleEvents, but it is only used when there are sockets from multiple address families in a poller.

It also seems to be broken: after calling WSAWaitForMultipleEvents, it just falls through to the select code... Instead, WSAEnumNetworkEvents would probably need to be called for the event that was signalled.
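
Something along these lines is what I have in mind (hypothetical helper, not the current select.cpp code; error handling omitted): the events are associated via WSAEventSelect, and WSAEnumNetworkEvents is used to find out what happened on the signalled socket instead of falling back to select.

```cpp
#include <winsock2.h>

//  sockets_ and events_ are parallel arrays; count_ must not exceed
//  WSA_MAXIMUM_WAIT_EVENTS (64).
void wait_and_enumerate (SOCKET *sockets_, WSAEVENT *events_, DWORD count_,
                         DWORD timeout_ms_)
{
    for (DWORD i = 0; i < count_; ++i)
        WSAEventSelect (sockets_[i], events_[i],
                        FD_READ | FD_WRITE | FD_ACCEPT | FD_CONNECT | FD_CLOSE);

    //  Wait until at least one event is signalled.
    const DWORD rc =
      WSAWaitForMultipleEvents (count_, events_, FALSE, timeout_ms_, FALSE);
    if (rc == WSA_WAIT_TIMEOUT || rc == WSA_WAIT_FAILED)
        return;

    const DWORD i = rc - WSA_WAIT_EVENT_0;

    //  Ask which network events occurred on the signalled socket; passing the
    //  event handle also resets it.
    WSANETWORKEVENTS ne;
    WSAEnumNetworkEvents (sockets_[i], events_[i], &ne);

    if (ne.lNetworkEvents & FD_READ) { /* socket is readable */ }
    if (ne.lNetworkEvents & FD_WRITE) { /* socket is writable */ }
    if (ne.lNetworkEvents & FD_CLOSE) { /* peer closed the connection */ }
}
```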

sigiesec avatar Oct 25 '17 16:10 sigiesec

@Kentzo since you added this code in 538e5d47421f45d88a8b3b005ed9ee4026623a7b, can you say something about whether you tested this? It is not tested on AppVeyor at the moment, unfortunately.

sigiesec avatar Oct 25 '17 17:10 sigiesec

@sigiesec I did not benchmark it, because the goal was to make it work at all. We run this code on thousands of machines, mostly Windows, every day.

I thought about WSAEnumNetworkEvents, but decided to limit my intrusion, as there was no WSAPoll implementation for Windows at the time. Probably both changes can be unified around WSAEnumNetworkEvents. One thing I recall is the constant global limit on the number of sockets such an event-based approach can handle.

Kentzo avatar Oct 25 '17 19:10 Kentzo

OK, I now had another look and I think I understand how it works. I think a different implementation might do without select completely and outperform this.

What is suboptimal about the current implementation is that the events are created and configured on every call to loop, and even within each while iteration. It would be better to create them only once and reconfigure them when the poll set changes.

In addition, the events were even created when there was only one address family, in which case they were never used.
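
The create-once idea would look roughly like this (hypothetical wrapper, just to illustrate): WSACreateEvent is called once per event, and only WSAEventSelect is repeated when the poll set changes.

```cpp
#include <winsock2.h>

class wsa_event_wrapper_t
{
  public:
    wsa_event_wrapper_t () : _event (WSACreateEvent ()) {}
    ~wsa_event_wrapper_t () { WSACloseEvent (_event); }

    //  Called only when the poll set changes, not on every loop iteration.
    void reconfigure (SOCKET socket_, long network_events_)
    {
        WSAEventSelect (socket_, _event, network_events_);
    }

    WSAEVENT handle () const { return _event; }

  private:
    WSAEVENT _event;
};
```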

sigiesec avatar Oct 25 '17 20:10 sigiesec

I thought WSAEVENT was a simple C struct and that the time it takes to create one was negligible. I'd suggest benchmarking it before adding any caches.

Kentzo avatar Oct 25 '17 21:10 Kentzo

WSAEVENT is, but the wsa_events_t ctor calls WSACreateEvent, which is expensive: https://github.com/zeromq/libzmq/commit/538e5d47421f45d88a8b3b005ed9ee4026623a7b#diff-872ce26b9fb528e0ec0abd474883ca8aR457

sigiesec avatar Oct 26 '17 06:10 sigiesec

I did benchmark/profile it; that is how the expensive WSACreateEvent turned up. What is also expensive is get_fd_family, in particular its getsockname call.

loop, set_poll* and reset_poll* are among the most frequently called functions in libzmq, so their performance is critical.
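
For context, get_fd_family essentially boils down to a getsockname call per fd (simplified sketch of the idea, not the exact select.cpp code), and that call is what shows up in the profile:

```cpp
#include <winsock2.h>

int get_fd_family (SOCKET fd_)
{
    sockaddr_storage addr = {};
    int addr_size = sizeof addr;
    //  getsockname is the expensive part.
    const int rc =
      getsockname (fd_, reinterpret_cast<sockaddr *> (&addr), &addr_size);
    return rc == 0 ? addr.ss_family : AF_UNSPEC;
}
```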

sigiesec avatar Oct 26 '17 06:10 sigiesec

Something like an LRU cache could be used for get_fd_family.

If you can implement it via WSAEnumNetworkEvents, select and the associated methods can be avoided, I think.

Kentzo avatar Oct 26 '17 20:10 Kentzo

I have added a cache in 37914d1be23b89f7bd747d02ee5a56a18a12d7c3. It is not LRU; it just randomly overwrites entries, so this could still be improved.
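
The idea is roughly the following (simplified sketch, not the exact code from that commit): a small fixed-size array of (fd, family) pairs, where a random slot is overwritten on a miss, so there is no LRU bookkeeping.

```cpp
#include <winsock2.h>
#include <cstdlib>
#include <utility>

int get_fd_family (SOCKET fd_); //  the getsockname-based lookup sketched above

enum
{
    fd_family_cache_size = 8
};
static std::pair<SOCKET, int> fd_family_cache[fd_family_cache_size];

int get_fd_family_cached (SOCKET fd_)
{
    for (int i = 0; i < fd_family_cache_size; ++i)
        if (fd_family_cache[i].first == fd_)
            return fd_family_cache[i].second;

    //  Cache miss: do the expensive lookup and overwrite a random slot.
    const int family = get_fd_family (fd_);
    fd_family_cache[std::rand () % fd_family_cache_size] =
      std::make_pair (fd_, family);
    return family;
}
```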

I attempted an implementation, wsa_event_select_t, in https://github.com/sigiesec/libzmq/tree/add-wsa-eventselect-poller, but I got stuck somehow. I believe the problem is that FD_WRITE is edge-triggered only, i.e. it is only signalled again after a send has failed with WSAEWOULDBLOCK, ~~but that is not quite compatible with the use of the poller in libzmq~~ (that is probably not true, since epoll also behaves this way). If anyone has an idea whether this really is the problem and/or how to solve it, that would be great.
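
To illustrate what I mean by edge-triggered: with WSAEventSelect, FD_WRITE is signalled when the socket first becomes writable and then not again until a send has actually failed with WSAEWOULDBLOCK, so the writing side has to keep sending until it hits that error before waiting on the event again (sketch only):

```cpp
#include <winsock2.h>

//  Send as much as possible; returns true if the caller should wait for the
//  next FD_WRITE notification, false if everything was sent or a real error
//  occurred.
bool send_until_blocked (SOCKET s_, const char *data_, int size_)
{
    while (size_ > 0) {
        const int rc = send (s_, data_, size_, 0);
        if (rc == SOCKET_ERROR)
            //  Only after WSAEWOULDBLOCK will FD_WRITE be signalled again.
            return WSAGetLastError () == WSAEWOULDBLOCK;
        data_ += rc;
        size_ -= rc;
    }
    return false;
}
```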

sigiesec avatar Oct 27 '17 07:10 sigiesec

What might be even better in terms of performance/scalability would be to use I/O completion ports, which is what NetMQ does: http://somdoron.com/2014/11/netmq-iocp/ However, I think this requires more extensive changes to libzmq.
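
Very roughly, the model looks like this (sketch of the plain Win32 APIs, nothing like the actual NetMQ code; error handling omitted): I/O operations are started asynchronously and their completions are dequeued from a completion port, which inverts the readiness-based model the libzmq pollers use today.

```cpp
#include <winsock2.h>
#include <windows.h>

void iocp_sketch (SOCKET s)
{
    //  Create a completion port and associate the socket with it.
    HANDLE port = CreateIoCompletionPort (INVALID_HANDLE_VALUE, NULL, 0, 1);
    CreateIoCompletionPort (reinterpret_cast<HANDLE> (s), port,
                            1 /* completion key */, 0);

    //  Start an asynchronous receive; it typically returns SOCKET_ERROR with
    //  WSA_IO_PENDING and completes later.
    char buffer[4096];
    WSABUF wsabuf = {sizeof buffer, buffer};
    DWORD flags = 0;
    OVERLAPPED ov = {};
    WSARecv (s, &wsabuf, 1, NULL, &flags, &ov, NULL);

    //  A worker thread dequeues the completion when the data has arrived.
    DWORD bytes = 0;
    ULONG_PTR key = 0;
    OVERLAPPED *completed = NULL;
    if (GetQueuedCompletionStatus (port, &bytes, &key, &completed, INFINITE)) {
        //  'bytes' bytes were received into 'buffer'.
    }

    CloseHandle (port);
}
```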

@somdoron Are you planning to come to the ZeroMQ Pre-FOSDEM Hackathon in February? Maybe we could work on porting your approach to the native libzmq then.

sigiesec avatar Oct 27 '17 10:10 sigiesec

@sigiesec Could you provide a simple benchmark that isolates the family lookup on Windows? E.g. creating 1,000,000 sockets in a loop and getting the family vs. just creating them?
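
Something along these lines would do (sketch; getsockname is used here since that is what get_fd_family calls, and the socket is bound first because getsockname fails on an unbound socket):

```cpp
#include <winsock2.h>
#include <chrono>
#include <cstdio>
#pragma comment(lib, "ws2_32.lib")

int main ()
{
    WSADATA wsa;
    WSAStartup (MAKEWORD (2, 2), &wsa);

    const int iterations = 100000;
    sockaddr_in local = {};
    local.sin_family = AF_INET;
    local.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
    local.sin_port = 0; //  any free port

    //  Baseline: create + bind + close only.
    const auto t0 = std::chrono::steady_clock::now ();
    for (int i = 0; i < iterations; ++i) {
        SOCKET s = socket (AF_INET, SOCK_STREAM, 0);
        bind (s, reinterpret_cast<sockaddr *> (&local), (int) sizeof local);
        closesocket (s);
    }

    //  Same, plus the family lookup.
    const auto t1 = std::chrono::steady_clock::now ();
    for (int i = 0; i < iterations; ++i) {
        SOCKET s = socket (AF_INET, SOCK_STREAM, 0);
        bind (s, reinterpret_cast<sockaddr *> (&local), (int) sizeof local);
        sockaddr_storage addr = {};
        int addr_size = sizeof addr;
        getsockname (s, reinterpret_cast<sockaddr *> (&addr), &addr_size);
        closesocket (s);
    }
    const auto t2 = std::chrono::steady_clock::now ();

    using std::chrono::duration_cast;
    using std::chrono::milliseconds;
    std::printf ("create + bind:               %lld ms\n",
                 (long long) duration_cast<milliseconds> (t1 - t0).count ());
    std::printf ("create + bind + getsockname: %lld ms\n",
                 (long long) duration_cast<milliseconds> (t2 - t1).count ());

    WSACleanup ();
    return 0;
}
```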

Kentzo avatar Oct 27 '17 23:10 Kentzo

An even faster way appears to be the Winsock Registered I/O extensions, which are supported from Windows 8.1 (RIOCloseCompletionQueue, RIOCreateCompletionQueue, RIOCreateRequestQueue, RIODequeueCompletion, RIODeregisterBuffer, RIONotify, RIOReceive, RIOReceiveEx, RIORegisterBuffer, RIOResizeCompletionQueue, RIOResizeRequestQueue, RIOSend, RIOSendEx).
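
For completeness, these functions are not imported directly; a function table has to be fetched at runtime through WSAIoctl, roughly like this (sketch; assumes a Windows 8+ SDK):

```cpp
#define _WIN32_WINNT 0x0602 //  the RIO definitions require Windows 8 headers
#include <winsock2.h>
#include <mswsock.h>

bool load_rio_table (SOCKET s_, RIO_EXTENSION_FUNCTION_TABLE *table_)
{
    GUID id = WSAID_MULTIPLE_RIO;
    DWORD bytes = 0;
    //  On success the table contains RIOCreateCompletionQueue, RIOSend,
    //  RIOReceive and the other function pointers listed above.
    return WSAIoctl (s_, SIO_GET_MULTIPLE_EXTENSION_FUNCTION_POINTERS, &id,
                     sizeof id, table_, sizeof *table_, &bytes, NULL,
                     NULL) == 0;
}
```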

sigiesec avatar Feb 05 '18 10:02 sigiesec

@sigiesec Not sure if it is helpful, but NetMQ is using I/O Completion Ports. It required some work to get it working.

You can read more here: http://somdoron.com/2014/11/netmq-iocp/

somdoron avatar Feb 21 '18 12:02 somdoron

@somdoron Thanks, I already read your post ;) If I understand it correctly, all users of polling within libzmq must be changed to the reactive model, and then there can be an adapter that maps to the current poller_t API for the existing polling mechanisms. Very high level, but would you agree so far?

sigiesec avatar Feb 21 '18 13:02 sigiesec

Only the internal poller has to change; the external one (aka zmq_poll and zmq_poller) should stay as is.

The internal one and the internal threading have to change; I can dig up the NetMQ commit if it will help you. It is kind of a big change, and I am not sure it will give the same performance for non-Windows OSes.

somdoron avatar Feb 21 '18 13:02 somdoron

Yes, a link to the relevant changes would be great.

Of course the internal and external pollers can be changed independently, but why do you suggest changing only the internal poller?

sigiesec avatar Feb 21 '18 14:02 sigiesec

The external poller is polling over very few FDs, usually 2 or 3. Even on Linux, ZeroMQ uses poll and not epoll for the external poller. So there is no need to invest in IOCP for the external poller; you will not see any performance gain.

somdoron avatar Feb 21 '18 17:02 somdoron

Ok good to know that, thanks!

sigiesec avatar Feb 21 '18 17:02 sigiesec

this is the main commit:

https://github.com/zeromq/netmq/commit/99abdf8e84b3e341fffc1a2d8cd20741882eb1d0

I also created a library for that, called AsyncIO, which wraps IOCP with a nice API that fits .NET.

I think it would also make your life easier to create such a library that wraps kqueue, epoll, IOCP, poll and select and provides a simple API to ZeroMQ.

somdoron avatar Feb 21 '18 17:02 somdoron

It might be much easier to integrate https://github.com/piscisaureus/wepoll
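
From the caller's side the integration would look like regular epoll code, since wepoll mirrors the Linux API with a HANDLE for the epoll port and SOCKETs as the monitored descriptors (sketch based on its README):

```cpp
#include <winsock2.h>
#include "wepoll.h" //  from https://github.com/piscisaureus/wepoll

void wepoll_sketch (SOCKET s)
{
    HANDLE ephnd = epoll_create1 (0);

    //  Register the socket for readability notifications.
    epoll_event ev = {};
    ev.events = EPOLLIN;
    ev.data.sock = s;
    epoll_ctl (ephnd, EPOLL_CTL_ADD, s, &ev);

    //  Wait for events, just like on Linux.
    epoll_event ready[8];
    const int n = epoll_wait (ephnd, ready, 8, 1000 /* timeout in ms */);
    for (int i = 0; i < n; ++i) {
        //  ready[i].data.sock is readable
    }

    epoll_close (ephnd);
}
```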

sigiesec avatar May 22 '18 11:05 sigiesec

That sounds quite promising!

bluca avatar May 22 '18 13:05 bluca

This might be even more interesting: https://github.com/truebiker/epoll_windows/commit/32442e432b3376cb98f5c0bda5a9a0a5e832b857 It is a fork based on a quite old version of wepoll, which adds support for eventfd-like functionality.

sigiesec avatar May 23 '18 12:05 sigiesec

For the record, it may be safe to consider direct use of WSAPoll whenever sys.getwindowsversion() >= (10, 0, 19041), as the infamous bug has reportedly been fixed, and the last Windows version without that fix went fully EOL in 2022.
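
From C or C++ the equivalent runtime check might look like this (sketch; RtlGetVersion is queried from ntdll because the ordinary GetVersionEx path is subject to manifest-based version lying):

```cpp
#include <windows.h>

//  True on Windows 10 build 19041 (version 2004) or later, where the
//  WSAPoll bug has reportedly been fixed.
bool wsapoll_is_trustworthy ()
{
    typedef LONG (WINAPI *RtlGetVersion_t) (PRTL_OSVERSIONINFOW);
    const HMODULE ntdll = GetModuleHandleW (L"ntdll.dll");
    const RtlGetVersion_t rtl_get_version = reinterpret_cast<RtlGetVersion_t> (
      GetProcAddress (ntdll, "RtlGetVersion"));
    if (!rtl_get_version)
        return false;

    RTL_OSVERSIONINFOW vi = {};
    vi.dwOSVersionInfoSize = sizeof vi;
    if (rtl_get_version (&vi) != 0)
        return false;

    return vi.dwMajorVersion > 10
           || (vi.dwMajorVersion == 10 && vi.dwBuildNumber >= 19041);
}
```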

James-E-A avatar Jul 25 '25 17:07 James-E-A