paho.mqtt.python
Due to select(), paho-mqtt is unable to connect when more than 1024 file handles are in use
Problem description:
Python version: 3.9
Paho-MQTT version: 1.6.1
When using 1000 threads, each thread acting as a client that connects to the MQTT broker, only a few hundred clients manage to connect, not all of them. The cause appears to be the _socketpair_compat function called in loop_start. Raising the system file handle limit to 65535 does not help. However, if the _socketpair_compat call is commented out, all clients connect successfully.
Question:
Is there any way to solve this problem?
If you really need 1000 threads, I would strongly suggest a library with native async support, e.g.:
https://github.com/toreamun/asyncio-paho
That's a nice issue... the cause is pretty obscure if you have never seen it before. TL;DR: we should stop using select().
Here is how to reproduce the same issue with even stranger code:
import paho.mqtt.client as mqtt
import time
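# Here the magic happens :) Open 1019 dummy files so that the sockets paho creates next reach FD number 1024
files = [open("/etc/hosts") for _ in range(1019)]
# CallbackAPIVersion requires paho-mqtt >= 2.0
mqttc = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)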
mqttc.connect("mqtt.eclipseprojects.io")
mqttc.loop_start()
time.sleep(5) # Give network the time to do the handshake
print(mqttc.is_connected())
This fails: the client never connects. To "fix" this code, just change 1019 to 1018 :)
More seriously, the issue is:
>>> mqttc._sockpairR
<socket.socket fd=1024, family=2, type=1, proto=0, laddr=('127.0.0.1', 52282), raddr=('127.0.0.1', 45195)>
>>> select.select([mqttc._sockpairR], [], [], 1) # This is approximately what loop does
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: filedescriptor out of range in select()
The issue is that select() (only on Linux?) can't handle file descriptors numbered 1024 or higher:
WARNING: select() can monitor only file descriptors numbers that are less than FD_SETSIZE (1024) -- https://manpages.debian.org/unstable/manpages-dev/select.2.en.html
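To see the limit without paho or a broker, here is a small sketch for Linux. It assumes fcntl's F_DUPFD can move a socket onto FD 1024 or above, and raises the soft open-files limit when the hard limit allows it; socketpair_with_high_fd and compare_select_and_poll are hypothetical helper names. select() should reject the socket with the same ValueError as above, while select.poll(), which is given FD numbers instead of a fixed-size bitmap, accepts it.

```python
import fcntl
import resource
import select
import socket

FD_SETSIZE = 1024  # select() limit on Linux, see select(2)


def socketpair_with_high_fd(min_fd=FD_SETSIZE):
    """Return (reader, writer) where reader's FD number is >= min_fd."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    needed = min_fd + 16
    if soft != resource.RLIM_INFINITY and soft < needed:
        # Assumes the hard limit allows this; setrlimit raises an error otherwise.
        resource.setrlimit(resource.RLIMIT_NOFILE, (needed, hard))
    reader, writer = socket.socketpair()
    # F_DUPFD duplicates onto the lowest free FD number >= min_fd.
    high_fd = fcntl.fcntl(reader.fileno(), fcntl.F_DUPFD, min_fd)
    reader.close()
    return socket.socket(fileno=high_fd), writer


def compare_select_and_poll(sock):
    """Return (select() outcome, whether poll() reports sock readable)."""
    try:
        select.select([sock], [], [], 0)
        select_outcome = "ok"
    except ValueError as exc:  # "filedescriptor out of range in select()"
        select_outcome = f"ValueError: {exc}"
    poller = select.poll()  # poll() takes FD numbers, no FD_SETSIZE bitmap
    poller.register(sock, select.POLLIN)
    readable = bool(poller.poll(0))
    return select_outcome, readable
```

After `reader, writer = socketpair_with_high_fd()` and `writer.send(b"x")`, calling `compare_select_and_poll(reader)` should report the ValueError for select() and True for poll().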
In your program, you should therefore have about 340 working connections. A socket pair (as its name says) creates 2 FDs, so each client uses 3 FDs (the MQTT socket plus the two sockets of the pair): 340 * 3 = 1020. Add stdin, stdout and stderr and you reach 1023.
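The estimate above can be written out as a rough calculation. This sketch assumes exactly 3 persistent FDs per client and ignores the temporary listening socket that _socketpair_compat opens, as well as any other files the program holds; the function names are hypothetical.

```python
FD_SETSIZE = 1024  # select() cannot watch FD numbers >= this on Linux


def max_loop_start_clients(fds_per_client=3, reserved_fds=3, limit=FD_SETSIZE):
    """Rough count of loop_start() clients whose FDs all stay below `limit`.

    Assumes each client holds its MQTT socket plus the two socket-pair
    sockets, and that stdin/stdout/stderr (FDs 0-2) are already open.
    """
    return (limit - reserved_fds) // fds_per_client


def highest_fd(n_clients, fds_per_client=3, reserved_fds=3):
    """Highest FD number in use once n_clients are running (same assumptions)."""
    return reserved_fds + n_clients * fds_per_client - 1


print(max_loop_start_clients())  # 340
print(highest_fd(340))  # 1022
print(highest_fd(341))  # 1025, past the select() limit
```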
The immediate fix is to avoid select(), which means not using loop(), loop_start() or loop_forever(). In practice this means using the external loop, i.e. asyncio (either through a third-party library that wraps it, or directly; there are examples in the repository). It should also be possible to spread the connections over multiple processes so that no process reaches FD number 1024, but I think that is too complex for this need.
The right fix is to change paho so that it stops using select() and uses a modern alternative (probably the Python selectors module).
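For the external-loop route, here is a minimal sketch that drives many clients from a single thread with Python's selectors module instead of loop_start(). It only relies on the external-loop methods of paho's Client (socket(), want_write(), loop_read(), loop_write(), loop_misc()); run_external_loop and should_stop are hypothetical names, and reconnect handling is limited to re-registering whatever socket client.socket() currently returns.

```python
import selectors


def run_external_loop(clients, should_stop, timeout=1.0):
    """Drive several MQTT clients from one selectors-based loop (sketch).

    Each client must provide the external-loop methods that
    paho.mqtt.client.Client has: socket(), want_write(), loop_read(),
    loop_write() and loop_misc(). DefaultSelector picks epoll on Linux
    (kqueue on BSD/macOS), which has no FD_SETSIZE limit.
    """
    sel = selectors.DefaultSelector()
    registered = {}  # client -> (socket, event mask) currently in `sel`
    try:
        while not should_stop():
            # Keep registrations in sync with each client's current socket.
            for client in clients:
                sock = client.socket()
                events = selectors.EVENT_READ
                if client.want_write():
                    events |= selectors.EVENT_WRITE
                old = registered.pop(client, None)
                if old is not None and old[0] is not sock:
                    sel.unregister(old[0])  # socket closed or replaced
                    old = None
                if sock is None:
                    continue  # not connected right now
                if old is None:
                    sel.register(sock, events, data=client)
                elif old[1] != events:
                    sel.modify(sock, events, data=client)
                registered[client] = (sock, events)

            # Wait for I/O; FD numbers >= 1024 are fine here.
            for key, mask in sel.select(timeout):
                if mask & selectors.EVENT_READ:
                    key.data.loop_read()
                if mask & selectors.EVENT_WRITE:
                    key.data.loop_write()

            # Keepalive pings and retry timers.
            for client in clients:
                client.loop_misc()
    finally:
        sel.close()
```

With paho-mqtt 2.x this could replace the 1000 loop_start() threads: create the clients with `mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)`, call connect() on each, then call `run_external_loop(clients, lambda: False)`. Each client still needs at least one socket, so the process open-files limit (e.g. the 65535 from the report) must remain high enough.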
Thanks, I will try!
This mostly means use the external loop a.k.a. asyncio (either with a third-party library that wraps it, or directly; there are some examples)
Can you provide me with some examples or other packages that can solve this problem?
I don't use paho-mqtt with asyncio, so I don't really know of one. I've seen https://github.com/sbtinstruments/aiomqtt mentioned in another issue. You can also look at:
- https://github.com/eclipse/paho.mqtt.python/blob/master/examples/loop_asyncio.py
- https://github.com/eclipse/paho.mqtt.python/blob/master/examples/loop_trio.py
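A comparable sketch for asyncio, assuming the same external-loop methods of paho's Client and a socket that does not change while the coroutine runs (no reconnect handling). drive_with_asyncio is a hypothetical name. As far as I know, the loop_asyncio.py example linked above takes a different route and hooks paho's socket callbacks (on_socket_open, on_socket_register_write, and so on), which avoids polling want_write().

```python
import asyncio


async def drive_with_asyncio(client, stop, misc_interval=1.0):
    """Run one paho-style MQTT client on the asyncio event loop (sketch).

    `client` needs socket(), want_write(), loop_read(), loop_write() and
    loop_misc(); `stop` is an asyncio.Event. Assumes client.socket() does
    not change while this runs (no reconnect handling).
    """
    loop = asyncio.get_running_loop()
    sock = client.socket()
    writing = False

    def on_writable():
        nonlocal writing
        client.loop_write()
        if not client.want_write():
            # Nothing left to send: stop watching, or the loop would spin.
            loop.remove_writer(sock)
            writing = False

    loop.add_reader(sock, client.loop_read)  # incoming data -> loop_read()
    try:
        while not stop.is_set():
            # Watch for writability only while the client has queued data.
            if client.want_write() and not writing:
                loop.add_writer(sock, on_writable)
                writing = True
            client.loop_misc()  # keepalive pings and timeouts
            try:
                await asyncio.wait_for(stop.wait(), misc_interval)
            except asyncio.TimeoutError:
                pass  # no stop request yet, go round again
    finally:
        loop.remove_reader(sock)
        if writing:
            loop.remove_writer(sock)
```

For many clients, run them together, e.g. `await asyncio.gather(*(drive_with_asyncio(c, stop) for c in clients))`. On Linux the default asyncio event loop uses epoll through the selectors module, so FD numbers above 1024 are not a problem. Because want_write() is only checked every misc_interval seconds, data that paho could not write immediately may wait up to that long.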
thanks!