Deadlock with `modbus.Slave._data_lock` and `hooks._LOCK`
We hit this as follows:
- in a
modbus_tcp.TcpServer.before_sendhook, we callmodbus.Slave.get_values; this runs in the TCP server thread (Thread-1) - meanwhile, in the main thread, we call
modbus.Slave.set_values(this isMainThread)
For these two cases, the order of lock acquisition is reversed: in Thread-1, hooks._LOCK is acquired before the before_send hook is called, and get_values acquires the Slave object _data_lock. Meanwhile, in MainThread, set_values first acquires the Slave _data_lock, and then calls hook modbus.ModbusBlock.setitem, so needing to acquire hooks._LOCK.
It can happen that Thread-1 acquires hooks._LOCK, then MainThread acquires _data_lock, and that is deadlock.
Thanks for raising this problem? You seems to have a potential fix for it, is this possible that you share your code? I will not have time to care of this in a short term
Since it removes capability, what we have is a workaround, not a fix, I have pushed a contrived example and the workaround to https://github.com/roryyorke/modbus-tk/tree/rory/remove-locks
The below I tested on an Ubuntu 20.04 system using Python 3.8:
>>> import sys; print(sys.version); print(sys.implementation)
3.8.5 (default, Jul 28 2020, 12:59:40)
[GCC 9.3.0]
namespace(_multiarch='x86_64-linux-gnu', cache_tag='cpython-38', hexversion=50857456, name='cpython', version=sys.version_info(major=3, minor=8, micro=5, releaselevel='final', serial=0))
On revision cea9ee7 (pre-fix), when I first run deadlock_server.py and then deadlock_client.py, the server shows (snipping lots of initial output):
main 2
('127.0.0.1', 57844) is connected with socket 4...
main 3
main 1
main 2
main 3
main 1
main 2
main 3
main 1
main 2
main 3
main 1
main 2
main 3
main 1
main 2
main 3
main 1
main 2
main 3
main 1
main 2
main 3
before_send hook 1
main 1
before_send hook 2
before_send hook 3
and outputs nothing further.
The client times out:
2021-01-24 10:01:58,435 INFO deadlock_client.main MainThread connected
Traceback (most recent call last):
File "deadlock_client.py", line 28, in <module>
main()
File "deadlock_client.py", line 24, in main
logger.info(master.execute(1, cst.READ_HOLDING_REGISTERS, 0, 3))
File "/home/rory/src/modbus-tk/modbus_tk/utils.py", line 39, in new
raise excpt
File "/home/rory/src/modbus-tk/modbus_tk/utils.py", line 37, in new
ret = fcn(*args, **kwargs)
File "/home/rory/src/modbus-tk/modbus_tk/modbus.py", line 298, in execute
response = self._recv(expected_length)
File "/home/rory/src/modbus-tk/modbus_tk/modbus_tcp.py", line 216, in _recv
rcv_byte = self._sock.recv(1)
socket.timeout: timed out
The server is stuck. From the output, you can see it got to the .set_values call in main, and the .get_values call in the hook.
If you remove the hook installation, or the .get_values call, the server and client continue. I didn't try removing the .set_values call, I presume that would work too.
In revision dbef15e, the server and client continue, since I've removed the modbus.ModbusBlock.setitem hook altogether.