reraise exception in _process_received_data()
Hi there!
I had a problem like this:
2022-06-26 01:46:24.606: ERROR Exception raised while parsing message from server
Traceback (most recent call last):
File "/opt/maddox/.venv/lib/python3.8/site-packages/asyncua/client/ua_client.py", line 82, in _process_received_data
self._open_secure_channel_exchange = struct_from_binary(ua.OpenSecureChannelResponse, msg.body())
File "/opt/maddox/.venv/lib/python3.8/site-packages/asyncua/ua/ua_binary.py", line 593, in struct_from_binary
return _create_dataclass_deserializer(objtype)(data)
File "/opt/maddox/.venv/lib/python3.8/site-packages/asyncua/ua/ua_binary.py", line 584, in decode
kwargs[field.name] = deserialize_field(data)
File "/opt/maddox/.venv/lib/python3.8/site-packages/asyncua/ua/ua_binary.py", line 584, in decode
kwargs[field.name] = deserialize_field(data)
File "/opt/maddox/.venv/lib/python3.8/site-packages/asyncua/ua/ua_binary.py", line 472, in extensionobject_from_binary
typeid = nodeid_from_binary(data)
File "/opt/maddox/.venv/lib/python3.8/site-packages/asyncua/ua/ua_binary.py", line 388, in nodeid_from_binary
nidtype = ua.NodeIdType(encoding & 0b00111111)
File "/usr/local/lib/python3.8/enum.py", line 339, in __call__
return cls.__new__(cls, value)
File "/usr/local/lib/python3.8/enum.py", line 663, in __new__
raise ve_exc
ValueError: 63 is not a valid NodeIdType
2022-06-26 01:46:24.607: WARNING disconnect_socket was called but connection is closed
which I need to handle. So I modified the _process_received_data() to re-raise the exception. In the current implementation, that exception is catched inside the library and there is no way for me to handle it in my code.
I you have a better idea please let me know :)
Would it help if the library would introduce some boolean setup flag, which controls weather exceptions which are related to received data are reraised or not? User would have choice if they are willing to deal which such issues themselves or not.
I do not get what you are doing. _process received data is called in another thread. How do you manage to catch anything happening there?
@oroulet Thanks for your reply! I somehow hoped that the thread would recognize the exception and make the caller aware of this problem somehow.
It looks like your library freezes in this (very rare) scenario. Is it possible that the thread waits for something that never happens if this weird error occurs?
All I want is the ability to handle this special case somehow. Do you have any ideas?
that kind of things should not happen. Must be a bug in server or client parsing. To debug it we would need a way to reproduce it. Maybe if you would manage to record a failing session with wireshark it could help. We need to find out in what the server is sending and to know in what context...
Unfortunately I cannot reproduce this. It happened when one of our clients restarted its Kepware server which happens very rarely. This caused our microservice to freeze which is built on your library. Freezing is very bad behavior because someone need to restart the service manually. So what I'm tryig to achieve is to get at least an error of some kind because this would cause a restart of the service.
So yes, it it definitely a server issue. But it is a requirement to built our services robust enough so they can deal with even this bad scenarios appropriately.
What we do is poll server and restart client if any issue. This is in fact the defined way to do it by the specification. There are many other possible reasons for the client silently dying (like network errors) so polling is the only possible solution
I'm not sure if we are talking from the same thing. The error here occurs before I can connect my client to the server. So subscription or polling is not possible at that moment. My code is dying at await client.set_security_string() before it even reaches await client.connect(). So maybe this corner case is not correctly handled by your library?
then you will discover the issue at the call of connect or the first poll. Won't you? and the you just retry the entire connect session. Or do you mean that client just hangs without ever returning? In theory it should not happen, every wait for a socket operation is done through a wait_for with a timeout of a few seconds so you should get a timeout error What method is hanging? I think there are also some netwrok situations where a socket hangs, but I think we also handle all of them with some timeout on some async futures
From the stacktrace you posted it looks like the server is sending not a valid OpenSecureChannelResponse or some other response. Best thing to correct the error would be a wireshark trace of the reconnect or a log with debug level.
From the other logs I can tell that the server sends a lot of garbage at this time:
-
asyncua.ua.uaerrors._base.UaError: Unsupported message type b'\x00Qf' -
asyncua.ua.uaerrors._base.UaError: No request found for request id: 0, pending are dict_keys([1]), body was Buffer(size:56, data:b'w,\x8d(/\x00\x00\x00http://opcfoundation.org/UA/SecurityPolicy#None\xff')
What method is hanging? One of those:
await client.set_security_string(my_security_string)
await client.connect()
I see log statements before those but the log statements after those are no longer visible. Also also looks like the main thread is blocked synchronously because my other asyncio.Tasks do not respond either.
Also also looks like the main thread is blocked synchronously because my other asyncio.Tasks do not respond either.
OK that is bad ... and interesting... but we will need more info to understand what is happening. I cannot see where we could have hanging sync code but we never know
if you ctr-d when hanging we should be able to see where we are hanging
Also I have see that "No request found for request id: 0" and never understaood where it comes from. Do you have the entire log? or at least more than you showed over?
Here are more logs:
2022-06-26 01:30:03.824: INFO opc ua client: connect()
2022-06-26 03:30:03
2022-06-26 01:30:03.965: ERROR Exception raised while parsing message from server
2022-06-26 03:30:03
Traceback (most recent call last):
2022-06-26 03:30:03
File "/opt/maddox/.venv/lib/python3.8/site-packages/asyncua/client/ua_client.py", line 167, in _call_callback
2022-06-26 03:30:03
self._callbackmap[request_id].set_result(body)
2022-06-26 03:30:03
KeyError: 0
2022-06-26 03:30:03
2022-06-26 03:30:03
The above exception was the direct cause of the following exception:
2022-06-26 03:30:03
2022-06-26 03:30:03
Traceback (most recent call last):
2022-06-26 03:30:03
File "/opt/maddox/.venv/lib/python3.8/site-packages/asyncua/client/ua_client.py", line 79, in _process_received_data
2022-06-26 03:30:03
self._process_received_message(msg)
2022-06-26 03:30:03
File "/opt/maddox/.venv/lib/python3.8/site-packages/asyncua/client/ua_client.py", line 98, in _process_received_message
2022-06-26 03:30:03
self._call_callback(msg.request_id(), msg.body())
2022-06-26 03:30:03
File "/opt/maddox/.venv/lib/python3.8/site-packages/asyncua/client/ua_client.py", line 169, in _call_callback
2022-06-26 03:30:03
raise ua.UaError(f"No request found for request id: {request_id}, pending are {self._callbackmap.keys()}, body was {body}") from ex
2022-06-26 03:30:03
asyncua.ua.uaerrors._base.UaError: No request found for request id: 0, pending are dict_keys([1]), body was Buffer(size:56, data:b'\xd5$\xf7\x8b/\x00\x00\x00http://opcfoundation.org/UA/SecurityPolicy#None\xff')
2022-06-26 03:30:08
2022-06-26 01:30:08.960: WARNING disconnect_socket was called but connection is closed
2022-06-26 03:30:08
2022-06-26 01:30:08.961: ERROR Cannot connect to OPCUA server. Details:
2022-06-26 03:30:08
2022-06-26 01:30:08.961: WARNING OPC client cannot be recreated due to connection error. I'll try it again
2022-06-26 03:30:14
2022-06-26 01:30:13.961: INFO Reconnecting to OPC UA server...
2022-06-26 03:30:14
2022-06-26 01:30:13.962: INFO Close connection to OPC UA server.
2022-06-26 03:30:14
2022-06-26 01:30:13.962: WARNING close_session was called but connection is closed
2022-06-26 03:30:14
2022-06-26 01:30:13.962: WARNING close_secure_channel was called but connection is closed
2022-06-26 03:30:14
2022-06-26 01:30:13.962: WARNING disconnect_socket was called but connection is closed
2022-06-26 03:30:14
2022-06-26 01:30:13.962: INFO Disabled OPCUA library logs.
2022-06-26 03:30:14
2022-06-26 01:30:13.962: INFO opc ua client: connect()
2022-06-26 03:30:14
2022-06-26 01:30:14.068: ERROR Exception raised while parsing message from server
2022-06-26 03:30:14
Traceback (most recent call last):
2022-06-26 03:30:14
File "/opt/maddox/.venv/lib/python3.8/site-packages/asyncua/client/ua_client.py", line 167, in _call_callback
2022-06-26 03:30:14
self._callbackmap[request_id].set_result(body)
2022-06-26 03:30:14
KeyError: 0
2022-06-26 03:30:14
2022-06-26 03:30:14
The above exception was the direct cause of the following exception:
2022-06-26 03:30:14
2022-06-26 03:30:14
Traceback (most recent call last):
2022-06-26 03:30:14
File "/opt/maddox/.venv/lib/python3.8/site-packages/asyncua/client/ua_client.py", line 79, in _process_received_data
2022-06-26 03:30:14
self._process_received_message(msg)
2022-06-26 03:30:14
File "/opt/maddox/.venv/lib/python3.8/site-packages/asyncua/client/ua_client.py", line 100, in _process_received_message
2022-06-26 03:30:14
self._call_callback(0, msg)
2022-06-26 03:30:14
File "/opt/maddox/.venv/lib/python3.8/site-packages/asyncua/client/ua_client.py", line 169, in _call_callback
2022-06-26 03:30:14
raise ua.UaError(f"No request found for request id: {request_id}, pending are {self._callbackmap.keys()}, body was {body}") from ex
2022-06-26 03:30:14
asyncua.ua.uaerrors._base.UaError: No request found for request id: 0, pending are dict_keys([1]), body was Acknowledge(ProtocolVersion=0, ReceiveBufferSize=65536, SendBufferSize=65536, MaxMessageSize=16777216, MaxChunkCount=5000)
2022-06-26 03:30:19
2022-06-26 01:30:18.977: WARNING disconnect_socket was called but connection is closed
2022-06-26 03:30:19
2022-06-26 01:30:18.977: ERROR Cannot connect to OPCUA server. Details:
2022-06-26 03:30:19
2022-06-26 01:30:18.977: WARNING OPC client cannot be recreated due to connection error. I'll try it again
2022-06-26 03:30:24
2022-06-26 01:30:23.978: INFO Reconnecting to OPC UA server...
2022-06-26 03:30:24
2022-06-26 01:30:23.978: INFO Close connection to OPC UA server.
2022-06-26 03:30:24
2022-06-26 01:30:23.978: WARNING close_session was called but connection is closed
2022-06-26 03:30:24
2022-06-26 01:30:23.978: WARNING close_secure_channel was called but connection is closed
2022-06-26 03:30:24
2022-06-26 01:30:23.978: WARNING disconnect_socket was called but connection is closed
2022-06-26 03:30:24
2022-06-26 01:30:23.978: INFO Disabled OPCUA library logs.
2022-06-26 03:30:24
2022-06-26 01:30:23.978: INFO opc ua client: connect()
2022-06-26 03:30:24
2022-06-26 01:30:24.083: ERROR Exception raised while parsing message from server
2022-06-26 03:30:24
Traceback (most recent call last):
2022-06-26 03:30:24
File "/opt/maddox/.venv/lib/python3.8/site-packages/asyncua/client/ua_client.py", line 167, in _call_callback
2022-06-26 03:30:24
self._callbackmap[request_id].set_result(body)
2022-06-26 03:30:24
KeyError: 0
2022-06-26 03:30:24
2022-06-26 03:30:24
The above exception was the direct cause of the following exception:
2022-06-26 03:30:24
2022-06-26 03:30:24
Traceback (most recent call last):
2022-06-26 03:30:24
File "/opt/maddox/.venv/lib/python3.8/site-packages/asyncua/client/ua_client.py", line 79, in _process_received_data
2022-06-26 03:30:24
self._process_received_message(msg)
2022-06-26 03:30:24
File "/opt/maddox/.venv/lib/python3.8/site-packages/asyncua/client/ua_client.py", line 98, in _process_received_message
2022-06-26 03:30:24
self._call_callback(msg.request_id(), msg.body())
2022-06-26 03:30:24
File "/opt/maddox/.venv/lib/python3.8/site-packages/asyncua/client/ua_client.py", line 169, in _call_callback
2022-06-26 03:30:24
raise ua.UaError(f"No request found for request id: {request_id}, pending are {self._callbackmap.keys()}, body was {body}") from ex
2022-06-26 03:30:24
asyncua.ua.uaerrors._base.UaError: No request found for request id: 0, pending are dict_keys([1]), body was Buffer(size:56, data:b'\xdf<\x93\xfd/\x00\x00\x00http://opcfoundation.org/UA/SecurityPolicy#None\xff')
2022-06-26 03:30:29
2022-06-26 01:30:29.055: WARNING disconnect_socket was called but connection
Can you test if #970 fixes your problem?
@schroeder- Thank you for this fix! Unfortunately I'm able to test this because I'm unable to reproduce the bug. So I hope for the best :grinning:
reopen if there is a need for it
@schroeder- Unfortunately, the bug still exists in asyncua 1.0.1. Should I create a new issue?
@oroulet What exactly was the reason why you think re-raising the exception is a bad idea?
because receiving happens in another thread. raisin an exception in that other thread, just hangs processing. If you want to reopen the issue you need a precise scenario . Ideally something we can reproduce, at worst a wireshark log and stack trace so we can find out what the server is sending us