
reraise exception in _process_received_data()

Open LostInDarkMath opened this issue 3 years ago • 14 comments

Hi there!

I had a problem like this:

2022-06-26 01:46:24.606: ERROR    Exception raised while parsing message from server
Traceback (most recent call last):
  File "/opt/maddox/.venv/lib/python3.8/site-packages/asyncua/client/ua_client.py", line 82, in _process_received_data
    self._open_secure_channel_exchange = struct_from_binary(ua.OpenSecureChannelResponse, msg.body())
  File "/opt/maddox/.venv/lib/python3.8/site-packages/asyncua/ua/ua_binary.py", line 593, in struct_from_binary
    return _create_dataclass_deserializer(objtype)(data)
  File "/opt/maddox/.venv/lib/python3.8/site-packages/asyncua/ua/ua_binary.py", line 584, in decode
    kwargs[field.name] = deserialize_field(data)
  File "/opt/maddox/.venv/lib/python3.8/site-packages/asyncua/ua/ua_binary.py", line 584, in decode
    kwargs[field.name] = deserialize_field(data)
  File "/opt/maddox/.venv/lib/python3.8/site-packages/asyncua/ua/ua_binary.py", line 472, in extensionobject_from_binary
    typeid = nodeid_from_binary(data)
  File "/opt/maddox/.venv/lib/python3.8/site-packages/asyncua/ua/ua_binary.py", line 388, in nodeid_from_binary
    nidtype = ua.NodeIdType(encoding & 0b00111111)
  File "/usr/local/lib/python3.8/enum.py", line 339, in __call__
    return cls.__new__(cls, value)
  File "/usr/local/lib/python3.8/enum.py", line 663, in __new__
    raise ve_exc
ValueError: 63 is not a valid NodeIdType
2022-06-26 01:46:24.607: WARNING  disconnect_socket was called but connection is closed

which I need to handle. So I modified _process_received_data() to re-raise the exception. In the current implementation, that exception is caught inside the library and there is no way for me to handle it in my code.
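For context, the failing decode step above can be mirrored in isolation. This is an illustrative sketch (not asyncua's actual code): the low six bits of the NodeId encoding byte select the NodeIdType, and 63 (0b111111) is not a defined member, so the enum lookup raises the ValueError seen in the traceback. The enum values follow the OPC UA binary encoding.

```python
# Illustrative mirror of asyncua's nodeid_from_binary() type lookup.
from enum import IntEnum


class NodeIdType(IntEnum):
    """NodeId encoding types as defined by the OPC UA binary protocol."""
    TwoByte = 0
    FourByte = 1
    Numeric = 2
    String = 3
    Guid = 4
    ByteString = 5


def nodeid_type_from_encoding(encoding: int) -> NodeIdType:
    # Only the low six bits carry the type; the top two bits are flags.
    return NodeIdType(encoding & 0b00111111)


nodeid_type_from_encoding(0x01)  # NodeIdType.FourByte
# nodeid_type_from_encoding(0x3F) raises: ValueError: 63 is not a valid NodeIdType
```

So a single corrupted byte from the server is enough to trigger this exception deep inside the parser.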

If you have a better idea, please let me know :)

LostInDarkMath avatar Jun 28 '22 08:06 LostInDarkMath

Would it help if the library introduced a boolean setup flag which controls whether exceptions related to received data are re-raised or not? Users would then have the choice of whether they are willing to deal with such issues themselves.
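A minimal sketch of what such an opt-in flag could look like. All names here are invented for illustration and are not part of the asyncua API; the point is only that the existing log-and-swallow behaviour stays the default, while callers who set the flag see the failure:

```python
# Hypothetical sketch of the proposed opt-in flag (not asyncua API).
import logging

logger = logging.getLogger(__name__)


class ReceiverSketch:
    def __init__(self, raise_on_parse_error: bool = False):
        # Default False preserves the current behaviour: log and continue.
        self.raise_on_parse_error = raise_on_parse_error

    def _process_received_data(self, data: bytes) -> None:
        try:
            self._parse(data)
        except Exception:
            logger.exception("Exception raised while parsing message from server")
            if self.raise_on_parse_error:
                raise  # opt-in: propagate the failure to the caller

    def _parse(self, data: bytes) -> None:
        # Stand-in for the real parser; simulates the decode failure above.
        raise ValueError("63 is not a valid NodeIdType")
```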

alpex8 avatar Jul 01 '22 10:07 alpex8

I do not get what you are doing. _process_received_data is called in another thread. How do you manage to catch anything happening there?

oroulet avatar Jul 10 '22 18:07 oroulet

@oroulet Thanks for your reply! I had somehow hoped that the thread would recognize the exception and make the caller aware of the problem.

It looks like your library freezes in this (very rare) scenario. Is it possible that the thread waits for something that never happens if this weird error occurs?

All I want is the ability to handle this special case somehow. Do you have any ideas?

LostInDarkMath avatar Jul 10 '22 18:07 LostInDarkMath

That kind of thing should not happen. It must be a bug in the server or in the client parsing. To debug it we would need a way to reproduce it. Maybe if you managed to record a failing session with Wireshark it could help. We need to find out what the server is sending, and in what context...

oroulet avatar Jul 10 '22 18:07 oroulet

Unfortunately I cannot reproduce this. It happened when one of our clients restarted its Kepware server, which happens very rarely. This caused our microservice, which is built on your library, to freeze. Freezing is very bad behavior because someone needs to restart the service manually. So what I'm trying to achieve is to get at least an error of some kind, because that would cause a restart of the service.

So yes, it is definitely a server issue. But it is a requirement to build our services robustly enough that they can deal with even these bad scenarios appropriately.

LostInDarkMath avatar Jul 10 '22 18:07 LostInDarkMath

What we do is poll the server and restart the client on any issue. This is in fact the way defined by the specification. There are many other possible reasons for the client silently dying (like network errors), so polling is the only possible solution.
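The poll-and-restart pattern described here can be sketched as follows. This is a minimal illustration with placeholder connect()/poll() coroutines standing in for a real asyncua client session (all names are illustrative, not library API); in practice poll() would read something like the server status node:

```python
# Supervisor sketch: run a session, restart it whenever anything fails.
import asyncio


async def run_session(connect, poll, poll_interval: float = 5.0) -> None:
    """Connect, then poll until any error escapes; caller restarts on failure."""
    await connect()
    while True:
        await poll()  # e.g. read a server status node to detect a dead link
        await asyncio.sleep(poll_interval)


async def supervise(connect, poll, retry_delay: float = 5.0, max_restarts=None) -> None:
    """Restart the session on any exception, with a delay between attempts."""
    restarts = 0
    while max_restarts is None or restarts <= max_restarts:
        try:
            await run_session(connect, poll)
        except Exception as exc:
            print(f"session died ({exc!r}), restarting in {retry_delay}s")
        restarts += 1
        await asyncio.sleep(retry_delay)
```

The key design point is that the supervisor treats every failure mode the same way, whether it is a parse error, a network drop, or a hung poll: tear down and reconnect.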

oroulet avatar Jul 10 '22 19:07 oroulet

I'm not sure we are talking about the same thing. The error here occurs before I can connect my client to the server, so subscription or polling is not possible at that moment. My code is dying at await client.set_security_string() before it even reaches await client.connect(). So maybe this corner case is not correctly handled by your library?

LostInDarkMath avatar Jul 10 '22 19:07 LostInDarkMath

Then you will discover the issue at the call of connect or at the first poll, won't you? And then you just retry the entire connect session. Or do you mean that the client just hangs without ever returning? In theory that should not happen: every wait for a socket operation is done through a wait_for with a timeout of a few seconds, so you should get a timeout error. What method is hanging? I think there are also some network situations where a socket hangs, but I think we handle all of them with a timeout on some async futures.
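The timeout guard described here is a standard asyncio pattern. A small self-contained sketch (the coroutine names are invented for the example, not asyncua internals): wrapping any awaitable in asyncio.wait_for converts a peer that never answers into a TimeoutError instead of an indefinite hang.

```python
# Converting a hang into an error with asyncio.wait_for.
import asyncio


async def read_with_timeout(reader_coro, timeout: float = 3.0):
    """Await a read operation, but fail loudly if it takes too long."""
    try:
        return await asyncio.wait_for(reader_coro, timeout=timeout)
    except asyncio.TimeoutError:
        raise TimeoutError(f"no data from server within {timeout}s")


async def never_answers():
    # Simulates a server that accepted the connection but stopped responding.
    await asyncio.sleep(3600)
```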

oroulet avatar Jul 11 '22 06:07 oroulet

From the stack trace you posted it looks like the server is sending an invalid OpenSecureChannelResponse or some other response. The best way to track down the error would be a Wireshark trace of the reconnect, or a log at debug level.

schroeder- avatar Jul 11 '22 06:07 schroeder-

From the other logs I can tell that the server sends a lot of garbage at this time:

  • asyncua.ua.uaerrors._base.UaError: Unsupported message type b'\x00Qf'
  • asyncua.ua.uaerrors._base.UaError: No request found for request id: 0, pending are dict_keys([1]), body was Buffer(size:56, data:b'w,\x8d(/\x00\x00\x00http://opcfoundation.org/UA/SecurityPolicy#None\xff')

What method is hanging? One of those:

await client.set_security_string(my_security_string)
await client.connect()

I see log statements before those calls, but the log statements after them are no longer visible. It also looks like the main thread is blocked synchronously, because my other asyncio.Tasks do not respond either.

LostInDarkMath avatar Jul 11 '22 06:07 LostInDarkMath

It also looks like the main thread is blocked synchronously, because my other asyncio.Tasks do not respond either.

OK, that is bad... and interesting... but we will need more info to understand what is happening. I cannot see where we could have hanging sync code, but we never know.

If you Ctrl-D when it hangs, we should be able to see where we are hanging.

Also, I have seen that "No request found for request id: 0" before and never understood where it comes from. Do you have the entire log? Or at least more than you showed above?

oroulet avatar Jul 11 '22 07:07 oroulet

Here are more logs:

2022-06-26 01:30:03.824: INFO     opc ua client: connect()
2022-06-26 01:30:03.965: ERROR    Exception raised while parsing message from server
Traceback (most recent call last):
  File "/opt/maddox/.venv/lib/python3.8/site-packages/asyncua/client/ua_client.py", line 167, in _call_callback
    self._callbackmap[request_id].set_result(body)
KeyError: 0

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/maddox/.venv/lib/python3.8/site-packages/asyncua/client/ua_client.py", line 79, in _process_received_data
    self._process_received_message(msg)
  File "/opt/maddox/.venv/lib/python3.8/site-packages/asyncua/client/ua_client.py", line 98, in _process_received_message
    self._call_callback(msg.request_id(), msg.body())
  File "/opt/maddox/.venv/lib/python3.8/site-packages/asyncua/client/ua_client.py", line 169, in _call_callback
    raise ua.UaError(f"No request found for request id: {request_id}, pending are {self._callbackmap.keys()}, body was {body}") from ex
asyncua.ua.uaerrors._base.UaError: No request found for request id: 0, pending are dict_keys([1]), body was Buffer(size:56, data:b'\xd5$\xf7\x8b/\x00\x00\x00http://opcfoundation.org/UA/SecurityPolicy#None\xff')
2022-06-26 01:30:08.960: WARNING  disconnect_socket was called but connection is closed
2022-06-26 01:30:08.961: ERROR    Cannot connect to OPCUA server. Details: 
2022-06-26 01:30:08.961: WARNING  OPC client cannot be recreated due to connection error. I'll try it again
2022-06-26 01:30:13.961: INFO     Reconnecting to OPC UA server...
2022-06-26 01:30:13.962: INFO     Close connection to OPC UA server.
2022-06-26 01:30:13.962: WARNING  close_session was called but connection is closed
2022-06-26 01:30:13.962: WARNING  close_secure_channel was called but connection is closed
2022-06-26 01:30:13.962: WARNING  disconnect_socket was called but connection is closed
2022-06-26 01:30:13.962: INFO     Disabled OPCUA library logs.
2022-06-26 01:30:13.962: INFO     opc ua client: connect()
2022-06-26 01:30:14.068: ERROR    Exception raised while parsing message from server
Traceback (most recent call last):
  File "/opt/maddox/.venv/lib/python3.8/site-packages/asyncua/client/ua_client.py", line 167, in _call_callback
    self._callbackmap[request_id].set_result(body)
KeyError: 0

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/maddox/.venv/lib/python3.8/site-packages/asyncua/client/ua_client.py", line 79, in _process_received_data
    self._process_received_message(msg)
  File "/opt/maddox/.venv/lib/python3.8/site-packages/asyncua/client/ua_client.py", line 100, in _process_received_message
    self._call_callback(0, msg)
  File "/opt/maddox/.venv/lib/python3.8/site-packages/asyncua/client/ua_client.py", line 169, in _call_callback
    raise ua.UaError(f"No request found for request id: {request_id}, pending are {self._callbackmap.keys()}, body was {body}") from ex
asyncua.ua.uaerrors._base.UaError: No request found for request id: 0, pending are dict_keys([1]), body was Acknowledge(ProtocolVersion=0, ReceiveBufferSize=65536, SendBufferSize=65536, MaxMessageSize=16777216, MaxChunkCount=5000)
2022-06-26 01:30:18.977: WARNING  disconnect_socket was called but connection is closed
2022-06-26 01:30:18.977: ERROR    Cannot connect to OPCUA server. Details: 
2022-06-26 01:30:18.977: WARNING  OPC client cannot be recreated due to connection error. I'll try it again
2022-06-26 01:30:23.978: INFO     Reconnecting to OPC UA server...
2022-06-26 01:30:23.978: INFO     Close connection to OPC UA server.
2022-06-26 01:30:23.978: WARNING  close_session was called but connection is closed
2022-06-26 01:30:23.978: WARNING  close_secure_channel was called but connection is closed
2022-06-26 01:30:23.978: WARNING  disconnect_socket was called but connection is closed
2022-06-26 01:30:23.978: INFO     Disabled OPCUA library logs.
2022-06-26 01:30:23.978: INFO     opc ua client: connect()
2022-06-26 01:30:24.083: ERROR    Exception raised while parsing message from server
Traceback (most recent call last):
  File "/opt/maddox/.venv/lib/python3.8/site-packages/asyncua/client/ua_client.py", line 167, in _call_callback
    self._callbackmap[request_id].set_result(body)
KeyError: 0

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/maddox/.venv/lib/python3.8/site-packages/asyncua/client/ua_client.py", line 79, in _process_received_data
    self._process_received_message(msg)
  File "/opt/maddox/.venv/lib/python3.8/site-packages/asyncua/client/ua_client.py", line 98, in _process_received_message
    self._call_callback(msg.request_id(), msg.body())
  File "/opt/maddox/.venv/lib/python3.8/site-packages/asyncua/client/ua_client.py", line 169, in _call_callback
    raise ua.UaError(f"No request found for request id: {request_id}, pending are {self._callbackmap.keys()}, body was {body}") from ex
asyncua.ua.uaerrors._base.UaError: No request found for request id: 0, pending are dict_keys([1]), body was Buffer(size:56, data:b'\xdf<\x93\xfd/\x00\x00\x00http://opcfoundation.org/UA/SecurityPolicy#None\xff')
2022-06-26 01:30:29.055: WARNING  disconnect_socket was called but connection 

LostInDarkMath avatar Jul 11 '22 10:07 LostInDarkMath

Can you test if #970 fixes your problem?

schroeder- avatar Jul 17 '22 09:07 schroeder-

@schroeder- Thank you for this fix! Unfortunately I'm unable to test it because I cannot reproduce the bug. So I hope for the best :grinning:

LostInDarkMath avatar Jul 17 '22 10:07 LostInDarkMath

reopen if there is a need for it

oroulet avatar Sep 16 '22 11:09 oroulet

@schroeder- Unfortunately, the bug still exists in asyncua 1.0.1. Should I create a new issue?

@oroulet What exactly was the reason why you think re-raising the exception is a bad idea?

LostInDarkMath avatar Feb 13 '23 15:02 LostInDarkMath

Because receiving happens in another thread, and raising an exception in that other thread just hangs processing. If you want to reopen the issue you need a precise scenario: ideally something we can reproduce, or at worst a Wireshark log and stack trace so we can find out what the server is sending us.

oroulet avatar Feb 13 '23 17:02 oroulet