python-opcua
Receiving subscription notification although value did not change
This issue is related to the version in use: 0.98.7
1. Usage description.
I am using this OPC-UA library in a production environment where the production on multiple PLCs is monitored. One OPC-UA server provides nodes for a number of operation points. Each operation point has a numeric status node that changes depending on its working state (1 = idle, 10 = working, 3 and 4 = malfunction, etc.).
There are 4 OPC-UA servers on different PLCs with different IPs.
I have a total of 8 Python processes running. The source code of all processes is identical. Each process creates subscriptions (configured in a JSON file and passed as an argument to the process) on a number of status nodes of the operation points:
Proc. 1 - Connects to PLC1 - 32 Subscriptions
Proc. 2 - Connects to PLC1 - 1 Subscription
Proc. 3 - Connects to PLC1 - 7 Subscriptions
Proc. 4 - Connects to PLC2 - 26 Subscriptions
Proc. 5 - Connects to PLC3 - 61 Subscriptions
Proc. 6 - Connects to PLC3 - 1 Subscription
Proc. 7 - Connects to PLC3 - 5 Subscriptions
Proc. 8 - Connects to PLC4 - 22 Subscriptions
Each data change notification is processed in a thread that I create. This thread reads some additional information from the PLC (multiple values from nodes on the OPC-UA server) and then puts an entry into a Python Queue. This queue is processed by a single dedicated thread which writes the data into the database. A rough sketch of this pattern is shown below.
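A simplified sketch of this handler/queue pattern; read_additional_values and write_to_database are hypothetical placeholders for the real PLC reads and database code:

import queue
import threading

work_queue = queue.Queue()

def read_additional_values(node):
    # placeholder for the real PLC reads (multiple get_value() calls)
    return {}

def write_to_database(item):
    # placeholder for the real database write
    print("writing", item)

class QueueingSubHandler:
    def datachange_notification(self, node, value, data):
        # hand each notification off to a short-lived worker thread
        threading.Thread(target=self._collect, args=(node, value), daemon=True).start()

    def _collect(self, node, value):
        extra = read_additional_values(node)
        work_queue.put((node, value, extra))

def db_writer():
    # the single dedicated consumer that writes into the database
    while True:
        item = work_queue.get()
        write_to_database(item)
        work_queue.task_done()

threading.Thread(target=db_writer, daemon=True).start()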
All 8 processes are started (or stopped) through a shell script one after another in a very short interval. This seems to be an important detail for my issue.
2. Unexpected behavior
Besides the Python processes, I have written this simple script, which creates a single subscription to a status node on, for example, PLC1. Each time the status changes it displays the new status with a timestamp.
import time
from datetime import datetime

from opcua import Client


class TestSubHandler():
    def datachange_notification(self, node, value, data):
        print(datetime.now(), "Data change", value)


client = Client("opc.tcp://0.0.0.0:4840")
client.connect()
node_path = ["Path", "To", "Node"]
sub_node = client.get_root_node().get_child(node_path)
sub = client.create_subscription(500, TestSubHandler())
handle = sub.subscribe_data_change(sub_node)
try:
    while True:
        time.sleep(10)
except KeyboardInterrupt:
    client.disconnect()
When working correctly, the output usually looks like this (exemplary):
2020-02-11 15:11:27.ms Data change 1
2020-02-11 15:11:27.ms Data change 10
2020-02-11 15:11:27.ms Data change 1
2020-02-11 15:11:28.ms Data change 10
2020-02-11 15:11:28.ms Data change 1
2020-02-11 15:11:29.ms Data change 10
2020-02-11 15:11:29.ms Data change 1
2020-02-11 15:11:30.ms Data change 10
2020-02-11 15:11:30.ms Data change 1
2020-02-11 15:11:31.ms Data change 10
2020-02-11 15:11:31.ms Data change 1
While this script is running, I start all 8 of my Python processes through the shell script, and suddenly the output becomes:
2020-02-11 15:11:27.ms Data change 1
2020-02-11 15:11:27.ms Data change 10
2020-02-11 15:11:28.ms Data change 1
2020-02-11 15:11:28.ms Data change 10 # Processes are started after this one!
2020-02-11 15:11:28.ms Data change 1
2020-02-11 15:11:29.ms Data change 1
2020-02-11 15:11:29.ms Data change 1
2020-02-11 15:11:30.ms Data change 1
2020-02-11 15:11:30.ms Data change 10
2020-02-11 15:11:30.ms Data change 10
2020-02-11 15:11:31.ms Data change 10
2020-02-11 15:11:31.ms Data change 10
2020-02-11 15:11:31.ms Data change 10
2020-02-11 15:11:32.ms Data change 10
2020-02-11 15:11:33.ms Data change 1
2020-02-11 15:11:33.ms Data change 1
2020-02-11 15:11:33.ms Data change 1
2020-02-11 15:11:34.ms Data change 10
2020-02-11 15:11:34.ms Data change 10
2020-02-11 15:11:34.ms Data change 10
2020-02-11 15:11:35.ms Data change 10
2020-02-11 15:11:35.ms Data change 10
2020-02-11 15:11:35.ms Data change 1
2020-02-11 15:11:36.ms Data change 1
2020-02-11 15:11:36.ms Data change 1
2020-02-11 15:11:36.ms Data change 10
2020-02-11 15:11:37.ms Data change 10
2020-02-11 15:11:37.ms Data change 1
So, for some reason, the server is sending me data change notifications although the value of the node has not changed.
Even stranger is the fact that the false notifications stop again after I stop my Python processes:
2020-02-11 15:11:27.ms Data change 1
2020-02-11 15:11:27.ms Data change 10
2020-02-11 15:11:28.ms Data change 1
2020-02-11 15:11:28.ms Data change 10 # Processes are started after this one!
2020-02-11 15:11:28.ms Data change 1
2020-02-11 15:11:29.ms Data change 1
2020-02-11 15:11:29.ms Data change 1
2020-02-11 15:11:30.ms Data change 1
2020-02-11 15:11:30.ms Data change 10
2020-02-11 15:11:30.ms Data change 10
2020-02-11 15:11:31.ms Data change 10
2020-02-11 15:11:31.ms Data change 10
2020-02-11 15:11:31.ms Data change 10
2020-02-11 15:11:32.ms Data change 10
2020-02-11 15:11:33.ms Data change 1
2020-02-11 15:11:33.ms Data change 1
2020-02-11 15:11:33.ms Data change 1
2020-02-11 15:11:34.ms Data change 10
2020-02-11 15:11:34.ms Data change 10
2020-02-11 15:11:34.ms Data change 10
2020-02-11 15:11:35.ms Data change 10 # Processes are stopped here!
2020-02-11 15:11:35.ms Data change 1
2020-02-11 15:11:35.ms Data change 10
2020-02-11 15:11:36.ms Data change 1
2020-02-11 15:11:37.ms Data change 10
2020-02-11 15:11:38.ms Data change 1
If I start the processes with a sleep of 10 seconds in between (shorter did not suffice), the faulty behaviour does not occur; see the sketch below.
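For reference, the staggered start that avoids the issue looks roughly like this (a Python stand-in for the actual shell script; monitor.py and the config file names are made up):

import subprocess
import time

# start each monitoring process with a 10 second gap;
# shorter gaps did not avoid the problem
configs = ["proc1.json", "proc2.json", "proc3.json"]  # made-up file names
for cfg in configs:
    subprocess.Popen(["python3", "monitor.py", cfg])  # monitor.py is hypothetical
    time.sleep(10)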
Why does the time at which the processes are started make a difference? Is it a problem with the PLC? Is it a problem with the opcua library? Has anybody ever seen something like this before?
Read about how Python multiprocessing really works. If you are handing UA functions (like subscriptions) off to another process, weird things can happen, because the multiprocessing library keeps an entire copy of the main process's memory. When a data change happens in a process, its function gets called from all the processes that have a reference to it, or something like that.
Another person tried to call UA functions from processes and also got really strange results. https://github.com/FreeOpcUa/python-opcua/issues/883
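A tiny illustration of that copy-on-fork behaviour (not from the issue, just to show that a forked child works on its own copy of the parent's memory):

import multiprocessing

counter = 0

def worker():
    global counter
    counter += 1                          # mutates only the child's private copy
    print("child counter:", counter)      # prints 1

if __name__ == "__main__":
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()
    print("parent counter:", counter)     # prints 0; the parent's memory is untouched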
So you mean your simple client gets extra (wrong) notifications when the server gets new connections from these 8 UA clients?
If your simple client gets these extra notifications, then it sounds like an issue with your server. But it is very hard to understand your setup. You have a lot of extra, irrelevant information, like this JSON config stuff.
Read about how Python multiprocessing really works. If you are handing UA functions (like subscriptions) off to another process, weird things can happen, because the multiprocessing library keeps an entire copy of the main process's memory. When a data change happens in a process, its function gets called from all the processes that have a reference to it, or something like that.
I am not using multiprocessing, only threading. And as far as I can tell, freeopcua does not use multiprocessing anywhere either.
So you mean your simple client gets extra (wrong) notifications when the server gets new connections from these 8 UA clients?
Correct.
If your simple client gets these extra notifications, then it sounds like an issue with your server. But it is very hard to understand your setup. You have a lot of extra, irrelevant information, like this JSON config stuff.
The JSON is just configuration. It only contains paths to the nodes that I want the process to subscribe to. There is not much else to it.
So you mean your simple client gets extra (wrong) notifications when the server gets new connections from these 8 UA clients?
Correct.
Then you have a server issue. Unless your clients do something strange like writing things.
Are you monitoring 32 tags with 32 individual subscriptions? Or are you using 1 subscription to monitor 32 items?
Can you check if all the subscription handles are unique? I don't have time to look into the spec, but I think every subscription gets some kind of identifier. Maybe when you subscribe fast, the server is messing something up.
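For illustration, here is a sketch of both monitoring styles and of reading the server-assigned identifier, which python-opcua exposes as Subscription.subscription_id (client, handler and nodes are assumed to exist as in the examples above):

# style 1: one subscription per tag (what the processes currently do)
subs = []
for node in nodes:
    sub = client.create_subscription(500, handler)
    sub.subscribe_data_change(node)
    subs.append(sub)

# style 2: a single subscription monitoring all tags;
# subscribe_data_change also accepts a list of nodes
single_sub = client.create_subscription(500, handler)
single_sub.subscribe_data_change(nodes)

# check the server-assigned subscription ids for duplicates
ids = [sub.subscription_id for sub in subs]
assert len(ids) == len(set(ids)), "duplicate subscription ids!"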
I am creating 32 individual subscriptions for 32 tags.
I will check if I can get the identifier for each subscription and check whether there are duplicates. Thanks for the idea.
I've taken a look at the subscription IDs that I receive from the server and there are no duplicates.
I have another idea but am not sure. This comment by one of the support team states that if a connection is lost, it is kept open for a defined time. If a client reconnects within that time, the session is reused. What about the subscriptions created in that session? Are they also reused? I often see this behaviour after a broken pipe exception, after which I reconnect to the PLC and recreate the subscriptions. What if I reconnect within that time frame while the old subscriptions are still active, and so create new subscriptions in addition to those that are still active?
But this would not explain why my simple script receives extra notifications without a value change. It did not reconnect or recreate subscriptions.
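If stale subscriptions in a reused session were the problem, one way to rule it out could be to delete the old subscriptions explicitly before recreating them; a sketch (Subscription.delete() sends a DeleteSubscriptions request to the server):

def reconnect_and_resubscribe(client, subs, nodes, handler):
    # best effort: remove old subscriptions so a reused session
    # cannot keep stale monitored items alive
    for sub in subs:
        try:
            sub.delete()
        except Exception:
            pass  # the session may already be gone
    subs.clear()
    client.disconnect()
    client.connect()
    for node in nodes:
        sub = client.create_subscription(500, handler)
        sub.subscribe_data_change(node)
        subs.append(sub)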
Still looking for the cause of and a solution to this problem, I wanted to verify that this may indeed be a server issue.
So I looked for another client implementation of OPC-UA and found this JavaScript implementation. Using the example, I recreated my Python script for the JavaScript module. It does exactly the same: it connects to the same server and subscribes to the same node for value changes. While both were running, I again started all processes on the production server.
This is the result:
[screenshot: output of both clients while the processes were started]
Never mind the timestamp on the left; for some reason the time is off by one hour.
As you can see, the JavaScript OPC-UA client does not suffer from the same issue. Using UaExpert, I also cannot reproduce the behavior that I get with freeopcua.
@ap-0 we see your screenshots, but for us to understand, you need to write some minimal test code we can run to reproduce the error.
@oroulet sorry that it took a while to create a minimal example. My time for testing on the production server is limited, and at first I was not able to reproduce the issue with the example. Even now I am still not able to reproduce it 100 %. I am not sure why.
With my processes running, the notifications for status changes are permanent and only stop when most of the processes that connect to the same PLC are stopped. With my minimal example, I can only see the strange behavior for a couple of seconds, and then it stops and works as intended afterwards. When starting two to three instances of my minimal example, I receive 3 to 4 notifications with identical values.
Also: I am not able to reproduce this issue with the PLC I have here for testing (Siemens SIMATIC S7-1500 OPC UA, V02.01.00), while the PLC in the production environment runs Siemens SIMATIC S7-1500 OPC UA V02.05.00.
Also also: I do get some exceptions while creating subscriptions, but not on all nodes:
Traceback (most recent call last):
  File "/opt/python34/lib/python3.4/concurrent/futures/_base.py", line 297, in _invoke_callbacks
    callback(self)
  File "/opt/python34/lib/python3.4/site-packages/opcua/client/ua_client.py", line 469, in _call_publish_callback
    self._uasocket.check_answer(data, "while waiting for publish response")
  File "/opt/python34/lib/python3.4/site-packages/opcua/client/ua_client.py", line 88, in check_answer
    hdr.ServiceResult.check()
  File "/opt/python34/lib/python3.4/site-packages/opcua/ua/uatypes.py", line 233, in check
    raise UaStatusCodeError(self.value
This is the code of the simple script whose output I have posted multiple times above:
import sys
import time
from opcua import Client
from opcua.common.subscription import Subscription
from opcua import ua
from datetime import datetime
import opcua as opc


class SubHandler():
    def datachange_notification(self, node, val, data):
        print(datetime.now(), "- Data Change - ", val)

    def event_notification(self, event):
        pass


server = "opc.tcp://<ip>:4840"
node = ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S050", "3:Status", "3:Status"]
client = Client(server)
print("Connecting")
client.connect()
print("Done")
root = client.get_root_node()
n = root.get_child(node)
handler = SubHandler()
sub = client.create_subscription(500, handler)
handle = sub.subscribe_data_change(n)
sub.subscribe_events()
while True:
    try:
        time.sleep(10)
    except KeyboardInterrupt:
        print("CTRL+C")
        break
print("Disconnecting")
client.disconnect()
print("Done")
This is the minimal example of my processes, broken down to the essentials as much as possible.
import sys
import json
import threading
import time
from datetime import datetime
import opcua as opc
json_file_content = """
{
"opc_ua_servers": [
{
"end_point_url": "opc.tcp://ip_here:4840",
"ops": [
{"node_path" : ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S056", "3:Status", "3:Status"]}
]
},
{
"end_point_url": "opc.tcp://ip_here:4840",
"ops": [
{"node_path" : ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S007", "3:Status", "3:Status"]},
{"node_path" : ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S008", "3:Status", "3:Status"]},
{"node_path" : ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S009", "3:Status", "3:Status"]},
{"node_path" : ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S010", "3:Status", "3:Status"]},
{"node_path" : ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S015", "3:Status", "3:Status"]}
]
},
{
"end_point_url": "opc.tcp://ip_here:4840",
"ops": [
{"node_path" : ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S020", "3:Status", "3:Status"]},
{"node_path" : ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S025", "3:Status", "3:Status"]},
{"node_path" : ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S030", "3:Status", "3:Status"]},
{"node_path" : ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S035", "3:Status", "3:Status"]},
{"node_path" : ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S040", "3:Status", "3:Status"]}
]
},
{
"end_point_url": "opc.tcp://ip_here:4840",
"ops": [
{"node_path" : ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S042", "3:Status", "3:Status"]},
{"node_path" : ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S050", "3:Status", "3:Status"]},
{"node_path" : ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S055", "3:Status", "3:Status"]},
{"node_path" : ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S058", "3:Status", "3:Status"]},
{"node_path" : ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S060", "3:Status", "3:Status"]}
]
},
{
"end_point_url": "opc.tcp://ip_here:4840",
"ops": [
{"node_path" : ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S061", "3:Status", "3:Status"]},
{"node_path" : ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S062", "3:Status", "3:Status"]},
{"node_path" : ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S064", "3:Status", "3:Status"]},
{"node_path" : ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S065", "3:Status", "3:Status"]},
{"node_path" : ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S070", "3:Status", "3:Status"]}
]
},
{
"end_point_url": "opc.tcp://ip_here:4840",
"ops": [
{"node_path" : ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S072", "3:Status", "3:Status"]},
{"node_path" : ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S073", "3:Status", "3:Status"]},
{"node_path" : ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S075", "3:Status", "3:Status"]},
{"node_path" : ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S080", "3:Status", "3:Status"]},
{"node_path" : ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S100", "3:Status", "3:Status"]}
]
},
{
"end_point_url": "opc.tcp://ip_here:4840",
"ops": [
{"node_path" : ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S105", "3:Status", "3:Status"]},
{"node_path" : ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S110", "3:Status", "3:Status"]},
{"node_path" : ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S112", "3:Status", "3:Status"]}
]
},
{
"end_point_url": "opc.tcp://ip_here:4840",
"ops": [
{"node_path" : ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S115", "3:Status", "3:Status"]},
{"node_path" : ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S115_CONV", "3:Status", "3:Status"]},
{"node_path" : ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S116", "3:Status", "3:Status"]}
]
},
{
"end_point_url": "opc.tcp://ip_here:4840",
"ops": [
{"node_path" : ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S043", "3:Status", "3:Status"]},
{"node_path" : ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S044A", "3:Status", "3:Status"]},
{"node_path" : ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S044B", "3:Status", "3:Status"]},
{"node_path" : ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S045A", "3:Status", "3:Status"]},
{"node_path" : ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S045B", "3:Status", "3:Status"]},
{"node_path" : ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S047LH", "3:Status", "3:Status"]},
{"node_path" : ["0:Objects", "3:Path", "3:To", "3:Node", "3:LK_S047RH", "3:Status", "3:Status"]}
]
}
]
}
"""
class SubHandler():
    def __init__(self, plc_cnxn, plc_lock):
        self.plc_cnxn = plc_cnxn
        self.plc_lock = plc_lock

    def datachange_notification(self, node, value, data):
        def do_stuff(value, plc_cnxn, plc_lock):
            """Here I would read more data from the PLC.
            Since multiple threads share one client connection, I lock it
            using threading.Lock() before reading and release it again
            after reading is done.
            For the example's sake I just read some server diagnostics.
            """
            with plc_lock:
                print(datetime.now(), value)
                plc_cnxn.get_node(opc.ua.ObjectIds.ServerDiagnosticsSummaryType_ServerViewCount).get_value()
                plc_cnxn.get_node(opc.ua.ObjectIds.ServerDiagnosticsSummaryType_CurrentSessionCount).get_value()
                plc_cnxn.get_node(opc.ua.ObjectIds.ServerDiagnosticsSummaryType_CumulatedSessionCount).get_value()
                plc_cnxn.get_node(opc.ua.ObjectIds.ServerDiagnosticsSummaryType_SecurityRejectedSessionCount).get_value()
                plc_cnxn.get_node(opc.ua.ObjectIds.ServerDiagnosticsSummaryType_RejectedSessionCount).get_value()
                plc_cnxn.get_node(opc.ua.ObjectIds.ServerDiagnosticsSummaryType_SessionTimeoutCount).get_value()
                plc_cnxn.get_node(opc.ua.ObjectIds.ServerDiagnosticsSummaryType_SessionAbortCount).get_value()
                plc_cnxn.get_node(opc.ua.ObjectIds.ServerDiagnosticsSummaryType_PublishingIntervalCount).get_value()
                plc_cnxn.get_node(opc.ua.ObjectIds.ServerDiagnosticsSummaryType_CurrentSubscriptionCount).get_value()
                plc_cnxn.get_node(opc.ua.ObjectIds.ServerDiagnosticsSummaryType_CumulatedSubscriptionCount).get_value()
                plc_cnxn.get_node(opc.ua.ObjectIds.ServerDiagnosticsSummaryType_SecurityRejectedRequestsCount).get_value()
                plc_cnxn.get_node(opc.ua.ObjectIds.ServerDiagnosticsSummaryType_RejectedRequestsCount).get_value()

        threading.Thread(target=do_stuff, args=[value, self.plc_cnxn, self.plc_lock], daemon=True).start()
class OpcUaClient(threading.Thread):
    def __init__(self, url, ops, event_stop):
        super().__init__()
        self.setDaemon(True)
        self.event_stop = event_stop
        self.ops = [dict(op) for op in ops]
        self.end_point_url = url
        self.client = None
        self.lock = threading.Lock()

    @property
    def is_connected(self):
        # ServerState 0 means "Running"
        return self.client.get_node(opc.ua.ObjectIds.Server_ServerStatus_State).get_value() == 0

    def run(self):
        while True:
            if self.event_stop.is_set():
                if self.client is not None:  # guard: no connection was established yet
                    self.client.disconnect()
                return
            try:
                self.client = opc.Client(self.end_point_url)
                self.client.connect()
            except Exception as ex:
                print("error connecting", ex)
                time.sleep(10)
                continue
            else:
                root_node = self.client.get_root_node()
                for op in self.ops:
                    sub_node = root_node.get_child(op["node_path"])
                    sub = self.client.create_subscription(500, SubHandler(self.client, self.lock))
                    sub.subscribe_data_change(sub_node)
            try:
                while True:
                    time.sleep(1)
                    if self.event_stop.is_set():
                        self.client.disconnect()
                        return
                    if not self.is_connected:
                        print("lost connection")
                        self.client.disconnect()
                        break
            except Exception as ex:
                print("Lost connection", ex)
                self.client.disconnect()
if __name__ == "__main__":
    servers = json.loads(json_file_content)
    clients = []
    event_stop = threading.Event()
    for server in servers["opc_ua_servers"]:
        url = server["end_point_url"]
        ops = server["ops"]
        clients.append(OpcUaClient(url, ops, event_stop))
    for client in clients:
        client.start()
    try:
        while True:
            time.sleep(1)
            print("still alive...")
    except KeyboardInterrupt:
        event_stop.set()
        for client in clients:
            client.join()
- As mentioned in the beginning, I get the wrong behavior only for a couple of seconds. I receive data change notifications 10, 10, 10, 10 and then it continues to work correctly.
- I have to start multiple instances of the script (2, 3 or more) for the issue to come up. If I start only one, the process does not show any signs of problems.
- All instances have to be started at the same time. If started with a sleep in between, the issue does not come up.
Below is the output of the script from the first code block.
2020-02-24 04:02:33.966174 - Data Change - 1
2020-02-24 04:02:43.465315 - Data Change - 10
2020-02-24 04:03:39.472530 - Data Change - 1
2020-02-24 04:03:56.288786 - Data Change - 1
2020-02-24 04:03:57.298652 - Data Change - 1
2020-02-24 04:03:58.829144 - Data Change - 1
2020-02-24 04:04:05.555810 - Data Change - 1
2020-02-24 04:05:37.074840 - Data Change - 10
2020-02-24 04:06:28.585540 - Data Change - 1
2020-02-24 04:06:39.586266 - Data Change - 10
2020-02-24 04:07:46.100337 - Data Change - 1
2020-02-24 04:07:55.603414 - Data Change - 10
2020-02-24 04:08:44.113602 - Data Change - 1
2020-02-24 04:08:53.115656 - Data Change - 10
I think I can say with high confidence that it is a PLC issue. As mentioned, I have a testing PLC (Siemens V02.01.00) where I am not able to reproduce the issue. The production PLC (V02.05.00) has this issue. I have updated the testing PLC to V02.05.00 and am now able to reproduce the problem. Later today I will update to something newer than V02.05.00 and test again.
Still strange is the fact that the OPC UA client in JavaScript did not behave the same, which made me think that this was not PLC related.
Thanks for the update. That makes sense.
Hi @ap-0, sorry to jump on this thread, but did you determine the cause of your issue? We are facing a similar problem with a Siemens S7-1500 (6ES7 510-1DJ01-0AB0 FW V2.6) OPC-UA server where, if we subscribe to too many nodes, we get the error below, which looks exactly like what you described.
We think the problem is caused by the normal behaviour of the OPC-UA server, which publishes the current value of all subscribed nodes immediately on subscription; there seems to be some sort of buffer overflow somewhere when there are too many responses to handle in a short space of time.
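If that initial publish on monitored item creation is indeed the trigger, a possible client-side workaround might be to swallow the first notification per node, something like this sketch (handle_change is a hypothetical stand-in for the real processing):

class FirstNotificationFilter:
    """Ignore the initial data change that servers send right after
    a monitored item is created (illustrative sketch only)."""
    def __init__(self):
        self.seen = set()

    def datachange_notification(self, node, value, data):
        if node.nodeid not in self.seen:
            self.seen.add(node.nodeid)   # swallow the initial publish
            return
        handle_change(node, value)       # hypothetical downstream processing

def handle_change(node, value):
    print("real change:", node, value)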
Did you find that a certain firmware version fixed this for you? If so, what difference in behaviour was observed? I would be interested to know if, e.g., FW V2.01 did not publish data change notifications on subscription. It looks like there were some OPC changes in FW V2.5 for this CPU, as there are some notes about this in the hardware catalog, so it's possible this may have been affected.
Grateful for any insight, thanks.
maxNotificationsPerPublish!
The maximum number of notifications that the Client wishes to receive in a single Publish response. A value of zero indicates that there is no limit. The number of notifications per Publish is the sum of monitoredItems in the DataChangeNotification and events in the EventNotificationList.
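In python-opcua you can request explicit subscription parameters, including MaxNotificationsPerPublish, by passing a ua.CreateSubscriptionParameters object instead of a plain publishing interval (recent versions accept this; check yours). A sketch, with client and handler as in the earlier examples:

from opcua import ua

params = ua.CreateSubscriptionParameters()
params.RequestedPublishingInterval = 500
params.RequestedLifetimeCount = 10000
params.RequestedMaxKeepAliveCount = 3000
params.MaxNotificationsPerPublish = 0  # 0 = no limit, per the spec text above
params.PublishingEnabled = True
params.Priority = 0
sub = client.create_subscription(params, handler)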
Hi @mjk-automation-ltd, no, unfortunately the processes still behave the same. I still get continuous notifications although the value of the tag did not change. At first we thought we had fixed the issue by updating the PLC firmware (to version 2.8, if I'm not wrong), and for a couple of days everything looked fine; the issue was not reproducible at first. But after some days it started sending notifications again.