opcua icon indicating copy to clipboard operation
opcua copied to clipboard

Possible flaw in autoreconnect logic

Open R290 opened this issue 2 years ago • 4 comments

Related to https://github.com/influxdata/telegraf/issues/11559 and possibly related to https://github.com/gopcua/opcua/issues/597 and https://github.com/gopcua/opcua/issues/434.

The errors below are generated using the slightly modified read example here https://github.com/R290/opcua/tree/autoreconnect/examples/read with the docker-compose in the same folder.

The output with debugging disabled:

Starting acquisition
time.Date(2022, time.August, 13, 8, 29, 21, 403337000, time.UTC)
Starting acquisition
time.Date(2022, time.August, 13, 8, 29, 31, 403580000, time.UTC)

When restarting the opcua container:

Starting acquisition
time.Date(2022, time.August, 13, 8, 30, 41, 724244000, time.UTC)
Starting acquisition
Read failed: The operation could not complete because the client is not connected to the server. StatusBadServerNotConnected (0x800D0000)
Starting acquisition
Read failed: The session id is not valid. StatusBadSessionIDInvalid (0x80250000)
Starting acquisition
Read failed: The session id is not valid. StatusBadSessionIDInvalid (0x80250000)
Starting acquisition
Read failed: The session id is not valid. StatusBadSessionIDInvalid (0x80250000)
Starting acquisition
Read failed: The session id is not valid. StatusBadSessionIDInvalid (0x80250000)
Starting acquisition
time.Date(2022, time.August, 13, 8, 31, 41, 715146000, time.UTC)
Starting acquisition
Read failed: The operation could not complete because the client is not connected to the server. StatusBadServerNotConnected (0x800D0000)

It does not recover cleanly.

Normal behavior with debugging enabled:

Starting acquisition
debug: uasc 1/9: send *ua.ReadRequest with 108 bytes
debug: uacp 1: recv MSGF with 86 bytes
debug: uasc 1/9: recv MSGF with 86 bytes
debug: uasc 1/9: recv *ua.ReadResponse
debug: uasc 1/9: sending *ua.ReadResponse to handler
time.Date(2022, time.August, 13, 8, 33, 40, 166392000, time.UTC)

When restarting the opcua container the debug console is flooded witht the following repeated message patten:

debug: client: monitor: disconnected
debug: client: monitor: auto-reconnecting
debug: client: monitor: action: createSecureChannel
debug: uacp 541: close
debug: publish: PublishResponse: {"ResponseHeader":{"Timestamp":"2022-08-13T08:33:58.30056Z","RequestHandle":9,"ServiceResult":2155413504,"ServiceDiagnostics":{"EncodingMask":0,"SymbolicID":0,"NamespaceURI":0,"Locale":0,"LocalizedText":0,"AdditionalInfo":"","InnerStatusCode":0,"InnerDiagnosticInfo":null},"StringTable":null,"AdditionalHeader":{"EncodingMask":0,"TypeID":{"NodeID":"i=0","NamespaceURI":"","ServerIndex":0},"Value":null}},"SubscriptionID":0,"AvailableSequenceNumbers":null,"MoreNotifications":false,"NotificationMessage":{"SequenceNumber":0,"PublishTime":"0001-01-01T00:00:00Z","NotificationData":null},"Results":null,"DiagnosticInfos":null}
debug: publish: error: no subscriptions but the publishing loop is still running: There is no subscription available for this session. StatusBadNoSubscription (0x80790000)
debug: sub: error: There is no subscription available for this session. StatusBadNoSubscription (0x80790000)
debug: sub: pause
debug: sub: pause: pause
debug: uasc 541: readChunk EOF
debug: uasc 541: Close()

Via Wireshark at localhost:3000 we can also see that the server is being swamped with packages (using filter tcp.port == 4840). The reason Nginx is present is so we can simulate a dropped connection as well by restarting the nginx container. Behavior is similar to restarting the opcua server itself.

To me it seems that flooding the server with requests on a dropped connection is definitely not as designed...

R290 avatar Aug 13 '22 08:08 R290

same issue here. OPCUA Server gets flooded with wrong session ids.

Scaredon avatar Nov 13 '23 09:11 Scaredon

Just checked with v0.5.2 and the flooding of the server no longer occurs. @Scaredon which version are you using? The read example at https://github.com/gopcua/opcua/blob/main/examples/read/read.go has also been modified to make a distinction between different error types.

R290 avatar Nov 19 '23 20:11 R290

@R290 I've retested today. newest telegraf plugin with v0.5.2 goopcua. Local Build. Autoreconnect is set to true.

  1. start opcuaserver
  2. start telegraf
  3. stop opcuaserver
  4. start opcuaserver

The server is spamed with the sequence (not sure if this is the correct order, log frequency is pretty high):

  • Got an authentication request from the application 'Telegraf' with the TokenType 'ANONYMOUS_0'
  • accept Anonymous authentication request
  • reported: Failed to get session: id not found 404076375

Scaredon avatar Nov 27 '23 20:11 Scaredon

Sadly, I'm unable to reproduce this. Could you provide any information about the server and authentication method you are using?

R290 avatar Dec 16 '23 18:12 R290