opcua icon indicating copy to clipboard operation
opcua copied to clipboard

Server: PublishResponse sent back on the wrong connection

Open bs1959 opened this issue 1 year ago • 9 comments

In spawn_subscriptions_task of TcpTransport, all sessions of the server are iterated, it should only process sessions attached to this secure channel (TcpTransport). This is a major problem, because it ends up with PublishResponse send back to the wrong channel (which has not requested a publish request) and the channel which has sent a PubglishRequest receives nothing.

bs1959 avatar Jul 11 '24 13:07 bs1959

This is linked to issue #285 and can explain #326

bs1959 avatar Jul 11 '24 13:07 bs1959

I'm currently working on a rewrite of the server as well, like I did with the client, though there is a fair bit left until it's comparable to the current server implementation. It has a completely different approach to dealing with connections and subscriptions, and will definitely fix this.

einarmo avatar Jul 13 '24 07:07 einarmo

@einarmo , is there a branch where this is is fixed ? I tried your branches async-server and rewrite-master but the problem is still there.

bs1959 avatar Aug 26 '24 15:08 bs1959

Are you sure? That really doesn't make sense, the server architecture in async-server does things very differently compared to the current server.

einarmo avatar Aug 26 '24 16:08 einarmo

@einarmo I confirmat that it WORKS with the asycn-server branch. (apologies, I mess the branches)

When can we expect to have a master branch with a working server ? I don't like to be dependent of a branch in a fork. Do you need help for that ?

bs1959 avatar Aug 27 '24 06:08 bs1959

There's only one maintainer for this repo, and he is often busy. There is a PR, #366, but I don't know what the status really is there.

I'm going to keep working on rewrite-master and PR from that one commit at a time, I'll see what I do once that reaches a point where I'm happy with everything.

einarmo avatar Aug 27 '24 10:08 einarmo

@einarmo

I believe I am experiencing this issue based on the data I have collected from an existing deployment using this library


This is the thread detailing the abnormal behavior: OPC-UA Timeouts & TimeoutHint

As you noted above–and based on my limited testing–the issue does not appear to occur on when using one of the sample servers from the FreeOpcUa/async-opcua fork.

Unfortunately... my usage of the library got deeper while the library was being forked.
As a result, I have leaned into some features that have seen extensive rewrites. Notably, codegen was reworked entirely, and various address space helper functions I was using were removed, etc.

Example of one such removed function that has been moved to a test function (which does denote the new usage): find_inverse_references


My guess is that backporting the fix for the problem would be non-trivial


But I wanted to ask and be sure, as moving from locka99/opcua -> FreeOpcUa/async-opcua is not insignificant from my side, either, as far as I can tell

How difficult do you think backporting a fix for this issue might be? I know you mentioned merging in components if possible, but I'm not sure if that's something you pursued, or whether such an approach is feasible at the moment.

Appreciate your time, and thank you for the extensive contributions to this repository and the resulting fork

evanjs avatar Apr 10 '25 19:04 evanjs

@evanjs sorry but I think backporting doesn't really make any sense in this case. I never set out to fix this bug, but the fork has a completely different architecture when it comes to managing connections and publish requests. The way it's done there makes this bug more or less impossible, but actually backporting that would mean backporting half the server rewrite, so thousands of lines of code.

In practice, if you wanted to fix this specific bug here, you would probably have to debug it from zero, I doubt the freeopcua fork is going to be much help.

einarmo avatar Apr 10 '25 19:04 einarmo

@einarmo Thank you, I appreciate the quick response.

I had a feeling that might be the case, but needed to hear it from somebody more familiar with the internals to get a better idea of just how practical it might be

I also wasn't sure which issue in particular I was encountering (https://github.com/locka99/opcua/issues/34#issuecomment-634312307 sounds similar as well)

It sounds like it might be worth taking a step back and triaging, potentially debugging one or both of the above.

Thank you for the sanity check

evanjs avatar Apr 10 '25 19:04 evanjs