Server: PublishResponse sent back on the wrong connection
In spawn_subscriptions_task of TcpTransport, all sessions of the server are iterated, it should only process sessions attached to this secure channel (TcpTransport). This is a major problem, because it ends up with PublishResponse send back to the wrong channel (which has not requested a publish request) and the channel which has sent a PubglishRequest receives nothing.
This is linked to issue #285 and can explain #326
I'm currently working on a rewrite of the server as well, like I did with the client, though there is a fair bit left until it's comparable to the current server implementation. It has a completely different approach to dealing with connections and subscriptions, and will definitely fix this.
@einarmo , is there a branch where this is is fixed ? I tried your branches async-server and rewrite-master but the problem is still there.
Are you sure? That really doesn't make sense, the server architecture in async-server does things very differently compared to the current server.
@einarmo I confirmat that it WORKS with the asycn-server branch. (apologies, I mess the branches)
When can we expect to have a master branch with a working server ? I don't like to be dependent of a branch in a fork. Do you need help for that ?
There's only one maintainer for this repo, and he is often busy. There is a PR, #366, but I don't know what the status really is there.
I'm going to keep working on rewrite-master and PR from that one commit at a time, I'll see what I do once that reaches a point where I'm happy with everything.
@einarmo
I believe I am experiencing this issue based on the data I have collected from an existing deployment using this library
This is the thread detailing the abnormal behavior: OPC-UA Timeouts & TimeoutHint
As you noted above–and based on my limited testing–the issue does not appear to occur on when using one of the sample servers from the FreeOpcUa/async-opcua fork.
Unfortunately... my usage of the library got deeper while the library was being forked. As a result, I have leaned into some features that have seen extensive rewrites. Notably, codegen was reworked entirely, and various address space helper functions I was using were removed, etc.
Example of one such removed function that has been moved to a test function (which does denote the new usage): find_inverse_references
My guess is that backporting the fix for the problem would be non-trivial
But I wanted to ask and be sure, as moving from locka99/opcua -> FreeOpcUa/async-opcua is not insignificant from my side, either, as far as I can tell
How difficult do you think backporting a fix for this issue might be? I know you mentioned merging in components if possible, but I'm not sure if that's something you pursued, or whether such an approach is feasible at the moment.
Appreciate your time, and thank you for the extensive contributions to this repository and the resulting fork
@evanjs sorry but I think backporting doesn't really make any sense in this case. I never set out to fix this bug, but the fork has a completely different architecture when it comes to managing connections and publish requests. The way it's done there makes this bug more or less impossible, but actually backporting that would mean backporting half the server rewrite, so thousands of lines of code.
In practice, if you wanted to fix this specific bug here, you would probably have to debug it from zero, I doubt the freeopcua fork is going to be much help.
@einarmo Thank you, I appreciate the quick response.
I had a feeling that might be the case, but needed to hear it from somebody more familiar with the internals to get a better idea of just how practical it might be
I also wasn't sure which issue in particular I was encountering (https://github.com/locka99/opcua/issues/34#issuecomment-634312307 sounds similar as well)
It sounds like it might be worth taking a step back and triaging, potentially debugging one or both of the above.
Thank you for the sanity check