autobahn-js
autobahn-js copied to clipboard
Session detached after new session attached, resulting in `wamp.error.procedure_already_exists`
I'm using Crossbar v20.12.3 and autobahn v20.9.2. Sometimes the onclose
callback is called before the session is detached from the router.
I had a running process connected to the router with session ID 7159338250676650
. Then, the onclose
event was called at 2023-05-15T20:51:11.041Z with the following content
{
"reason": "lost",
"details": {
"reason": null,
"message": null,
"retry_delay": 1.9348029488634622,
"retry_count": 1,
"will_retry": true
}
}
In our implementation, we register all procedures that were registered before as soon as the connection is open again. So, as soon as we received the open event, we registered all procedures that were registered before. However, all of them failed to be created with the error wamp.error.procedure_already_exists
. This should not be possible because we only try to register when the new session is connected.
After reviewing router logs, I found out that the previous session had its detached session
log about 30 seconds after the new session connected. So my understanding is:
- Client lost connection at 20:51:11.041
- Router logged the new session at 20:51:13, below is the log
2023-05-15T20:51:13+0000 [Router 12] attached session 1165314019232070 to realm "myrealm" (authid="username", authrole="role") <crossbar.router.router.Router.attach>
- Client called the
onopen
callback at 20:51:13.512Z and it tried to register all procedures that were registered with the old session, however none of them worked because router still had to detach old session - Router logged old session detachment at 20:51:46, below is the log
2023-05-15T20:51:46+0000 [Router 12] detached session 7159338250676650 from realm "myrealm" (authid="username", authrole="role", detached 1 sessions total) <crossbar.router.router.Router.detach>
So the question is, should the router detach the session before attaching the new one? Otherwise, how can we handle disconnections and register procedures again? Thanks!
Sometimes the onclose callback is called before the session is detached from the router.
if so (a session is detached client side only), the that session's internal list of registered procedures and subscriptions must be cleared together with the connection being closed (and possibly auto-reconnect then happen ..)
if this is not the case, it is likely a bug
in this case, we first need a test that reproduces the situation (and then fails currently, because of the supposed bug), and then a fix which rectifies the bug and makes the (new) test pass
The reason I'm suspicious about this is reconnections happen several times a day, however this is the time we experience such issue. It's not really easy to look into the data (because router is running with minimal logging), however I'm confident about my analysis because the timestamps between router and client logs match. I'm almost sure the old session was not cleared before the new one was created, and this resulted in the list of procedures being held up.
How would we reproduce this? We've been running our production environment for a lot of time now, and it's the first time we experience such issue (although I would not be surprised if it happened another time in the past and I did not recognize it). Perhaps I can run a script on the staging environment that does frequent reconnects? Also, I'm not sure whether clearing out procedures is something that ought to be done on the router or client. Is there resources about this? I've looked into the protocol specification and I'm not sure whether this report belongs to crossbar or Autobahn either.
Hi @oberstet Following up on this. I would love to help figuring out whether this is an actual bug (and possibly help fix it), but not sure the best way of reproducing. Is what I proposed above fine? I might just go ahead and try it out, or is there any standard you're following when debugging WAMP related libraries?
if it makes sense and works, one option would be expanding the automated testing that we do have in this repo. eg consider
https://github.com/crossbario/autobahn-js/blob/c831f35c1911eb31ec8ffdfe0f62fa15e9dcccb6/packages/autobahn/test/test_pubsub_basic.js#L24
this will fire up 2 sessions (originating from the 1 test driver instance) connecting to the same router, and then does checking of behavior across sessions.
a test might even "kill -9" the router instance during the test, then restart a router instance, then wait for the testee sessions to reconnect, and then test whatever behavior during or after the reconnect should be tested for ..
local testing can be started via makefile: https://github.com/crossbario/autobahn-js/blob/c831f35c1911eb31ec8ffdfe0f62fa15e9dcccb6/Makefile#L154
github actions automated testing is using that makefile also: https://github.com/crossbario/autobahn-js/blob/c831f35c1911eb31ec8ffdfe0f62fa15e9dcccb6/.github/workflows/main.yml#L44C18-L44C18