nitox
Clients never disconnect, even after manual drop.
Hey all, I've been doing some hacking on this lib and noticed this issue during development. It looks closely related to #6, though I am not building new connections in a loop; I am simply attempting to gracefully shut down the system in order to avoid error conditions as much as possible.
So, I noticed that even after I tear down all of the open streams I've created in my program, and all other resources have shut down, the tokio runtime still will not exit because it is waiting on a few resources. After some digging, I created a test case as follows:
- spawn a new future onto the tokio runtime which simply creates a new basic `NatsClient` (a sketch of this repro follows the list).
- immediately after the client is created, I call `drop(client)` and then resolve the future.
- I hit the NATS monitoring port at the `/connz` endpoint to check on the open connections, and I see that there are still 3 open connections from `nats` (I've added the JSON output below).
- after that, I've even sent the program a SIGTERM, and the connections still stay open and hang.
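For reference, the repro looks roughly like this. `NatsClient` and `connect_client()` are hypothetical stand-ins for however the nitox client actually gets constructed; the exact connect call isn't the point here:

```rust
// Minimal repro sketch (tokio 0.1 / futures 0.1). `NatsClient` and
// `connect_client()` are hypothetical stand-ins, not nitox's real API.
extern crate futures;
extern crate tokio;

use futures::{future, Future};

struct NatsClient;

// Stand-in for the real connect call, which yields a future of the client.
fn connect_client() -> impl Future<Item = NatsClient, Error = ()> {
    future::ok(NatsClient)
}

fn main() {
    tokio::run(future::lazy(|| {
        connect_client().map(|client| {
            // Drop the client immediately after it resolves.
            drop(client);
        })
    }));
    // With the real client, `tokio::run` never returns: the runtime keeps
    // waiting on the connection-driving tasks, and /connz still lists the
    // connections as open.
}
```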
My analysis:
- we will need to create a pattern where we can close the underlying streams when the parent clients are dropped.
- we could use the `stream-cancel` crate for this.
- perhaps we can trigger the stream-closing events whenever the client is dropped (simply `impl Drop` on the client, and put the needed functionality there); see the sketch below.
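To make that last bullet concrete, here is a minimal sketch of the idea using `stream-cancel`'s `Valved` wrapper (stream-cancel 0.4 / futures 0.1). `Client` and `wrap_connection` are illustrative stand-ins, not nitox's actual internals:

```rust
// Tie the connection stream's lifetime to the client via a cancellation
// trigger, and fire it from `Drop`.
extern crate futures;
extern crate stream_cancel;

use futures::Stream;
use stream_cancel::{Trigger, Valved};

struct Client {
    // Owning the trigger ties the socket stream's lifetime to the client.
    shutdown: Option<Trigger>,
}

impl Drop for Client {
    fn drop(&mut self) {
        if let Some(trigger) = self.shutdown.take() {
            // Resolves the `Valved` stream, so whatever task is driving the
            // socket finishes and the runtime can exit.
            trigger.cancel();
        }
    }
}

// Wrap the raw socket stream so the returned client controls its shutdown.
fn wrap_connection<S: Stream>(socket: S) -> (Client, Valved<S>) {
    let (trigger, stream) = Valved::new(socket);
    (Client { shutdown: Some(trigger) }, stream)
}
```

Note that with `stream-cancel`, dropping the `Trigger` interrupts the stream on its own, so the explicit `impl Drop` is mostly there to make the intent visible and to leave room for any extra teardown.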
Here are the JSON logs from the test case mentioned above; they are from `/connz`. Please note: the first three connections were there before I started my nitox program, and they remain even after I manually kill it (aka, they are nats internal).
```json
{
  "server_id": "EYKg2swNQ9eHDUQ6PVhghE",
  "now": "2019-01-21T00:46:24.3239911Z",
  "num_connections": 6,
  "total": 6,
  "offset": 0,
  "limit": 1024,
  "connections": [
    {
      "cid": 1,
      "ip": "127.0.0.1",
      "port": 34318,
      "start": "2019-01-18T06:04:45.691982154Z",
      "last_activity": "2019-01-20T00:01:36.9471481Z",
      "rtt": "275µs",
      "uptime": "5h8m49s",
      "idle": "2h7m0s",
      "pending_bytes": 0,
      "in_msgs": 347,
      "out_msgs": 0,
      "in_bytes": 15217,
      "out_bytes": 0,
      "subscriptions": 0,
      "name": "_NSS-nats-streaming-cluster-send",
      "lang": "go",
      "version": "1.6.0"
    },
    {
      "cid": 2,
      "ip": "127.0.0.1",
      "port": 34320,
      "start": "2019-01-18T06:04:45.693148843Z",
      "last_activity": "2019-01-21T00:41:14.3273053Z",
      "rtt": "929µs",
      "uptime": "5h8m49s",
      "idle": "5m10s",
      "pending_bytes": 0,
      "in_msgs": 1615,
      "out_msgs": 1206,
      "in_bytes": 11107,
      "out_bytes": 8712,
      "subscriptions": 8,
      "name": "_NSS-nats-streaming-cluster-general",
      "lang": "go",
      "version": "1.6.0"
    },
    {
      "cid": 3,
      "ip": "127.0.0.1",
      "port": 34322,
      "start": "2019-01-18T06:04:45.693702396Z",
      "last_activity": "2019-01-20T00:01:36.9453977Z",
      "rtt": "576µs",
      "uptime": "5h8m49s",
      "idle": "2h7m0s",
      "pending_bytes": 0,
      "in_msgs": 0,
      "out_msgs": 15,
      "in_bytes": 0,
      "out_bytes": 375,
      "subscriptions": 0,
      "name": "_NSS-nats-streaming-cluster-acks",
      "lang": "go",
      "version": "1.6.0"
    },
    {
      "cid": 67,
      "ip": "172.20.0.1",
      "port": 46212,
      "start": "2019-01-21T00:45:35.573159Z",
      "last_activity": "2019-01-21T00:45:35.5760492Z",
      "rtt": "2ms",
      "uptime": "48s",
      "idle": "48s",
      "pending_bytes": 0,
      "in_msgs": 0,
      "out_msgs": 0,
      "in_bytes": 0,
      "out_bytes": 0,
      "subscriptions": 0,
      "name": "nitox",
      "lang": "rust",
      "version": "0.1.x"
    },
    {
      "cid": 68,
      "ip": "172.20.0.1",
      "port": 46210,
      "start": "2019-01-21T00:45:35.573364Z",
      "last_activity": "2019-01-21T00:45:35.5762686Z",
      "rtt": "2ms",
      "uptime": "48s",
      "idle": "48s",
      "pending_bytes": 0,
      "in_msgs": 0,
      "out_msgs": 0,
      "in_bytes": 0,
      "out_bytes": 0,
      "subscriptions": 0,
      "name": "nitox",
      "lang": "rust",
      "version": "0.1.x"
    },
    {
      "cid": 69,
      "ip": "172.20.0.1",
      "port": 46214,
      "start": "2019-01-21T00:45:35.5744776Z",
      "last_activity": "2019-01-21T00:45:35.5761763Z",
      "rtt": "1ms",
      "uptime": "48s",
      "idle": "48s",
      "pending_bytes": 0,
      "in_msgs": 0,
      "out_msgs": 0,
      "in_bytes": 0,
      "out_bytes": 0,
      "subscriptions": 0,
      "name": "nitox",
      "lang": "rust",
      "version": "0.1.x"
    }
  ]
}
```
Hi and thanks for such a precise and detailed report.
The lack of a graceful-shutdown feature you mention is the reason why I've been playing around with `rx`/`tx` and `shutdown_now()` in the examples (roughly the pattern sketched below).
We might want to update them and the tests as well once the feature is complete.
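For anyone following along, that pattern looks roughly like this (a minimal sketch against tokio 0.1 / futures 0.1; the actual client work is elided):

```rust
// Signal completion over a oneshot channel, then force the runtime down so
// any leaked connection-driving tasks don't keep the process alive.
extern crate futures;
extern crate tokio;

use futures::{future, sync::oneshot, Future};

fn main() {
    let mut runtime = tokio::runtime::Runtime::new().unwrap();
    let (tx, rx) = oneshot::channel::<()>();

    runtime.spawn(future::lazy(move || {
        // ... connect the client, do the work, drop it ...
        let _ = tx.send(());
        Ok::<(), ()>(())
    }));

    // Block until the task signals completion, then tear down every task
    // still alive on the runtime (including a leaked connection driver).
    let _ = rx.wait();
    let _ = runtime.shutdown_now().wait();
}
```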
I might have a solution working in https://github.com/YellowInnovation/nitox/commit/1a2037acca608b4908857e033e787ad3adb93d8b; what do you think, @thedodd?
@OtaK awesome. That's pretty much what I had in mind as well. I will review shortly. Do you want me to wait until the PR is no longer in draft status, or do you want me to review now?
Now is fine. I've added a test with a typical use case and it passes, but I'm not sure if it covers yours as well.