otp
otp copied to clipboard
ssh: ssh keep alive
This PR aims at introducing the equivalent of openssh ClientAliveInterval/ClientAliveCountMax (https://man.openbsd.org/sshd_config#ClientAliveCountMax) and ServerAliveInterval/ServerAliveCountMax
There's 2 notable differences:
- In openssh "The default is 0, indicating that these messages will not be sent to the client.", but in this implementation infinity is used instead of 0 and ssh_options checks that a positive integer is presented;
- Keep-alive messages can't be sent during renegotiation, but since this feature acts as a keep-alive and a timeout, an equivalent timeout is established for the renegotiation procedure if alive is enable. This is implemented with a timeout called renegotiation_alive
CT Test Results
2 files 29 suites 20m 28s ⏱️ 485 tests 479 ✅ 6 💤 0 ❌ 1 690 runs 1 664 ✅ 26 💤 0 ❌
Results for commit 21018024.
:recycle: This comment has been updated with latest results.
To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.
See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.
Artifacts
- Complete CT logs (Download Logs)
- HTML Documentation (Download HTML Docs)
- No Windows Installer found
// Erlang/OTP Github Action Bot
@u3s this a feature I promised long time ago, if you are still interested in it please continue reading :)
I know you are short on time, but if you could give me some tips on how to create some tests for this would be appreciated.
I wanted to add tests at least these 3 scenarios:
- Normal scenario client and server send alive messages (This one I have one idea how to test which would be using the ssh_dbg module, ssh_dbg:on([ssh_messages]) reveals the sending of the messages but it's kind of ugly to use a regex to get the keep alive message);
- Timeout scenario. I would like that the server or client become not responsive and I would like to check that the connections is terminated (this one we have tested before suspending the openssh process but it's not so nice);
- The renegotiation scenario in which the during the renegotiation the peer becomes unreachable and the timeout should have effect.
- we would like to proceed with that one
- can you convert it into PR? so we could start testing it and discussing implementation details
- for testing, my 1st guess would be ssh_trpt_lib module and do it kind of whitebox way
Sounds good. I’ll rebase and start working on the tests.
@u3s I have just pushed changes that address most of the comments.
There's 2 tests now in place. I started to add a third test that checks that the renegotiation_alive timeout works, but I could not start the renegotiation. I left a comment on the test and mentioned you there to see if you can try to help spot what am I doing wrong.
@u3s a colleague just brought to my attention that we are spending more time that we would need detecting that a connection is unstable if the rekey triggers. The reason for that is that we don't deduct that time spent since the last alive from the timer we set for the rekey.
I am happy to open a follow up PR fixing this after this one is merged as we seem to be in a very good state with the current PR. What do you think? Does this sound reasonable?
@u3s a colleague just brought to my attention that we are spending more time that we would need detecting that a connection is unstable if the rekey triggers. The reason for that is that we don't deduct that time spent since the last alive from the timer we set for the rekey.
I am happy to open a follow up PR fixing this after this one is merged as we seem to be in a very good state with the current PR. What do you think? Does this sound reasonable?
You mean a situation when for example:
- a peer with alive option (e.g
{3, 100}) - sends 2 out of 3 probes (this might take ~290ms)
- then same peer makes a local decision about renegotiating keys
- this cancels the normal alive timeout
- and starts renegotiation timeout for
300ms - as a result detecting a dead connection will take 590ms instead of 300ms which user might expect
Yes. Would be nice to iron out this case in a separate PR.
@u3s a colleague just brought to my attention that we are spending more time that we would need detecting that a connection is unstable if the rekey triggers. The reason for that is that we don't deduct that time spent since the last alive from the timer we set for the rekey. I am happy to open a follow up PR fixing this after this one is merged as we seem to be in a very good state with the current PR. What do you think? Does this sound reasonable?
You mean a situation when for example:
- a peer with alive option (e.g
{3, 100})- sends 2 out of 3 probes (this might take ~290ms)
- then same peer makes a local decision about renegotiating keys
- this cancels the normal alive timeout
- and starts renegotiation timeout for
300ms- as a result detecting a dead connection will take 590ms instead of 300ms which user might expect
Yes. Would be nice to iron out this case in a separate PR.
Ok. I will open a PR for this soon.
great. maybe adjust testcases in ssh_protocol_SUITE, so they verify that disconnect happens as quick as expected?
op({expect,timeout,E}, S0) in ssh_trpt_lib might be the tool for the job?