mina-sshd icon indicating copy to clipboard operation
mina-sshd copied to clipboard

Fix: close session when timeout during connect

Open onyas opened this issue 3 years ago • 10 comments

Hi there,

We have encouraged an issue in which the SSHD doesn’t close the session when connecting timeout.

The code we are using is attached here. https://gist.github.com/onyas/fda8ccf2d373c743ccec8f648782d078
The TCP socket is still ESTABLISHED after sshClient.connect() threw an exception. image

If I set the timeout to 1 millisecond, the sshClient.connnect() will throw an exception, since it’s timeout so the session is still null, and it doesn't help if we call session.close().

And we know we cloud call sshClient.stop(), but in our scenario, we reuse the same instance so we couldn’t do that.

After checking the source code, we find the SSHD didn’t close the session when timeout. So I created this pull request to help close the session.

Please help review and thanks.

onyas avatar Sep 01 '22 13:09 onyas

As far as I see what happens is that the asynchronous connection attempt succeeds after the timeout.

I'd move the timeout flag and the isTimeout() method to DefaultSshFuture. I also think the flag must be set and read under lock, otherwise there can still be a race condition. And instead of

if (connectFuture.isTimeout()) {
  session.close();
} else {
  connectFuture.setSession(session);
}

there needs to be a mechanism to do this atomically. Something like

  /** Like setValue; but sets the value only if not timed out yet. Throws TimeoutException otherwise. */
  public void setValueIfNotTimedOut(Object value) throws TimeoutException {
    synchronized(lock) {
       if (result != null ) {
         return; 
       }
       if (isTimeout()) {
         throw new TimeoutException();
       }
       result = (newValue != null) ? newValue : GenericUtils.NULL;
       lock.notifyAll();
    }
  }

and then

  private void setSessionSafely(ConnectFuture connectFuture, ClientSession session) throws IOException {
    try {
      connectFuture.setValueIfNotTimedOut(session);
    } catch (TimeoutException e) {
      session.close();
    }
  }

Finally, it would be really cool if this could be done before the ClientSession gets created at all (by closing the IoSession immediately). Creating the client session will already start the SSH protocol. But the ClientSession is created in AbstractSessionIoHandler.sessionCreated(IoSession), where we don't have access to the ConnectFuture, so I don't quite see how this could be done.

tomaswolf avatar Sep 08 '22 13:09 tomaswolf

Hi, @tomaswolf Thank you for your insights, and I updated the code according to your suggestions. Please help review.

Thanks.

onyas avatar Sep 09 '22 01:09 onyas

Hi, @tomaswolf Thank you for the suggestions, I updated the code, please help review.

Thanks.

onyas avatar Sep 09 '22 10:09 onyas

Can we write a test for this?

I've created issue SSHD-1295 in our bug tracker for this, and have posted a test case there. Please incorporate that test into this change. (Note that it may need to be adapted if we never set the session on the future if the future had timed out before.)

tomaswolf avatar Sep 09 '22 17:09 tomaswolf

Hi @tomaswolf All the suggestions are implemented, please take a look.

Thanks.

onyas avatar Sep 10 '22 06:09 onyas

Looks good now. Is it OK for you if I squash this series and fix some of the javadoc comments, and then force-push onto this branch?

tomaswolf avatar Sep 10 '22 17:09 tomaswolf

Sure, please go ahead.

onyas avatar Sep 11 '22 00:09 onyas

@lgoldstein and @gnodet: I'd appreciate your opinions on this. Please also see my thoughts at SSHD-1295.

tomaswolf avatar Sep 11 '22 12:09 tomaswolf

Hi, @tomaswolf Good day, another question, when is this version 2.9.2 planned to be released?

onyas avatar Sep 13 '22 01:09 onyas

"2.9.2-SNAPSHOT" is just the current development version. I don't think we'll do another patch release for this. So most likely the change will come whenever 2.10.0 is released, which may be in a few months.

tomaswolf avatar Sep 13 '22 09:09 tomaswolf

Just a heads up: the basic idea is going in the right direction, but cancelling a ConnectFuture has the same problem. I've come to the conclusion that cancellation has to be sorted out first, and then this time-out case can be based on that. (I.e., client code would get the possibility to specify in the verify() call whether to cancel a future automatically on a time-out.) That gives client code full control.

I'm working on this.

tomaswolf avatar Oct 13 '22 06:10 tomaswolf

client code would get the possibility to specify in the verify() call whether to cancel a future automatically on a time-out

Let's keep backward compatibility as much as possible with existing API - i.e., keep the existing verify without the flag and delegate it to call the new verify using some default that you deem relevant.

BTW, @tomaswolf you are doing great work on the project (I really appreciate it) - I wish I had the time I used to, so I could match your prolific work both in quality and quantity.

lgoldstein avatar Oct 13 '22 14:10 lgoldstein

keep the existing verify without the flag and delegate it to call the new verify using some default that you deem relevant

Yes, that was my thought exactly. The new default for the existing verify() would be to cancel to avoid this problem of having a "hidden" session established after the time-out that the user cannot really get to.

That would solve the reported problems; and I consider calling verify() on the same ConnectFuture in different threads so extremely borderline (unfortunately it's something that the current API has always allowed) that it is IMO acceptable to require client code that does that currently to be changed. A notice to that effect in the release notes will be needed.

tomaswolf avatar Oct 13 '22 15:10 tomaswolf

The new default for the existing verify() would be to cancel to avoid this problem of having a "hidden" session established after the time-out that the user cannot really get to.

Seems reasonable - I trust your judgement on this.

lgoldstein avatar Oct 13 '22 15:10 lgoldstein

"2.9.2-SNAPSHOT" is just the current development version. I don't think we'll do another patch release for this. So most likely the change will come whenever 2.10.0 is released, which may be in a few months.

Hi, @tomaswolf do we have a release date for 2.10.0?

onyas avatar Dec 15 '22 09:12 onyas

do we have a release date for 2.10.0?

No.

tomaswolf avatar Dec 15 '22 10:12 tomaswolf

Is there any possibility to release it early next year? our customer is eager to get this fixed. If not, can you provide any plan so that we can set the expectation for the customer.

onyas avatar Dec 20 '22 01:12 onyas

I can't promise anything. All the maintainers in this project are unpaid volunteers doing this in their spare time. I wrote in October I'd be working on this, and I am, but progress depends on the amount of spare time I have and that I am willing to dedicate to this. (Evidently, I do many other things in my spare time, too.) So I can't give a deadline.

tomaswolf avatar Dec 20 '22 10:12 tomaswolf

Got it. Thank you for your contribution.

onyas avatar Dec 21 '22 01:12 onyas

Please see #315 for my take on this.

tomaswolf avatar Jan 28 '23 15:01 tomaswolf

Superceded by #315.

tomaswolf avatar Mar 19 '23 18:03 tomaswolf