sftp
sftp copied to clipboard
Way to detect underlying SSH connection has closed
It would be great if there was something similar to ssh.ClientConn.Wait() for this package. Even exposing the underlying ClientConn's Wait() would be fine.
The use case here is to detect ssh failures and re-initialize the connection. Since a new session is created when using sftp.NewClient, I can't just update the *ssh.ClientConn in the background and have everything work. I'd need to also re-start a new sftp subsystem session.
sftp.NewClient takes a ssh.Client object as an argument and that ssh.Client object has the Wait() method. So if you keep a reference to the client you passed in, you already have a way to call Wait to check on your ssh client connection. Maybe not the most convenient, but it is there.
In fact, it appears that the underlying ssh.Client needs to be closed explicitly as sftp.Client.Close() does not handle this as I had mistakenly assumed. So, we need to save a reference to the ssh.Client anyway. I am not sure if this was intended but it should be documented (eg. the example at the top of godocs does not indicate that the ssh client needs to be closed explicitly).
Thanks for the comment. We should probably add something in the docs for the NewClient function that mentions this.
Another thing is that I noticed I was wrong in my previous comment about keeping around the ssh.Client and using it's Wait() method. The ssh.Client doesn't have the Wait() method, it is the ssh.Session object that has that method.
I'm not sure the best way to address this. I think someone who actively uses the sftp.Client would be better suited to help with this. If someone would like to suggest a way and/or submit a pull request on how to provide more access to the underlying ssh.Session I'd very much appreciate it.
I dug into the ssh.Session code a bit and it looks like when you close the sftp.Client() the correct close gets called on the session. The StdinPipe() returns a io.WriteCloser and it's Close method is called when the sftp.Client closes. That closes the Session object. So I don't think this is a problem.
Hm.. the ssh.Client (Conn) does have the Wait() method per:
https://godoc.org/golang.org/x/crypto/ssh#Conn
I'm basically wrapping sftp.Client in a struct that contains a goroutine to call wait on the ssh.Conn and attempt a reconnect. The struct contains the appropriate pieces to call ssh.Dial and has a method GetConnection() which returns a pointer to the currently active sftp.Client.
Not the best but it does work. Not sure what makes the most sense for this library to handle reconnects.
What I meant was that the ssh.Client doesn't have Wait, the ssh.Session does. But you pass NewClient() the ssh.Client and it creates the ssh.Session internally and doesn't store it directly. So it wasn't as easy as just saving the ssh.Client object you passed in and using it to call Wait.
Wrapping it in a containing struct is a good idea. But I'm not sure if it is a good solution to include that in the sftp library. It would basically mean including utility functionality that could be useful, vs only the core library. It might be better to add it to the documentation or a FAQ for common solutions or maybe add it to an example. I'm not sure, please voice an opinion if you have one.
Thanks for your input.
@crazed Is your code that wraps the connection available anywhere? I was thinking about implementing something similar to see how it works and see if I could reduce it down enough to include in the docs as an example. Thanks.
@eikenb https://gist.github.com/crazed/8c4e965c142c8ec56d95247c9a4f0ccb
Probably not the cleanest solution, but it works
Thanks for the gist. I'll read through it as soon as I can.
@crazed I understand what you did there but I think that is a bit much to just add to the documentation. I could add the code along with a comment explaining the point of it to the examples section. Would that be OK?
There's probably a cleaner example but if you think it'll help others, I'm good with that.
On Sun, Jul 23, 2017, 9:01 PM John Eikenberry [email protected] wrote:
@crazed https://github.com/crazed I understand what you did there but I think that is a bit much to just add to the documentation. I could add the code along with a comment explaining the point of it to the examples section. Would that be OK?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/pkg/sftp/issues/144#issuecomment-317295659, or mute the thread https://github.com/notifications/unsubscribe-auth/AAKm2k2-oVtnmVr6JiqM-DPCxoRIzt8eks5sQ-zfgaJpZM4KY-JK .
@crazed I was messing around trying to come up with a simpler example but am having trouble reproducing the problem. What did you do to test the ssh connection having problems?
Honestly I had to work with a SFTP server hosted by a third party, I think it was an ETF GlobalScape something or other and it would crap out randomly.
I think you can recreate the issue by hard closing the underlying network connection.
Thanks.
So far I've only tried reproducing it by simply kill-9ing the sftp server. But the client handles that fine. I'll need to figure out a better way to simulate it. Assuming I do and get an example I'm happy with, I'll include any notes in comments.
I've recently come across this same issue so I was wondering if the current suggested solution by @crazed of waiting on the underlying ssh connection and recreating sftp client on connection shut down is still the best one, or maybe there has been some improvement regarding this?
Hey @IvanVlasic,
Yes, I think the idea from @crazed is indeed still the best option. I have also implemented some of the same code, but I found to get the highest reliability, and reduce the number of unnecessary reconnects, I defer any reconnection until I need to access the client again.
So, basically what I do is:
- when I initially create the reconnecting object, I do not actually connect. I just assign all the values necessary to (re)connect at any time.
- I have a receiver method called
Connect() errorto initiate a connection. This really just requests ansftp.Client - I have a receiver method
client() (*sftp.Client, error)that returns a Client any time there is need for one. - In
client: if it believes it is still be connected, it will test the connection withGetwdjust to be sure the connection is valid. If this call fails, it will force a (re)connect. (This might be no longer necessary. My service doesn’t have a lot of traffic, so it hasn’t been a priority for a refactor for optimization.) - If
clientbelieves it is disconnected (see below), or as mentionedGetwderrors, then it (re)connects. - After a successful reconnection, it starts a goroutine that uses the newer
sftp.Client.Wait(). Once that returns, it closes thessh.Conn(throwing away and/or logging any error) and then signals a disconnection (used by the mainclientbody). - It then assigns the connected
sftp.Client,ssh.Conn, and connected signal into the reconnecting object, and returns thesftp.Client.
So, any time, I need to do any operation on a connected sftp.Client I just call client() on the reconnecting object, and it either gives me a dial connect error, or a connected sftp.Client and at that point, I can do whatever I need to do with it. Then I just forget that client ever existed at the end of its usage, and the next usage asks into client() all over again.
P.S.: this is potentially something generic enough, that I might be able to spin off a subpackage here which handles it? It would probably require a lot more polish though, and likely wouldn’t be able to fulfill all interests. But at least, it would be a ground floor for people to use?
Thank you for the detailed explanation @puellanivis. This is similar approach that I've used, and as you've said, it might be good idea to have example of this among other ones available in the project, just so people like me have general idea of how to solve this and don't have to search through 4 year old issue for discussion on it.
@puellanivis just came across this thread. Thanks for the explanation. Did you end up having time to build an example of this that I can reference?