py-junos-eznc
Feature request: keepalive or auto-reconnect
I'm working to implement Salt proxy minions for Juniper devices in a customer network where idle-timeouts are configured at 5 minutes. This means that if we don't send something over the session before the 5 minutes are up, the session dies and a new proxy minion must be built.
I have tried adding the keepalive directive to my ssh_config file, but this had no effect:

```
Host *
    ServerAliveInterval
```
Based on the above, it seems like transport layer keepalive is not going to solve this problem. The Netconf spec currently does not contain a keepalive operation, and the IETF mailing list seems to have agreed not to create one: https://www.ietf.org/mail-archive/web/netconf/current/msg08888.html
I see three options for solving this problem:
- No changes to the Netconf repo: keepalives implemented entirely in the application.
- Implement a keepalive Device parameter and a keepalive thread which executes some RPC on a set interval. I don't like this option because it will fill up device logs and induce CPU churn.
- Add an auto_reconnect parameter to Device. This would allow a check to be performed before every RPC is executed to make sure that the underlying SSH transport is still connected and functioning. If it is not, call open() to bring the transport up before running the RPC.
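The third option can be sketched as below. This is an illustrative stand-in class, not the actual py-junos-eznc internals; the names `auto_reconnect`, `open()`, and `execute()` mirror the proposal but everything else here is hypothetical:

```python
class Device:
    """Illustrative stand-in for a NETCONF device connection."""

    def __init__(self, auto_reconnect=False):
        self.auto_reconnect = auto_reconnect
        self.connected = False

    def open(self):
        # In the real library this would (re)establish the SSH/NETCONF session.
        self.connected = True

    def execute(self, rpc):
        # Check the transport before every RPC, but only when the flag is on.
        if self.auto_reconnect and not self.connected:
            self.open()
        if not self.connected:
            raise ConnectionError("session is down")
        return "<rpc-reply>%s</rpc-reply>" % rpc


dev = Device(auto_reconnect=True)
dev.open()
dev.connected = False  # simulate the 5-minute idle-timeout killing the session
reply = dev.execute("get-system-information")  # transparently reopens first
```

With the flag off, the same call would raise instead of reconnecting, so existing behavior is unchanged.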
I think the third option is the easiest to implement. Would you consider a merge if I created this?
@spidercensus I think such a check (and the consequent action) should be taken care of by the user's code. Right now the dev.connected value is static; I am planning to make it a property so that it returns the current state of the connection. Using this value, users can take action as per their needs.
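Making dev.connected a property could look roughly like this. The `FakeTransport` class below stands in for the real underlying SSH/NETCONF transport, and `is_active()` is an assumed method name for illustration:

```python
class FakeTransport:
    """Stand-in for the underlying SSH/NETCONF transport."""

    def __init__(self):
        self.alive = True

    def is_active(self):
        return self.alive


class Device:
    def __init__(self):
        self._conn = FakeTransport()

    @property
    def connected(self):
        # Reflect the live transport state instead of a static flag.
        return self._conn is not None and self._conn.is_active()


dev = Device()
print(dev.connected)     # True while the transport is up
dev._conn.alive = False  # simulate the idle-timeout tearing it down
print(dev.connected)     # now False, with no extra bookkeeping in Device
```

The key point is that the attribute access stays the same for callers; only the value becomes dynamic.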
I agree that Device.connected needs to become a property. After my discussion with Stacy, I was going to raise another request for that.
There is value in building this feature into the execute() and cli() methods as an opt-in. Frameworks such as Salt will otherwise be forced to check the connection state before every RPC request, which leads to greater overhead than an internal check performed only when the flag is turned on.
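For comparison, this is the boilerplate every framework call site ends up carrying when the check lives outside the library. The `Dev` class and `safe_rpc` wrapper here are a hypothetical sketch of the caller-side pattern, not real Salt or py-junos-eznc code:

```python
class Dev:
    """Minimal stand-in exposing the interface a framework would see."""

    def __init__(self):
        self.connected = True

    def open(self):
        self.connected = True

    def execute(self, rpc):
        return "ok:" + rpc


def safe_rpc(dev, rpc):
    # Without a built-in check, every caller repeats this wrapper.
    if not dev.connected:
        dev.open()
    return dev.execute(rpc)


dev = Dev()
dev.connected = False  # session died to the idle-timeout
print(safe_rpc(dev, "show-version"))  # reopens, then prints "ok:show-version"
```

An internal check inside execute()/cli() would move this wrapper into one place and make it free when the flag is off.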
This is implemented in pull request #669.