IOS-XR: if a commit fails, read_until_prompt waits forever
An issue occurs when a commit on IOS-XR results in an error. I see this with CiscoIOSXR.send_config_set(). After the code sends the commit, it then runs IOSLikeDevice._read_until_prompt(), but if the commit fails, then _read_until_prompt() will wait forever.
Here is an example of an interactive session that produces a failed commit (I presume it doesn't really matter what produces the commit failure, as long as it fails):
RP/0/RSP0/CPU0:xxx#conf t
Fri Feb 9 19:26:29.685 GMT
RP/0/RSP0/CPU0:xxx(config)#interface GigabitEthernet0/0/1/2.2859
RP/0/RSP0/CPU0:xxx(config-subif)#service-policy input 03692d7b-9acd-4a14-b4ed-345f513518$
RP/0/RSP0/CPU0:xxx(config-subif)#service-policy output 03692d7b-9acd-4a14-b4ed-345f51351$
RP/0/RSP0/CPU0:xxx(config-subif)#
RP/0/RSP0/CPU0:xxx(config-subif)#
RP/0/RSP0/CPU0:xxx(config-subif)#commit
Fri Feb 9 19:26:53.050 GMT
% Failed to commit one or more configuration items during a pseudo-atomic operation. All changes made have been reverted. Please issue 'show configuration failed [inheritance]' from this session to view the errors
RP/0/RSP0/CPU0:xxx(config-subif)#end
Uncommitted changes found, commit them before exiting(yes/no/cancel)? [cancel]:no
(It fails because the policies mentioned in the config do not exist)
Note that the prompt does not return after "end" is entered: instead, the returned bytes are Uncommitted changes found, commit them before exiting(yes/no/cancel)? [cancel]:. This is the cause of the problem.
This is the code in CiscoIOSXR.send_config_set():
async def send_config_set(self, config_commands=None, with_commit=True, commit_comment='', exit_config_mode=True):
# Send config commands by IOS Like Device
output = await super().send_config_set(config_commands=config_commands, exit_config_mode=False)
if with_commit:
commit = type(self)._commit_command
if commit_comment:
commit = type(self)._commit_comment_command.format(commit_comment)
self._stdin.write(self._normalize_cmd(commit))
output += await self._read_until_prompt()
if exit_config_mode:
output += await self.exit_config_mode() # <<-------- HERE
The commit, and then the end, is issued on the line marked above. This runs the method in the IOSLikeDevice superclass:
async def exit_config_mode(self):
logger.info('Host {}: Exiting from configuration mode'.format(self._host))
output = ''
exit_config = type(self)._config_exit
if await self.check_config_mode():
self._stdin.write(self._normalize_cmd(exit_config))
output = await self._read_until_prompt() # <<------------ HERE
if await self.check_config_mode():
raise ValueError("Failed to exit from configuration mode")
return output
In the line marked above, if the commit has failed, and then end is sent (inside exit_config()), the prompt will never be found, because read_until_prompt() simply calls read_until_pattern(self._base_pattern):
async def _read_until_pattern(self, pattern='', re_flags=0):
output = ''
logger.info("Host {}: Reading until pattern".format(self._host))
if not pattern:
pattern = self._base_pattern
logger.debug("Host {}: Reading pattern: {}".format(self._host, pattern))
while True:
output += await self._stdout.read(self._MAX_BUFFER)
if re.search(pattern, output, flags=re_flags): # <<----------- HERE
logger.debug("Host {}: Reading pattern '{}' was found: {}".format(self._host, pattern, repr(output)))
return output
Eventually, the device returns all the bytes up to the end of the confirmation request:
Uncommitted changes found, commit them before exiting(yes/no/cancel)? [cancel]:
And of course no prompt can be detected in this, and no more bytes are coming.
I will do a bit more debugging soon, but I just wanted to raise this issue now to get your thoughts. I'm happy to do the work to fix or improve the handling of errors during commit, if you could give your thoughts, or make any suggestions of how best to proceed?
One idea that I had while looking at the code was to add timeout handling inside _read_until_pattern():
while True:
output += await self._stdout.read(self._MAX_BUFFER) # <<------ on this line
only so that the output can be captured and logged. Currently, if the code waits here, and an asyncio.wait_for() triggers due to timeout in an outer scope, any bytes currently pending in the output identifier will be lost. By capturing and logging those bytes (and then possibly reraising a TimeoutError), it should make it easier to diagnose why a timeout occurred.
But for the specific case where we get the Uncommitted changes found problem, it is probably better to try to match that also in read_until_pattern, so that it can be handled directly, rather than waiting for a timeout. I'm not sure yet, and would like to know what your thoughts are before I do any work on this.
By the way, thanks for making this library!
Thank you for your interest to this library! I really didn't test carefully IOS XR class because I haven't it in my work. I can divide my thoughts into 2 points:
- I want to make timeout for operation for a long time :) I want to add default timer to all read operations in all classes for better controlling the flow. And I have the stimulus to do it right now :) I think that it should raise timeout exception.
- In this particular situation, it's a normal operation and we should make changes in function for normal processing this variant.
Firstly, we should check the output for the string "Failed to commit" and raise the "Failed to commit" exception. And we also should check "Uncommitted changes found" after "end" command.
Right now we have function with these parameters async def send_config_set(self, config_commands=None, with_commit=True, commit_comment='', exit_config_mode=True). So we can have different situations:
- with_commit=True - All changes will be commited. If not - we will get "Failed to commit" exception
- with_commit=False and exit_config_mode=False. Nothing special, user should make decision himself
- with_commit=False and exit_config_mode=True - I think we should always exit from config mode without any changes. In this situation we should check the output for "Uncommitted changes found" string by using special function for this purpose:
_read_until_prompt_or_pattern. We should overrideexit_config_modein IOS XR class or maybe change this function in IOS class.
I can make these changes in this week, but if you really want to help me and make changes yourself - it will be really great!! You can pm me in telegram or slack and I can help you make PR for it.
I'm happy to help.
I will create a PR, and then the first thing I'll do is create a test that demonstrates the issue. Then we can discuss the test. Then I will add code to resolve the problem, which will include the handling you've discussed above.
Sound good?
Yes, it will be amazing! I will wait for the PR!
Quick note to say I have not forgotten about this issue, but I've been extremely busy recently.
Hello, @cjrh! I have just pushed new code to IOS XR class. Right now it works like this:
- with_commit=True - All changes will be commited. If not - we will get
CommitErrorexception - with_commit=False and exit_config_mode=False. Nothing special, user should make decision himself
- with_commit=False and exit_config_mode=True - exit from config without any changes and commits
For taking this behavior I made these changes https://github.com/selfuryon/netdev/compare/master...develop#diff-c089f96ea63362ebd55df599d9d7c153. The main:
- I change cleanup function for properly session termination. In all sessions, we should clear all unsaved changes, so I simply write "abort" to channel. It's useful when we get Exception because we are using an async context manager which always runs cleanup function.
- I changed
exit_config_modefunction for properly working with an unsaved date when we have this condition "with_commit=False and exit_config_mode=True".
Can you test it in your environment? After that, I will release it in master branch :)
Forgot! I also added timeout params for managing timeout in all async operations :) You can use it only in init. The default timeout is 15 seconds:
def __init__(self, host=u'', username=u'', password=u'', port=22, device_type=u'', known_hosts=None,
local_addr=None, client_keys=None, passphrase=None, timeout=15, loop=None):
I thought about adding timeout exactly to operations (send_config_set and send_command) but I don't know if it's really necessary. Do you need this?
@selfuryon Thanks! I will have a look towards the middle or end of this week.
@cjrh Hello! Did you test new code? :)
Sadly, I have not tested it yet. I did the workaround we discussed, where I made a subclass of the XR class and so far that has worked very well. sorry!
This same stuck can happen on IOS. I suspect the find_base_prompt() gets stuck for a similar reason... Say you have a switch that has an automatic vty command, and you forget to issue the proper command....
autocommand who
but they forgot the... autocommand-options nohangup
Well, now you connect, get logged in, and before the prompt ever appears, you're disconnected.
Well now you're stuck, you're stuck at the find_base_prompt() forever...