napalm-logs
napalm-logs consumer gets stuck when "host" key is missing from msg_dict
Whilst integrating IOS-XR, our consumers got stuck after hitting the following stack trace:
Process Process-26:
Traceback (most recent call last):
  File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3/dist-packages/napalm_logs/server.py", line 304, in start
    host=msg_dict["host"],
         ~~~~~~~~^^^^^^^^
KeyError: 'host'
The logging hostnameprefix setting was not configured on our XR node, so it fell back to the default mode of sending logs without a hostname. This format is supported by the IOS-XR log parser, as updated in https://github.com/napalm-automation/napalm-logs/pull/171.
Unfortunately, the server.py code accesses msg_dict["host"] directly in a few places. The parser does not include that key for default-style IOS-XR messages, so the lookup raises a KeyError, and the consumer process fails and gets stuck at that point.
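To illustrate the failure mode, here is a minimal sketch that is independent of napalm-logs itself. The `describe` function and both sample dictionaries are hypothetical stand-ins for the direct `msg_dict["host"]` access and for parsed messages with and without a hostname:

```python
def describe(msg_dict):
    """Hypothetical stand-in for code that indexes msg_dict["host"] directly."""
    return "host={}".format(msg_dict["host"])


# Hypothetical parsed messages: one with a hostname, one in the default
# IOS-XR style where no hostname was included.
with_host = {"host": "router1", "message": "example message"}
without_host = {"message": "example message"}

print(describe(with_host))

try:
    describe(without_host)
except KeyError as err:
    # In the real consumer nothing catches this, so the process dies.
    print("KeyError:", err)
```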
The problematic lines in napalm-logs that I've found so far:
- https://github.com/napalm-automation/napalm-logs/blob/develop/napalm_logs/server.py#L304 (this is the line we crashed on today)
- https://github.com/napalm-automation/napalm-logs/blob/develop/napalm_logs/server.py#L330
Suggested fix:
- Add safe access for the value and fall back to "Unknown" for the host when the key is missing. That way we align with how unknown hosts are handled in the napalm-logs code.
OR
- Explicitly encode `hostname: unknown` into the IOS-XR log parser.
I prefer the first option, as it is vendor agnostic and the safer approach for this code. The second option means every other vendor parser would need to encode an unknown hostname itself, rather than server.py handling the missing key safely and keeping the consumer from getting stuck.
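A rough sketch of what the safe-access option could look like. The helper name `safe_host` and the `UNKNOWN_HOST` constant are hypothetical and not taken from the napalm-logs code base; the actual fix might reuse whatever constant the project already has for unknown devices:

```python
# Hypothetical fallback value; the project may already define its own constant.
UNKNOWN_HOST = "Unknown"


def safe_host(msg_dict, default=UNKNOWN_HOST):
    """Return the host from a parsed message, or a fallback if it is absent.

    Treats a missing key, None, and an empty string the same way, so callers
    never hit a KeyError regardless of which vendor parser built msg_dict.
    """
    host = msg_dict.get("host")
    return host if host else default


print(safe_host({"host": "router1", "message": "example message"}))
print(safe_host({"message": "example message"}))
```

Each direct `msg_dict["host"]` lookup in server.py could then go through a helper like this, keeping the handling in one place instead of in every vendor parser.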