nautobot-app-device-onboarding
Error during onboarding: Could not find existing Nautobot device for requested primary IP address (192.168.0.1)
Environment
- Python version: 3.11.2
- Nautobot version: 2.0.2
- nautobot-device-onboarding version: 2.0.2
Steps to Reproduce
- In our Nautobot Prod instance, try to onboard a device via the API or GUI.
- It would accept the job and run it in the background.
- After a few seconds, we would see the failed result in Device Onboarding.
Expected Behavior
We had run an API script to save time and work around the bug in device onboarding that prevents us from using bulk import. This worked fine on our QA and Lab Nautobot instances, so I was expecting the same results for Prod.
What happened instead:
During device onboarding for our Prod instance, the onboarding went well for most devices, but we encountered this exception for several of them: cannot access local variable 'search_array_element' where it is not associated with a value.
We have Lab, QA, and Prod Nautobot instances, and only the Prod instance encountered this error, on several devices. Lab and QA imported the same devices with no issues. All of them run Nautobot v2.0.2 and Device Onboarding v2.0.2.
In /var/log/messages, it got through the auth banner and logged in successfully. It collected facts, then hit the error "Could not find existing Nautobot device for requested primary IP address (192.168.0.1)". Please note that I changed the IP address and server name.
Nov 7 21:05:47 nautobot-prod nautobot-server[3251705]: [2023-11-07 20:05:47,906: INFO/ForkPoolWorker-3] Authentication (password) successful!
Nov 7 21:05:49 nautobot-prod nautobot-server[3251705]: [2023-11-07 20:05:49,820: INFO/ForkPoolWorker-3] COLLECT: device facts
Nov 7 21:05:52 nautobot-prod nautobot-server[3251705]: [2023-11-07 20:05:52,240: INFO/ForkPoolWorker-3] COLLECT: device interface IPs
Nov 7 21:05:54 nautobot-prod nautobot-server[3251705]: [2023-11-07 20:05:54,355: INFO/ForkPoolWorker-3] Could not find existing Nautobot device for requested primary IP address (192.168.0.1)
Nov 7 21:05:54 nautobot-prod nautobot-server[3251705]: [2023-11-07 20:05:54,368: ERROR/ForkPoolWorker-3] Onboarding Error - Exception
Nov 7 21:05:54 nautobot-prod nautobot-server[3251705]: [2023-11-07 20:05:54,368: ERROR/ForkPoolWorker-3] cannot access local variable 'search_array_element' where it is not associated with a value
Nov 7 21:05:54 nautobot-prod nautobot-server[3251705]: [2023-11-07 20:05:54,378: INFO/ForkPoolWorker-3] Task nautobot_device_onboarding.worker.onboard_device_worker[00fd968d-aa61-44a5-ac03-2558a62b21a7] succeeded in 17.838233768939972s: {'ok': False}
We don't have any configuration for nautobot_device_onboarding in PLUGINS_CONFIG yet on any of the instances.
Please let me know if you need more information. Thank you!
Just an update: I updated Nautobot to 2.0.5 and Device Onboarding from 2.0.2 to 3.0.1 to address the XSS vulnerability. I still got the same error in the return value when I ran the onboarding job.
{'exc_type': 'UnboundLocalError', 'exc_module': 'builtins', 'exc_message': ["cannot access local variable 'search_array_element' where it is not associated with a value"]}
Upon checking /var/log/messages for that job run, I was able to get more context on that error message. It looks like two device types are being returned.
Would you be able to share the device types that may match the type of device you are onboarding?
The majority of the devices that were failing to onboard on our Prod instance were Cisco ASR1001-X routers. Actually, I don't see any onboarded device with that type. I see ASR1001 and ASR1001-HX device types, but there are no devices with type ASR1001-X.
Another example of a device type having onboarding issues is ws-3650-48d. Other 3650 variants, like ws-c3650-48tq, onboarded with no issues.
You may want to check whether there is a device type where the search array would match two different devices. From what I understand, it looks at the model name and part number of the device type to determine whether there is an existing device type to select. The code where that happens is here. I would also like to see this device type/manufacturer matching rewritten to be a bit more robust; I have had issues in my instance related to the manufacturer. The onboarding plugin will extract the vendor name from the device if the manufacturer is not set, even when a matching device type for that combination of manufacturer and device type already exists. This causes a failure to onboard in my current environment because we use "Cisco Systems, Inc" - onboarding tries to create the same device type/part number under "Cisco" and it bombs with duplicate keys.
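For anyone who wants to check their own data, here is a minimal nbshell sketch (my own illustration, not the plugin's actual lookup; which fields the plugin really searches is an assumption on my part) that lists device types whose model or part number would match a given search term:

```python
# Illustration only - not the plugin's code. Assumes Nautobot 2.x ORM and that
# the relevant fields are DeviceType.model and DeviceType.part_number.
from nautobot.dcim.models import DeviceType

search_term = "ASR1001"  # hypothetical value pulled from the collected device facts

candidates = (
    DeviceType.objects.filter(model__icontains=search_term)
    | DeviceType.objects.filter(part_number__icontains=search_term)
).distinct()

for dt in candidates:
    print(dt.manufacturer.name, dt.model, dt.part_number)
```

If more than one row prints for a single device, you are likely in the ambiguous-match situation described above.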
@susanhooks thanks for your suggestion. I tried to look at the code, but I don't fully understand how it works yet. Since it mentions ensure_device_type as one of the methods, it might be trying to make sure the device type exists. So I added a device type for the Cisco ASR1001-X in the GUI (asr1001-x); I just copied the model name from our other Nautobot instances. After that, an ASR1001-X device onboarded successfully.
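For reference, the same pre-creation can be done from nbshell instead of the GUI. This is only a sketch; the manufacturer name and part number below are assumptions for my environment:

```python
# Sketch of pre-creating the missing device type from nbshell so onboarding can
# find it. Adjust the manufacturer name and part number for your environment.
from nautobot.dcim.models import DeviceType, Manufacturer

mfr, _ = Manufacturer.objects.get_or_create(name="Cisco")  # assumption
dt, created = DeviceType.objects.get_or_create(
    manufacturer=mfr,
    model="ASR1001-X",
    defaults={"part_number": "ASR1001-X", "u_height": 1},
)
print("created" if created else "already existed", dt)
```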
I think we're good to close this one. Just FYI, I will probably need to raise a separate issue in nautobot/nautobot: I get a different error for some devices that were onboarded (or attempted) in the past and whose IP address is already listed in Nautobot. Here is the return value:
{'exc_type': 'IntegrityError', 'exc_module': 'django.db.utils', 'exc_message': ['duplicate key value violates unique constraint "ipam_prefix_namespace_id_network_prefix_length_b2dd8b57_uniq"\nDETAIL: Key (namespace_id, network, prefix_length)=(d39e97d3-41bc-4001-9a50-89330d2e67d0, \x0a122000, 24) already exists.\n']}
I tried to delete the prefix for that device in IPAM, but it does not work; I get a <class 'KeyError'> error when I try to delete that IP.
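In case it helps anyone debugging the same thing, here is a rough nbshell sketch for inspecting the prefix rows behind that duplicate-key error. The namespace name and prefix length below are placeholders; take them from your own error message:

```python
# Rough sketch: list the prefixes in the namespace/prefix-length named in the
# IntegrityError so the duplicate rows can be inspected. "Global" and /24 are
# assumptions - substitute the values from your error.
from nautobot.ipam.models import Prefix

for p in Prefix.objects.filter(namespace__name="Global", prefix_length=24):
    print(p.pk, p.prefix, p.type)
```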
Thank you for your feedback. I know there are some issues with this plugin since the 2.x update, specifically around device types, and I've seen some around IPAM as well; we are working on addressing them. Bug reports in the repo definitely help. :)
I'm going to reopen this as I believe it is still an issue that may be encountered by others.
Just FYI, I was able to work around the onboarding error about the prefix from my previous post. There seems to be a discrepancy in what we're seeing in the UI: the prefix's Type says 'Network', but when you edit the prefix, it says 'Container'.
The workaround is to make sure the prefix's Type field is set to Network and then save it. Onboarding works after that.
More info and screenshots on this thread in the #nautobot channel.
As mentioned in the Slack thread, I'm currently pointing the finger at 'Network' as an incorrect value coming from somewhere - it should be PrefixTypeChoices.TYPE_NETWORK (== 'network') instead.
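A quick way to see whether the literal string 'Network' has actually been stored anywhere is a check like this from nbshell (just a sketch, assuming Nautobot 2.x field names):

```python
# Sketch: compare the raw values stored in Prefix.type against the expected
# choice constant.
from nautobot.ipam.models import Prefix
from nautobot.ipam.choices import PrefixTypeChoices

print(PrefixTypeChoices.TYPE_NETWORK)  # "network"
print(sorted(set(Prefix.objects.values_list("type", flat=True))))
```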
This may have been recently fixed by @jeffkala; I will find the PR.
I believe that the second issue with prefixes was fixed in #123, which I don't think has made it into a release just yet. I will double check that.
Looks like a potential root cause has been found. Thanks, guys. :) Do you still need any output from my end? If not, I'll apply the workaround to the prefixes with issues on our Prod instance.
I may have a clue about why the "Network" bug is showing up that I want to share. We have some devices that use the wrong subnet for management. For example, the correct subnet for the management VLAN is a /23, but the network device's management interface is configured with a /24. That results in some devices ending up with no IP address in Nautobot after onboarding, and a couple of prefixes showing up, one as container type ("Network" under the hood) and another with network type.
We could test this on one of our new instances to be sure. I'll share the results when we get there.
Please disregard my previous message; we had a lot of prefixes that use "Network". I updated them in nbshell to use the "network" type.
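For anyone hitting the same thing, the nbshell fix looked roughly like this (a sketch; it assumes the bad rows literally contain the string "Network" in their type field):

```python
# Sketch of the bulk fix: rewrite prefixes whose type holds the literal string
# "Network" to the lowercase choice value. validated_save() keeps model
# validation in the loop instead of bypassing it with a queryset .update().
from nautobot.ipam.models import Prefix
from nautobot.ipam.choices import PrefixTypeChoices

for prefix in Prefix.objects.filter(type="Network"):
    prefix.type = PrefixTypeChoices.TYPE_NETWORK
    prefix.validated_save()
```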
Looks like this issue did not occur when my teammate built a Nautobot instance a few weeks back on version 2.1.1 with Device Onboarding version 3.0.1.
Thanks @nathanielfernandez. Will close this issue for now, as it looks like we've been able to get past it.