nautobot-app-device-onboarding

Error during onboarding: Could not find existing Nautobot device for requested primary IP address (192.168.0.1)

Open nathanielfernandez opened this issue 1 year ago • 16 comments

Environment

  • Python version: 3.11.2
  • Nautobot version: 2.0.2
  • nautobot-device-onboarding version: 2.0.2

Steps to Reproduce

  1. In our Nautobot prod instance, onboard a device via the API or GUI.
  2. The job is accepted and runs in the background.
  3. After a few seconds, the failed result appears in device onboarding.

Expected Behavior

We had run an API script to save time and to work around the bug in device onboarding that prevents us from using bulk import. This worked fine on our QA and Lab Nautobot instances, so I expected the same results for Prod.

What happened instead:

During device onboarding for our Prod instance, the onboarding went well for most devices but we encountered this exception for several devices: cannot access local variable 'search_array_element' where it is not associated with a value.

We have lab, QA and prod Nautobot instances, and only the prod instance encountered this error on several devices. Lab and QA imported the same device with no issues. All of them use Nautobot v2.0.2 and Device Onboarding v2.0.2.

According to /var/log/messages, the job got past the auth banner and logged in successfully. It collected facts, then hit the error Could not find existing Nautobot device for requested primary IP address (192.168.0.1). Please note that I changed the IP address and server name.

Nov  7 21:05:47 nautobot-prod nautobot-server[3251705]: [2023-11-07 20:05:47,906: INFO/ForkPoolWorker-3] Authentication (password) successful!
Nov  7 21:05:49 nautobot-prod nautobot-server[3251705]: [2023-11-07 20:05:49,820: INFO/ForkPoolWorker-3] COLLECT: device facts
Nov  7 21:05:52 nautobot-prod nautobot-server[3251705]: [2023-11-07 20:05:52,240: INFO/ForkPoolWorker-3] COLLECT: device interface IPs
Nov  7 21:05:54 nautobot-prod nautobot-server[3251705]: [2023-11-07 20:05:54,355: INFO/ForkPoolWorker-3] Could not find existing Nautobot device for requested primary IP address (192.168.0.1)
Nov  7 21:05:54 nautobot-prod nautobot-server[3251705]: [2023-11-07 20:05:54,368: ERROR/ForkPoolWorker-3] Onboarding Error - Exception
Nov  7 21:05:54 nautobot-prod nautobot-server[3251705]: [2023-11-07 20:05:54,368: ERROR/ForkPoolWorker-3] cannot access local variable 'search_array_element' where it is not associated with a value
Nov  7 21:05:54 nautobot-prod nautobot-server[3251705]: [2023-11-07 20:05:54,378: INFO/ForkPoolWorker-3] Task nautobot_device_onboarding.worker.onboard_device_worker[00fd968d-aa61-44a5-ac03-2558a62b21a7] succeeded in 17.838233768939972s: {'ok': False}

We have not yet configured nautobot_device_onboarding in PLUGINS_CONFIG on any of the instances.

Please let me know if you need more information. Thank you,

nathanielfernandez avatar Nov 15 '23 01:11 nathanielfernandez

Just an update. I updated Nautobot to 2.0.5 and device onboarding from 2.0.2 to 3.0.1 in order to address the XSS vulnerability. I still got the same error in the return value when I ran the onboarding job.

{'exc_type': 'UnboundLocalError', 'exc_module': 'builtins', 'exc_message': ["cannot access local variable 'search_array_element' where it is not associated with a value"]}
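
For readers unfamiliar with this exception, here is a minimal sketch of one common way Python raises it: a name is assigned only inside a loop (or branch) that never runs, and is then read afterwards. The functions and inputs below are hypothetical and are not the plugin's code; the variable name is reused only to mirror the error message.

```python
def first_nonempty(candidates):
    # Hypothetical helper, NOT the plugin's code.
    # The name below is bound only if the loop body runs at least once.
    for search_array_element in candidates:
        if search_array_element:
            break
    # With an empty `candidates`, the name was never bound, so reading it fails.
    return search_array_element


def try_lookup(candidates):
    # Convert the exception into a string so both cases can be printed.
    try:
        return first_nonempty(candidates)
    except UnboundLocalError as exc:
        return f"UnboundLocalError: {exc}"


print(try_lookup(["asr1001-x"]))
print(try_lookup([]))
```

If the real cause has this shape, it would fit the later observation that the error disappears once the expected data exists, but that is an inference and not something confirmed in this thread.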

nathanielfernandez avatar Nov 30 '23 05:11 nathanielfernandez

Upon checking /var/log/messages on that job run, I was able to get more context in that error message. Looks like there are two device types being returned.

nathanielfernandez avatar Nov 30 '23 05:11 nathanielfernandez

Would you be able to share the device types that may match the type of device you are onboarding?

scetron avatar Nov 30 '23 13:11 scetron

Most of the devices that failed to onboard on our prod instance were Cisco ASR1001-X routers. In fact, I don't see any onboarded device with that type: I see ASR1001 and ASR1001-HX device types, but there are no devices with type ASR1001-X.

Another example of a device type with onboarding issues is ws-3650-48d. Other variations of the 3650, like ws-c3650-48tq, were onboarded with no issues.

nathanielfernandez avatar Dec 01 '23 00:12 nathanielfernandez

You may want to check whether there is a device type where the search array would match two different device types. From what I understand, it looks at the model name and part number of the device type to determine whether there is an existing device type to select. Code where that is happening is here.

I would also like to see this matching of device type and manufacturer rewritten to be more robust. I have had issues in my instance related to manufacturer. The onboarding plugin will extract the vendor name from the device if the manufacturer is not set, even though a matching device type (the combination of manufacturer and device type) already exists. This causes onboarding to fail in my environment because we use "Cisco Systems, Inc": onboarding tries to create the same device type and part number under "Cisco" and it fails on duplicate keys.
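
The first check suggested above (does a discovered model match zero, one, or several device types?) can be sketched offline as follows. This is only an illustration under stated assumptions: `DeviceTypeRecord`, `find_matches`, and `classify` are hypothetical names, the sample rows are made up, and the case-insensitive comparison on model and part number is an assumption, not the plugin's confirmed behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceTypeRecord:
    # Hypothetical stand-in for a device type row; field names are assumptions.
    manufacturer: str
    model: str
    part_number: str = ""


def find_matches(device_types, discovered_model):
    """Return records whose model or part number equals the discovered model, ignoring case."""
    wanted = discovered_model.strip().lower()
    if not wanted:
        return []
    return [
        dt for dt in device_types
        if wanted in (dt.model.lower(), dt.part_number.lower())
    ]


def classify(device_types, discovered_model):
    """Label a discovered model as 'missing', 'unique', or 'ambiguous'."""
    matches = find_matches(device_types, discovered_model)
    if not matches:
        return "missing"
    return "ambiguous" if len(matches) > 1 else "unique"


# Made-up sample data for illustration only.
inventory = [
    DeviceTypeRecord("Cisco", "ASR1001"),
    DeviceTypeRecord("Cisco", "ASR1001-HX"),
    DeviceTypeRecord("Cisco", "ws-c3650-48tq", "WS-C3650-48TQ"),
    DeviceTypeRecord("Cisco Systems, Inc", "WS-C3650-48TQ"),
]

for model in ("ASR1001-X", "ASR1001", "ws-c3650-48tq"):
    print(model, "->", classify(inventory, model))
```

A "missing" or "ambiguous" result is the kind of entry worth inspecting by hand before retrying the onboarding job.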

susanhooks avatar Dec 06 '23 23:12 susanhooks

@susanhooks thanks for your suggestion. I looked at the code but I don't fully understand how it works. Since it mentions ensure_device_type as one of the methods, it might be trying to make sure the device type exists. So I added a device type for the Cisco ASR1001-X in the GUI (asr1001-x), copying the model name from our other Nautobot instances. After that, an ASR1001-X device was onboarded successfully.

nathanielfernandez avatar Dec 12 '23 03:12 nathanielfernandez

I think we're good to close this one. Just FYI, I will probably need to raise a separate issue in nautobot/nautobot. I get a different error for some devices that we attempted to onboard in the past and whose IP address was already listed in Nautobot. Here is the return value:

{'exc_type': 'IntegrityError', 'exc_module': 'django.db.utils', 'exc_message': ['duplicate key value violates unique constraint "ipam_prefix_namespace_id_network_prefix_length_b2dd8b57_uniq"\nDETAIL: Key (namespace_id, network, prefix_length)=(d39e97d3-41bc-4001-9a50-89330d2e67d0, \x0a122000, 24) already exists.\n']}
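
To find the conflicting prefix in IPAM, it helps to turn the key in that message into a readable network. The sketch below assumes the `network` value is PostgreSQL's hex rendering of a packed IP address (the `\x` prefix followed by hex digits); `decode_prefix_key` is a hypothetical helper name, and only the standard-library `ipaddress` module is used.

```python
import ipaddress


def decode_prefix_key(hex_network, prefix_length):
    # Assumption: `hex_network` is the hex text after the leading "\x",
    # representing a packed (binary) IPv4 or IPv6 address.
    packed = bytes.fromhex(hex_network)
    address = ipaddress.ip_address(packed)
    return ipaddress.ip_network(f"{address}/{prefix_length}")


# Key from the error above: network=\x0a122000, prefix_length=24
print(decode_prefix_key("0a122000", 24))  # 10.18.32.0/24
```

Under that assumption, the duplicate key refers to an existing 10.18.32.0/24 prefix in the namespace named in the message.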

I tried to delete the prefix for that device in IPAM, but it did not work. I get a 'KeyError' exception when I try to delete that IP.

nathanielfernandez avatar Dec 12 '23 04:12 nathanielfernandez

Thank you for your feedback. I know there are some issues with this plugin since the 2.x update, specifically around device type, and I have seen some around IPAM as well. We are working on addressing them. Bug reports in the repo definitely help. :)

susanhooks avatar Dec 12 '23 04:12 susanhooks

I'm going to reopen this as I believe it is still an issue that may be encountered by others.

scetron avatar Dec 14 '23 13:12 scetron

Just FYI, I was able to work around the prefix-related onboarding error from my previous post. There seems to be a discrepancy in what we're seeing in the UI: the Prefix's Type says 'Network', but when you edit the prefix, it says 'Container'.

The workaround is to set the Prefix's Type field to Network and save it. Onboarding works after that.

More info and screenshots on this thread in the #nautobot channel.

nathanielfernandez avatar Dec 15 '23 05:12 nathanielfernandez

As mentioned in the Slack thread, I'm currently pointing the finger at 'Network' as an incorrect value coming from somewhere - it should be PrefixTypeChoices.TYPE_NETWORK (== 'network') instead.
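
To illustrate why 'Network' and 'network' are different stored values, here is a small offline sketch that flags type values that are not an exact match for a valid choice. It is not Nautobot code: `VALID_PREFIX_TYPES` and `find_invalid_types` are hypothetical names, the rows are made up, and only the 'network' value is confirmed in this thread (the other two values are assumed to mirror Nautobot's prefix type choices).

```python
# Assumed to mirror Nautobot's prefix type choices; only "network" is confirmed here.
VALID_PREFIX_TYPES = {"container", "network", "pool"}


def find_invalid_types(prefixes):
    """Return (prefix, stored_type, suggested_type) for rows whose type is not an exact valid value."""
    report = []
    for prefix, stored in prefixes:
        if stored in VALID_PREFIX_TYPES:
            continue  # exact, case-sensitive match: nothing to fix
        candidate = stored.strip().lower()
        suggestion = candidate if candidate in VALID_PREFIX_TYPES else None
        report.append((prefix, stored, suggestion))
    return report


# Made-up rows for illustration only.
rows = [
    ("10.18.32.0/24", "Network"),
    ("10.18.0.0/16", "container"),
    ("10.19.0.0/24", "network"),
]

for prefix, stored, suggestion in find_invalid_types(rows):
    print(f"{prefix}: stored {stored!r}, suggested {suggestion!r}")
```

Because the comparison is case-sensitive, a stored 'Network' matches none of the valid values, which may explain why the UI appears to fall back to a different type when the prefix is edited.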

glennmatthews avatar Dec 15 '23 13:12 glennmatthews

This may have been recently fixed by @jeffkala, I will find the PR.

scetron avatar Dec 15 '23 14:12 scetron

I believe that the second issue with prefixes was fixed in #123, which I don't think has made it into a release just yet. I will double check that.

scetron avatar Dec 15 '23 14:12 scetron

Looks like a potential root cause has been found. Thanks guys. :) Do you still need any output from my end? If not, I'll apply the workaround on the prefixes with issues on our prod instance.

nathanielfernandez avatar Dec 20 '23 01:12 nathanielfernandez

I may have a clue as to why the "Network" bug is showing up. We have some devices that use the wrong subnet for management. For example, the correct subnet for the management VLAN is a /23, but the device's management interface is configured with a /24. That results in some devices having no IP address in Nautobot after onboarding, and in a pair of prefixes showing up, one with container type ("Network" under the hood) and another with network type.

We could test this on one of our new instances to be sure. I'll share the results when we get there.

nathanielfernandez avatar Jan 30 '24 13:01 nathanielfernandez

Please disregard my previous message; we had a lot of prefixes that use "Network". I updated them in nbshell to use the "network" type.

Looks like this issue did not occur when my teammate built a Nautobot instance a few weeks back on version 2.1.1 with device onboarding version 3.0.1.

nathanielfernandez avatar Mar 06 '24 01:03 nathanielfernandez

Thanks @nathanielfernandez. Will close this issue for now as it looks like we've been able to get past it.

scetron avatar Jun 11 '24 17:06 scetron