IP re-assignment error

Open yaiqsa opened this issue 2 years ago • 1 comments

I found this issue while testing the solution for #220. While writing it up, it occurred to me that it might be a NetBox issue rather than a netbox-sync one. Before opening a new issue on the NetBox project I looked for potential duplicates, and you'll never believe it: my error is not a bug, but ✨expected behaviour✨ (check netbox-community/netbox/issues/9348). So netbox-sync will have to work around it, just as in my previous issue. I don't expect you to patch this any time soon, and I think I will handle this manually, but I wanted to document it nevertheless.

Environment

  • NetBox version: 3.3.1
  • netbox-sync version: development branch, commit 6b31cd4

Error

I noticed that I get sync errors in the following situation:

  • I have VM-A which has an interface, with an ip-address assigned to it
    • The ip address is also the VM's primary ip.
  • Now VM-A is phased out (offline and orphaned in NetBox)
  • Later VM-B is created, which uses VM-A's old ip-address
    • Here the sync fails: it tries to PATCH the ip-address, assigning it to VM-B's interface.
      • I guess under the hood the ip is first un-assigned from VM-A's interface, and some sort of check is triggered, which complains that VM-A's primary IP is no longer assigned to it, cancelling the change. According to the NetBox devs, this behaviour is expected and desired, to guard against inadvertent changes. However, netbox-sync doesn't know this yet.

The previously mentioned NetBox issue (9348) notes that this problem also applies to (non-VM) devices and their interfaces.

Logs

(names and ip's are redacted)

2022-08-31 14:14:26,534 - INFO: Updating NetBox 'IP address' object '192.0.2.10/24' with data: {'assigned_object_type': 'virtualization.vminterface', 'assigned_object_id': 5}
2022-08-31 14:14:26,535 - DEBUG2: Sending PATCH to 'https://netbox.example/api/ipam/ip-addresses/3/' with data 'b'{"assigned_object_type": "virtualization.vminterface", "assigned_object_id": 5}''.
2022-08-31 14:14:26,610 - DEBUG2: Received HTTP Status 400.
2022-08-31 14:14:26,611 - ERROR: NetBox returned: PATCH /api/ipam/ip-addresses/3/ Bad Request
2022-08-31 14:14:26,611 - ERROR: NetBox returned body: {'interface': ['IP address is primary for virtualmachine VM-A but not assigned to it!']}
2022-08-31 14:14:26,611 - ERROR: Request Failed for IP address. Used data: {'assigned_object_type': 'virtualization.vminterface', 'assigned_object_id': 5}
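
As an illustration of how a client could recognise this specific rejection, here is a minimal sketch that inspects the HTTP status and the parsed response body. The function name `find_primary_ip_conflict` is hypothetical and not part of netbox-sync; the message pattern is copied from the error body in the log above and may be worded differently in other NetBox versions.

```python
import re
from typing import Optional, Tuple

# Pattern copied from the error body logged above (assumption: the wording
# is stable; other NetBox versions may phrase it differently).
_PRIMARY_IP_ERROR = re.compile(
    r"IP address is primary for (?P<kind>\w+) (?P<name>.+?) but not assigned to it"
)


def find_primary_ip_conflict(status_code: int, body: dict) -> Optional[Tuple[str, str]]:
    """Return (object kind, object name) if the response is the
    'primary IP but not assigned' rejection, otherwise None."""
    if status_code != 400:
        return None
    for messages in body.values():
        # Tolerate a single string as well as a list of strings per field.
        if isinstance(messages, str):
            messages = [messages]
        for message in messages:
            match = _PRIMARY_IP_ERROR.search(str(message))
            if match:
                return match.group("kind"), match.group("name")
    return None
```

A caller could use the returned name to find the object whose primary IP has to be cleared before retrying the PATCH.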

Proposed fix

I didn't delve into netbox-sync's code yet, and don't know how it decides what objects to patch (handling not-globally-unique ips), but on a higher level I think this should happen:

  1. When an ip will be assigned to an interface, first check if the ip is already assigned to an interface.
    1. If so, check if the ip is the primary ip address for the device/vm behind this interface (handling primary ipv4 and ipv6)
      1. If so, remove the primary ip from the device/vm
  2. continue as usual
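
The steps above can be sketched as a small planning function. This is only an illustration under stated assumptions, not netbox-sync code: the name `plan_ip_reassignment` and the simplified dict shapes are hypothetical (here `primary_ip4`/`primary_ip6` hold a plain IP id, and the owner dict carries its own API `url`). The field names `primary_ip4`, `primary_ip6`, `assigned_object_type` and `assigned_object_id` are the ones NetBox uses in its REST API.

```python
from typing import List, Optional, Tuple

Step = Tuple[str, str, dict]  # (HTTP method, API path, JSON payload)


def plan_ip_reassignment(
    ip: dict,
    current_owner: Optional[dict],
    new_iface_type: str,
    new_iface_id: int,
) -> List[Step]:
    """Return the ordered requests needed to move `ip` to a new interface.

    `current_owner` is the device/VM behind the interface the IP is
    currently assigned to, or None if the IP is unassigned.
    """
    steps: List[Step] = []

    # Steps 1, 1.1 and 1.1.1: if the IP is already assigned and is the
    # owner's primary IPv4 and/or IPv6 address, clear those fields first.
    if ip.get("assigned_object_id") is not None and current_owner is not None:
        cleared = {
            field: None
            for field in ("primary_ip4", "primary_ip6")
            if current_owner.get(field) == ip["id"]
        }
        if cleared:
            steps.append(("PATCH", current_owner["url"], cleared))

    # Step 2: continue as usual and assign the IP to the new interface.
    steps.append((
        "PATCH",
        f"/api/ipam/ip-addresses/{ip['id']}/",
        {"assigned_object_type": new_iface_type, "assigned_object_id": new_iface_id},
    ))
    return steps
```

Returning a plan instead of sending requests keeps the ordering easy to test. It does not address the non-unique IP scenario, which would need an extra check on which object actually owns the address.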

I do see potential for unwanted changes when Config > IPAM > Enforce global unique is disabled, and two devices have identical primary ip's. I don't know if this is handled at all in netbox-sync, but I guess any patch resolving this error should be tested for such a scenario.

yaiqsa avatar Aug 31 '22 13:08 yaiqsa

Wow, thank you for this high quality bug report, highly appreciated.

This indeed needs to be fixed within netbox-sync. No Idea how at the moment but will definitely have a look into it.

bb-Ricardo avatar Aug 31 '22 14:08 bb-Ricardo

Hey, I finally had time to look into this issue and I pushed a new commit to "development" branch.

I would highly appreciate if you could test this.

Thank you

bb-Ricardo avatar Dec 27 '22 23:12 bb-Ricardo

Hi, sorry for the slow response, busy times..

So I tested the scenario as described in the description above again, and I can't reproduce it anymore using the master branch. It is not patched, but the error changed

  • Netbox v3.4.2
  • netbox-sync v1.3.0, commit 7db3624f90484e15a03911112640763f73129781

The previous errors are not thrown anymore, but this one is:

WARNING: Current interface 'vNIC 1 (VM Network) (VM-A)' for IP '192.0.2.10/24' and this one 'vNIC 1 (VM Network) (VM-B)' are both enabled. IP assignment skipped because it is unclear which one is the correct one!

That error is thrown here. I tried to find the problem myself, but I got stuck at "data.enabled"; I couldn't find what 'enabled' means in this context.

The same happens when I cherry-pick commit 32023911981c28dca967760fac097dbd66c9ea6b on top of the master, or when I use the current Developer branch (up to commit 1b349ff906da58ed8fbaa3bd6470fe8d56de02bd)

When I use the Developer branch, I also get this error, but I'm pretty sure it has nothing to do with this issue. That's why I cherry-picked only the one commit 😅 :

ERROR: This 'virtual machine interface' data structure does not contain the primary key 'name' got: {'virtual_machine': <NBVM instance 'VM-3' at 140674398447024>, 'mac_address': 'xx:xx:xx:xx:xx:xx', 'description': 'Network adapter 1 (VirtualE1000e) (vlan ID: 0)', 'enabled': False, 'mtu': 1500, 'mode': 'access'}

yaiqsa avatar Jan 11 '23 15:01 yaiqsa

Hi @yaiqsa,

Thank you for testing. Just out of curiosity: if you get the warning on the first run, does it also appear during a second run?

The issue is this: if VM-A is parsed first and is the one the IP has newly been assigned to, then the status of VM-B is still the one represented in NetBox, as it hasn't been parsed yet.

This is still a pending bug I'm trying to find a way to fix in an elegant way.

The Error you mention makes me curious as it should not happen 😅

bb-Ricardo avatar Jan 11 '23 21:01 bb-Ricardo

Well actually, it seems to be caused by this commit: https://github.com/bb-Ricardo/netbox-sync/commit/1b349ff906da58ed8fbaa3bd6470fe8d56de02bd

bb-Ricardo avatar Jan 11 '23 21:01 bb-Ricardo

Hi, I just pushed another commit which should fix exactly this use case. Can you check out the development branch and test it again? Thank you.

bb-Ricardo avatar Jan 12 '23 14:01 bb-Ricardo

> Hi, I just pushed another commit which should fix exactly this use case. Can you check out the development branch and test it again? Thank you.

It doesn't throw an error anymore about the duplicate IP. However, it doesn't move it to the new VM either. The IP remains assigned to the orphaned VM.

yaiqsa avatar Jan 16 '23 09:01 yaiqsa

Can you send me the logs from a run with -l DEBUG2? I would like to take a look at them. And maybe screenshots from NetBox of the old and new VM, and the same from vCenter, if possible.

Thank you.

bb-Ricardo avatar Jan 16 '23 17:01 bb-Ricardo

Sending the entire DEBUG2 log would leak a lot of information about our network, so unfortunately I can't do that. However, this is the redacted part of the log that might be relevant:

2023-01-23 18:07:00,488 - DEBUG: vCenter returned 'X' virtual machines
2023-01-23 18:07:00,503 - DEBUG: Parsing vCenter VM: test-01
2023-01-23 18:07:00,589 - DEBUG2: Found default IPv4 gateway 192.0.2.1
2023-01-23 18:07:00,590 - DEBUG2: Parsing device VirtualVmxnet3: 00:00:00:00:00:52
2023-01-23 18:07:00,594 - DEBUG: IP address 'fe80::0001/64' for vNIC 1 (VM Network) is a link local address. Skipping.
2023-01-23 18:07:00,594 - DEBUG2: Trying to find a virtual machine based on the collected name, cluster, IP and MAC addresses
2023-01-23 18:07:00,619 - DEBUG2: Found a exact matching virtual machine object: test-01 (vmcluster-01)
2023-01-23 18:07:00,620 - DEBUG2: Found a matching virtual machine object: test-01 (vmcluster-01)
2023-01-23 18:07:00,620 - DEBUG2: Parsing 'virtual machine' data structure: test-01
2023-01-23 18:07:00,620 - DEBUG2: Parsing 'site' data structure: SITE NAME
2023-01-23 18:07:00,620 - DEBUG2: Parsing 'platform' data structure: Ubuntu Linux (64-bit)
2023-01-23 18:07:00,620 - DEBUG2: Found a matching vm_role_relation 'Server' (.*) for test-01
2023-01-23 18:07:00,620 - DEBUG2: Parsing 'virtual machine' data structure: test-01
2023-01-23 18:07:00,620 - DEBUG2: Parsing 'device role' data structure: Server
2023-01-23 18:07:00,620 - DEBUG2: Trying to match current object interfaces in NetBox with discovered interfaces
2023-01-23 18:07:00,623 - DEBUG2: Found '1' NICs in NetBox for 'test-01'
2023-01-23 18:07:00,623 - DEBUG2: Found 1:1 name match for NIC 'vNIC 1 (VM Network)'
2023-01-23 18:07:00,623 - DEBUG2: Parsing 'virtual machine interface' data structure: vNIC 1 (VM Network) (test-01)
2023-01-23 18:07:00,623 - DEBUG2: Trying to find prefix for IP: 192.0.2.10/24
2023-01-23 18:07:00,624 - DEBUG2: Found IP '192.0.2.10/24' matches site 'SITE NAME' prefix '192.0.2.0/24'
2023-01-23 18:07:00,628 - DEBUG: Current interface 'vNIC 1 (VM Network) (VM-A)' for IP '192.0.2.10/24' and this one 'vNIC 1 (VM Network) (VM-B)' are both enabled. The virtual machine will be checked later again to see if current interface status or association has changed
2023-01-23 18:07:00,631 - DEBUG2: Found matching prefix VLAN 1 (SITE NAME) for untagged interface VLAN.
2023-01-23 18:07:00,632 - DEBUG2: Parsing 'virtual machine interface' data structure: vNIC 1 (VM Network) (test-01)
...
2023-01-23 18:07:26,631 - DEBUG2: virtual machine 'VM-A' has IP '192.0.2.10/24' assigned but is in status {'value': 'offline', 'label': 'Offline'}. IP address will not marked as orphaned.

This block appears 3 times in the log.

As for the screenshots, what exactly do you want to see? The interfaces page?

yaiqsa avatar Jan 23 '23 17:01 yaiqsa

Hi, in this case it helps. I can see that I haven't accounted for this case: a new VM gets the IP of a powered-off VM. It seems this should be a simple case, but it somehow didn't make it into the codebase.
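
For readers following along, the missing case can be sketched as a small decision rule. This is a hypothetical illustration, not the change that was pushed to the development branch: the function name `should_move_ip` and its parameters are assumptions, and the status value `offline` is taken from the DEBUG2 log quoted earlier in the thread.

```python
def should_move_ip(
    current_iface_enabled: bool,
    current_vm_status: str,
    new_iface_enabled: bool,
) -> bool:
    """Decide whether an IP should move from its current interface to a
    newly discovered one (illustrative rule only)."""
    # Never move an IP onto a disabled interface.
    if not new_iface_enabled:
        return False
    # The current interface is disabled: the new one is the obvious owner.
    if not current_iface_enabled:
        return True
    # Both interfaces are enabled: only move the IP when the current VM is
    # powered off, which is the case described in this thread.
    return current_vm_status == "offline"
```

With this rule, two enabled interfaces on active VMs still leave the assignment untouched, which matches the earlier "unclear which one is the correct one" warning.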

I just pushed another change to the "development" branch. Can you test it and see if this resolves your issue?

thank you

bb-Ricardo avatar Jan 23 '23 20:01 bb-Ricardo

Okay! It now works :D

I did get one error regarding a moved IP that was still assigned as primary IP to an old vm, which didn't have that IP assigned to any interfaces. I'm not sure how that came to be, and it might have been a Netbox bug. I can't seem to reproduce that however, so I wouldn't do anything with this information.

I can now re-assign the IP to a new vm in vCenter, and netbox-sync handles it correctly.

Running development I do also get this error at the end, but I think it's not relevant for this issue:

ERROR: unable to close vCenter API instance connection: tag_session

So I think you can close this issue, thank you so much for your work!

yaiqsa avatar Jan 24 '23 09:01 yaiqsa

great, thank you very much. I've seen this API error and will fix it as well.

bb-Ricardo avatar Jan 24 '23 09:01 bb-Ricardo