dellemc.enterprise_sonic
[BUG]: sonic_vxlans flaps all VLAN/L2VNI mappings if `vrf_map` is specified and `state: overridden` is used
Bug Description
When the `sonic_vxlans` module is used with `state: overridden` and a `config` dict containing both `vlan_map` and `vrf_map`, it will flap (delete and re-create) all VLAN to L2VNI mappings, even though these were correctly configured to begin with. This causes a severe service disruption.
The flapping behaviour goes away if `state: replaced` or `state: merged` is used instead; however, in these cases the module still falsely reports that changes are required. I suspect that the two issues have identical (or at least related) root causes, so I describe both in the same bug report.
This does not happen if the `config` dict does not contain `vrf_map`.
Product Name
SONiC-OS-4.2.0-Enterprise_Base
Component or Module Name
sonic_vxlans
DellEMC Enterprise SONiC Ansible Collection Version
dellemc.enterprise_sonic 2.4.0
SONiC Software Version
4.2.0-Enterprise_Base
Configuration
CONFIG_FILE() = /home/debian/ansible/ansible.cfg
DEFAULT_HASH_BEHAVIOUR(/home/debian/ansible/ansible.cfg) = merge
DEFAULT_HOST_LIST(/home/debian/ansible/ansible.cfg) = ['/home/debian/ansible/hosts.yml']
DEFAULT_JINJA2_EXTENSIONS(/home/debian/ansible/ansible.cfg) = jinja2.ext.do
HOST_KEY_CHECKING(/home/debian/ansible/ansible.cfg) = False
INTERPRETER_PYTHON(/home/debian/ansible/ansible.cfg) = auto_silent
MAX_FILE_SIZE_FOR_DIFF(/home/debian/ansible/ansible.cfg) = 1048576
PERSISTENT_COMMAND_TIMEOUT(/home/debian/ansible/ansible.cfg) = 3000
Steps to Reproduce
- Start out with a playbook containing the following test case:
---
- hosts: sonic2
  gather_facts: false
  tasks:
    - name: Create VLANs 10 and 20
      dellemc.enterprise_sonic.sonic_vlans:
        config:
          - vlan_id: 10
          - vlan_id: 20
    - name: Create VRF twenty
      dellemc.enterprise_sonic.sonic_vrfs:
        config:
          - name: Vrf_twenty
            members:
              interfaces:
                - name: Vlan20
    - name: Map VLAN 10 to L2VNI 10
      loop: [0,1,2]
      dellemc.enterprise_sonic.sonic_vxlans:
        state: overridden
        config:
          - name: vtep1
            evpn_nvo: nvo1
            source_ip: 192.0.2.1
            vlan_map:
              - vni: 10
                vlan: 10
    - name: Additionally map Vrf_twenty to L3VNI 2020
      loop: [0,1,2]
      dellemc.enterprise_sonic.sonic_vxlans:
        state: overridden
        config:
          - name: vtep1
            evpn_nvo: nvo1
            source_ip: 192.0.2.1
            vlan_map:
              - vni: 10
                vlan: 10
            vrf_map:
              - vni: 2020
                vrf: Vrf_twenty
- Run the playbook against an unconfigured switch.
Expected Behavior
- Only the first iteration of the `Map VLAN 10 to L2VNI 10` task should return `changed:`, the subsequent ones should be idempotent and return `ok:`.
- Only the first iteration of the `Additionally map Vrf_twenty to L3VNI 2020` task should return `changed:`, the subsequent ones should be idempotent and return `ok:`.
- No iteration of the `Additionally map Vrf_twenty to L3VNI 2020` task should cause any change in state to the VLAN 10/L2VNI 10 mapping, as this part of the `config:` dict is unchanged from the preceding `Map VLAN 10 to L2VNI 10` task.
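The expectations above amount to an idempotency contract: applying a desired config to a device that already matches it should produce no commands, and adding a VRF mapping should not touch an unchanged VLAN mapping. A minimal Python sketch of that contract follows. The function `plan_overridden` and the dict layout are hypothetical illustrations, not the collection's actual code.

```python
def _as_set(entries):
    """Normalise a list of mapping dicts (or None) into a set of sorted item tuples."""
    return {tuple(sorted(e.items())) for e in (entries or [])}


def plan_overridden(want, have):
    """Return (to_delete, to_add) for one VTEP under `state: overridden`.

    Hypothetical model: only map entries that differ between the desired
    config (`want`) and the device state (`have`) are touched.
    """
    to_delete, to_add = [], []
    for key in ("vlan_map", "vrf_map"):
        w, h = _as_set(want.get(key)), _as_set(have.get(key))
        to_delete += [(key, dict(e)) for e in sorted(h - w)]
        to_add += [(key, dict(e)) for e in sorted(w - h)]
    return to_delete, to_add


want = {
    "name": "vtep1",
    "vlan_map": [{"vni": 10, "vlan": 10}],
    "vrf_map": [{"vni": 2020, "vrf": "Vrf_twenty"}],
}
# Device state after the "Map VLAN 10 to L2VNI 10" task: no VRF mapping yet.
have_first = {"name": "vtep1", "vlan_map": [{"vni": 10, "vlan": 10}], "vrf_map": None}

first = plan_overridden(want, have_first)  # only the VRF mapping is added
second = plan_overridden(want, want)       # device already matches: nothing to do
```

Under this model the first iteration adds only the VRF mapping and later iterations are no-ops, which is the behaviour the list above asks for.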
Actual Behavior
- Only the first iteration of the `Map VLAN 10 to L2VNI 10` task returns `changed:`; the subsequent ones are idempotent and return `ok:`. This is as expected, and shows that the bug depends on the presence of `vrf_map`.
- All three iterations of the `Additionally map Vrf_twenty to L3VNI 2020` task report `changed:`. This is unexpected, as the `config:` dict used does not change between the iterations. This also happens if the task is changed to use `state: replaced` or `state: merged`.
- All three iterations of the `Additionally map Vrf_twenty to L3VNI 2020` task result in the deletion and re-addition of the VLAN 10 to L2VNI 10 mapping. This is unexpected, as this part of the `config` dict does not change from the `Map VLAN 10 to L2VNI 10` task (or between individual iterations of the `Additionally map Vrf_twenty to L3VNI 2020` task, for that matter). This caused a critical outage in our production network.
For what it is worth, the resulting configuration at the end of the playbook run appears to be correct:
$ sonic-cli -c 'show running-configuration interface vxlan'
!
interface vxlan vtep1
 source-ip 192.0.2.1
 qos-mode pipe dscp 0
 map vni 10 vlan 10
 map vni 2020 vrf Vrf_twenty
Logs
This is the console log from running the playbook:
debian@debian:~/ansible$ ansible-playbook -vD vxlan.yml
Using /home/debian/ansible/ansible.cfg as config file
PLAY [sonic2] ******************************************************************************************************
TASK [Create VLANs 10 and 20] **************************************************************************************
*** before
--- after
***************
*** 1 ****
! []
--- 1,10 ----
! [
! {
! 'description': null,
! 'vlan_id': 10
! },
! {
! 'description': null,
! 'vlan_id': 20
! }
! ]
changed: [sonic2] => {"after": [{"description": null, "vlan_id": 10}, {"description": null, "vlan_id": 20}], "before": [], "changed": true, "commands": [{"state": "merged", "vlan_id": 10}, {"state": "merged", "vlan_id": 20}]}
TASK [Create VRF twenty] *******************************************************************************************
*** before
--- after
***************
*** 1,4 ****
--- 1,14 ----
[
+ {
+ 'members': {
+ 'interfaces': [
+ {
+ 'name': 'Vlan20'
+ }
+ ]
+ },
+ 'name': 'Vrf_twenty'
+ },
{
'members': null,
'name': 'mgmt'
changed: [sonic2] => {"after": [{"members": {"interfaces": [{"name": "Vlan20"}]}, "name": "Vrf_twenty"}, {"members": null, "name": "mgmt"}], "before": [{"members": null, "name": "mgmt"}], "changed": true, "commands": [{"members": {"interfaces": [{"name": "Vlan20"}]}, "name": "Vrf_twenty", "state": "merged"}]}
TASK [Map VLAN 10 to L2VNI 10] *************************************************************************************
changed: [sonic2] => (item=0) => {"after": [{"evpn_nvo": "nvo1", "name": "vtep1", "primary_ip": null, "source_ip": "192.0.2.1", "vlan_map": [{"vlan": 10, "vni": 10}], "vrf_map": null}], "ansible_loop_var": "item", "before": [], "changed": true, "commands": [{"evpn_nvo": "nvo1", "name": "vtep1", "primary_ip": null, "source_ip": "192.0.2.1", "state": "overridden", "vlan_map": [{"vlan": 10, "vni": 10}], "vrf_map": null}], "item": 0}
ok: [sonic2] => (item=1) => {"ansible_loop_var": "item", "before": [{"evpn_nvo": "nvo1", "name": "vtep1", "primary_ip": null, "source_ip": "192.0.2.1", "vlan_map": [{"vlan": 10, "vni": 10}], "vrf_map": null}], "changed": false, "commands": [], "item": 1}
ok: [sonic2] => (item=2) => {"ansible_loop_var": "item", "before": [{"evpn_nvo": "nvo1", "name": "vtep1", "primary_ip": null, "source_ip": "192.0.2.1", "vlan_map": [{"vlan": 10, "vni": 10}], "vrf_map": null}], "changed": false, "commands": [], "item": 2}
TASK [Additionally map Vrf_twenty to L3VNI 2020] *******************************************************************
changed: [sonic2] => (item=0) => {"after": [{"evpn_nvo": "nvo1", "name": "vtep1", "primary_ip": null, "source_ip": "192.0.2.1", "vlan_map": [{"vlan": 10, "vni": 10}], "vrf_map": null}], "ansible_loop_var": "item", "before": [{"evpn_nvo": "nvo1", "name": "vtep1", "primary_ip": null, "source_ip": "192.0.2.1", "vlan_map": [{"vlan": 10, "vni": 10}], "vrf_map": null}], "changed": true, "commands": [{"evpn_nvo": "nvo1", "name": "vtep1", "primary_ip": null, "source_ip": "192.0.2.1", "state": "overridden", "vlan_map": [{"vlan": 10, "vni": 10}], "vrf_map": [{"vni": 2020, "vrf": "Vrf_twenty"}]}], "item": 0}
changed: [sonic2] => (item=1) => {"after": [{"evpn_nvo": "nvo1", "name": "vtep1", "primary_ip": null, "source_ip": "192.0.2.1", "vlan_map": [{"vlan": 10, "vni": 10}], "vrf_map": null}], "ansible_loop_var": "item", "before": [{"evpn_nvo": "nvo1", "name": "vtep1", "primary_ip": null, "source_ip": "192.0.2.1", "vlan_map": [{"vlan": 10, "vni": 10}], "vrf_map": null}], "changed": true, "commands": [{"evpn_nvo": "nvo1", "name": "vtep1", "primary_ip": null, "source_ip": "192.0.2.1", "state": "overridden", "vlan_map": [{"vlan": 10, "vni": 10}], "vrf_map": [{"vni": 2020, "vrf": "Vrf_twenty"}]}], "item": 1}
changed: [sonic2] => (item=2) => {"after": [{"evpn_nvo": "nvo1", "name": "vtep1", "primary_ip": null, "source_ip": "192.0.2.1", "vlan_map": [{"vlan": 10, "vni": 10}], "vrf_map": null}], "ansible_loop_var": "item", "before": [{"evpn_nvo": "nvo1", "name": "vtep1", "primary_ip": null, "source_ip": "192.0.2.1", "vlan_map": [{"vlan": 10, "vni": 10}], "vrf_map": null}], "changed": true, "commands": [{"evpn_nvo": "nvo1", "name": "vtep1", "primary_ip": null, "source_ip": "192.0.2.1", "state": "overridden", "vlan_map": [{"vlan": 10, "vni": 10}], "vrf_map": [{"vni": 2020, "vrf": "Vrf_twenty"}]}], "item": 2}
PLAY RECAP *********************************************************************************************************
sonic2 : ok=4 changed=4 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
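One detail in the console log above is easy to miss: every iteration of the second VXLAN task reports `"changed": true` while its `before` and `after` are identical, and `vrf_map` is `null` in `after` even though the command included a VRF mapping. The following Python sketch shows one way to flag such results automatically. The helper name `suspicious` is hypothetical and the JSON is abridged from the `item=1` result. That the gathered facts omit VRF mappings is only an inference from this log, not a confirmed root cause.

```python
import json


def suspicious(result):
    """True when a task reports a change although `before` equals `after`."""
    return bool(result.get("changed")) and result.get("before") == result.get("after")


# Abridged from the item=1 result of the "Additionally map ..." task.
raw = """
{"after":  [{"name": "vtep1", "vlan_map": [{"vlan": 10, "vni": 10}], "vrf_map": null}],
 "before": [{"name": "vtep1", "vlan_map": [{"vlan": 10, "vni": 10}], "vrf_map": null}],
 "changed": true}
"""
result = json.loads(raw)

flag = suspicious(result)
# The VRF mapping never shows up in the reported state.
vrf_map_absent = all(vtep["vrf_map"] is None for vtep in result["after"])
```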
A single run of the `Additionally map Vrf_twenty to L3VNI 2020` task yields the following relevant output logged to `/var/log/ramfs/in-memory-syslog-info.log`. Of particular interest are the `DELETE` calls:
INFO mgmt-framework#/usr/sbin/rest_server[34]: [REST-72] User "[email protected]:45032" request "GET /restconf/data/sonic-vxlan:sonic-vxlan" status - 200
INFO mgmt-framework#/usr/sbin/rest_server[34]: [REST-73] User "[email protected]:45046" request "GET /restconf/data/sonic-vxlan:sonic-vxlan/EVPN_NVO/EVPN_NVO_LIST" status - 200
INFO mgmt-framework#/usr/sbin/rest_server[34]: [REST-74] User "[email protected]:45062" request "GET /restconf/data/sonic-vrf:sonic-vrf/VRF/VRF_LIST" status - 200
INFO mgmt-framework#/usr/sbin/rest_server[34]: [REST-75] User "[email protected]:45068" request "DELETE /restconf/data/sonic-vxlan:sonic-vxlan/VXLAN_TUNNEL_MAP/VXLAN_TUNNEL_MAP_LIST=vtep1,map_10_Vlan10" status - 204
INFO mgmt-framework#/usr/sbin/rest_server[34]: [REST-76] User "[email protected]:45070" request "DELETE /restconf/data/sonic-vxlan:sonic-vxlan/VXLAN_TUNNEL/VXLAN_TUNNEL_LIST=vtep1/src_ip" status - 204
INFO mgmt-framework#/usr/sbin/rest_server[34]: [REST-77] User "[email protected]:45076" request "DELETE /restconf/data/sonic-vxlan:sonic-vxlan/EVPN_NVO/EVPN_NVO_LIST=nvo1" status - 204
INFO mgmt-framework#/usr/sbin/rest_server[34]: [REST-78] User "[email protected]:38586" request "DELETE /restconf/data/sonic-vxlan:sonic-vxlan/VXLAN_TUNNEL/VXLAN_TUNNEL_LIST=vtep1" status - 204
INFO mgmt-framework#/usr/sbin/rest_server[34]: [REST-79] User "[email protected]:38590" request "PATCH /restconf/data/sonic-vxlan:sonic-vxlan/VXLAN_TUNNEL" status - 204
INFO mgmt-framework#/usr/sbin/rest_server[34]: [REST-80] User "[email protected]:38600" request "PATCH /restconf/data/sonic-vxlan:sonic-vxlan/EVPN_NVO/EVPN_NVO_LIST" status - 204
INFO mgmt-framework#/usr/sbin/rest_server[34]: [REST-81] User "[email protected]:38606" request "PATCH /restconf/data/sonic-vxlan:sonic-vxlan/VXLAN_TUNNEL_MAP" status - 204
INFO mgmt-framework#/usr/sbin/rest_server[34]: [REST-82] User "[email protected]:38618" request "PATCH /restconf/data/sonic-vrf:sonic-vrf/VRF/VRF_LIST=Vrf_twenty/vni" status - 204
INFO mgmt-framework#/usr/sbin/rest_server[34]: [REST-83] User "[email protected]:38630" request "GET /restconf/data/sonic-vxlan:sonic-vxlan" status - 200
INFO mgmt-framework#/usr/sbin/rest_server[34]: [REST-84] User "[email protected]:38646" request "GET /restconf/data/sonic-vxlan:sonic-vxlan/EVPN_NVO/EVPN_NVO_LIST" status - 200
INFO mgmt-framework#/usr/sbin/rest_server[34]: [REST-85] User "[email protected]:38654" request "GET /restconf/data/sonic-vrf:sonic-vrf/VRF/VRF_LIST" status - 200
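To pick the destructive calls out of a longer REST server log, a small filter along these lines may help. It is a sketch that assumes the `request "<METHOD> <path>" status - <code>` layout seen in the lines above; the helper name `requests_by_method` is hypothetical, and the `User` field in the sample is a placeholder.

```python
import re

# Matches the request portion of a rest_server log line, e.g.
#   request "DELETE /restconf/data/..." status - 204
REQUEST_RE = re.compile(r'request "(?P<method>[A-Z]+) (?P<path>\S+)" status - (?P<status>\d+)')


def requests_by_method(lines, method):
    """Return the RESTCONF paths of all logged requests using `method`."""
    paths = []
    for line in lines:
        match = REQUEST_RE.search(line)
        if match and match.group("method") == method:
            paths.append(match.group("path"))
    return paths


# Three lines abridged from the log above (User field replaced by a placeholder).
sample = [
    'INFO rest_server[34]: [REST-75] User "placeholder" request "DELETE /restconf/data/sonic-vxlan:sonic-vxlan/VXLAN_TUNNEL_MAP/VXLAN_TUNNEL_MAP_LIST=vtep1,map_10_Vlan10" status - 204',
    'INFO rest_server[34]: [REST-78] User "placeholder" request "DELETE /restconf/data/sonic-vxlan:sonic-vxlan/VXLAN_TUNNEL/VXLAN_TUNNEL_LIST=vtep1" status - 204',
    'INFO rest_server[34]: [REST-81] User "placeholder" request "PATCH /restconf/data/sonic-vxlan:sonic-vxlan/VXLAN_TUNNEL_MAP" status - 204',
]

deletes = requests_by_method(sample, "DELETE")
```

Applied to the full log above, a filter like this would surface the four `DELETE` requests (tunnel map, source IP, NVO and the tunnel itself) that precede the `PATCH` calls re-creating them.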
During the above run, the following was logged by a running `ip monitor link` session:
149: vtep1-10: <BROADCAST,MULTICAST> mtu 9100 qdisc noqueue master Bridge state DOWN group default
link/ether 0c:eb:33:95:00:49 brd ff:ff:ff:ff:ff:ff
149: vtep1-10: <BROADCAST,MULTICAST> mtu 9100 master Bridge state DOWN
link/ether 0c:eb:33:95:00:49
149: vtep1-10: <BROADCAST,MULTICAST> mtu 9100 master Bridge state DOWN
link/ether 0c:eb:33:95:00:49
Deleted 149: vtep1-10: <BROADCAST,MULTICAST> mtu 9100 master Bridge state DOWN
link/ether 0c:eb:33:95:00:49
76: Bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9216 qdisc noqueue state UP group default event FEATURE CHANGE
link/ether 0c:eb:33:95:00:49 brd ff:ff:ff:ff:ff:ff
Deleted 149: vtep1-10: <BROADCAST,MULTICAST> mtu 9100 qdisc noop state DOWN group default
link/ether 0c:eb:33:95:00:49 brd ff:ff:ff:ff:ff:ff
150: vtep1-10: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
link/ether 0c:eb:33:95:00:49 brd ff:ff:ff:ff:ff:ff
150: vtep1-10: <BROADCAST,MULTICAST> mtu 1500 qdisc noop master Bridge state DOWN group default
link/ether 0c:eb:33:95:00:49 brd ff:ff:ff:ff:ff:ff
150: vtep1-10: <BROADCAST,MULTICAST> mtu 1500 qdisc noop master Bridge state DOWN group default
link/ether 0c:eb:33:95:00:49 brd ff:ff:ff:ff:ff:ff
76: Bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9216 qdisc noqueue state UP group default event FEATURE CHANGE
link/ether 0c:eb:33:95:00:49 brd ff:ff:ff:ff:ff:ff
150: vtep1-10: <BROADCAST,MULTICAST> mtu 1500 master Bridge state DOWN
link/ether 0c:eb:33:95:00:49
150: vtep1-10: <BROADCAST,MULTICAST> mtu 1500 master Bridge state DOWN
link/ether 0c:eb:33:95:00:49
150: vtep1-10: <BROADCAST,MULTICAST> mtu 1500 master Bridge state DOWN
link/ether 0c:eb:33:95:00:49
150: vtep1-10: <BROADCAST,MULTICAST> mtu 1500 master Bridge state DOWN
link/ether 0c:eb:33:95:00:49
150: vtep1-10: <BROADCAST,MULTICAST> mtu 9100 qdisc noop master Bridge state DOWN group default
link/ether 0c:eb:33:95:00:49 brd ff:ff:ff:ff:ff:ff
150: vtep1-10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9100 qdisc noqueue master Bridge state UNKNOWN group default
link/ether 0c:eb:33:95:00:49 brd ff:ff:ff:ff:ff:ff
150: vtep1-10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9100 master Bridge state UNKNOWN
link/ether 0c:eb:33:95:00:49
150: vtep1-10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9100 master Bridge state UNKNOWN
link/ether 0c:eb:33:95:00:49
150: vtep1-10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9100 master Bridge state UNKNOWN
link/ether 0c:eb:33:95:00:49
Screenshots
No response
Additional Information
Identical behaviour is observed with dellemc.enterprise_sonic 2.2.0.
I just noticed another bug which seems part of this overall issue.
If I start out with the configuration created by the test playbook described above, that is:
interface vxlan vtep1
 source-ip 192.0.2.1
 qos-mode pipe dscp 0
 map vni 10 vlan 10
 map vni 2020 vrf Vrf_twenty
And then apply only the `Map VLAN 10 to L2VNI 10` task with either `state: overridden` or `state: replaced`, the result is:
TASK [Map VLAN 10 to L2VNI 10] *************************************************************************************
ok: [sonic2] => {"before": [{"evpn_nvo": "nvo1", "name": "vtep1", "primary_ip": null, "source_ip": "192.0.2.1", "vlan_map": [{"vlan": 10, "vni": 10}], "vrf_map": null}], "changed": false, "commands": []}
The expected behaviour here is the removal of the `map vni 2020 vrf Vrf_twenty` mapping (as it does not appear in the `config:` dict passed to this task), but this does not happen at all: it is left intact. Instead, the task behaves how I would have expected it to behave had `state: merged` been specified.
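The difference between the states that this expectation relies on can be written down as a small Python sketch. It models the behaviour expected in this report for a single VTEP, not the module's implementation, and the function name `expected_vrf_map` is hypothetical.

```python
def expected_vrf_map(state, want, have):
    """Return the VRF mappings expected on the device after the task runs.

    `want` is the `vrf_map` from the task (None if omitted) and `have` is
    what is currently configured on the device.
    """
    want, have = list(want or []), list(have or [])
    if state == "merged":
        # Additive: keep what is there and add anything new.
        return have + [entry for entry in want if entry not in have]
    if state in ("replaced", "overridden"):
        # Declarative: the device should end up with exactly what was asked for.
        return want
    raise ValueError(f"unsupported state: {state}")


have = [{"vni": 2020, "vrf": "Vrf_twenty"}]

after_overridden = expected_vrf_map("overridden", None, have)  # mapping removed
after_merged = expected_vrf_map("merged", None, have)          # mapping kept
```

The `ok:` result above matches the `merged` branch of this model, even though the task used `state: overridden`.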
Thank you for letting us know about this problem.
We are looking into it and will get back to you with the result of our analysis and our plan for addressing the problem.
The issue is fixed with pull-request: https://github.com/ansible-collections/dellemc.enterprise_sonic/pull/393