Add Test : PF-2.3: Multiple VRFs and GUE DECAP in Default VRF

Open desaimg1 opened this issue 4 months ago • 4 comments

Readme : https://github.com/openconfig/featureprofiles/blob/main/feature/policy_forwarding/vrf_selection/otg_tests/multiple_vrfs_and_gue_decap/README.md

The log is attached here: https://partnerissuetracker.corp.google.com/issues/415458482

desaimg1 avatar Sep 06 '25 14:09 desaimg1

Pull Request Functional Test Report for #4552 / 3de73c853884485c51a08fae4c5ab852fc311774

Virtual Devices

Device Test Test Documentation Job Raw Log
Arista cEOS status
PF-2.3: Multiple VRFs and GUE DECAP in Default VRF
Cisco 8000E status
PF-2.3: Multiple VRFs and GUE DECAP in Default VRF
Cisco XRd status
PF-2.3: Multiple VRFs and GUE DECAP in Default VRF
Juniper ncPTX status
PF-2.3: Multiple VRFs and GUE DECAP in Default VRF
Nokia SR Linux status
PF-2.3: Multiple VRFs and GUE DECAP in Default VRF
Openconfig Lemming status
PF-2.3: Multiple VRFs and GUE DECAP in Default VRF

Hardware Devices

Device Test Test Documentation Raw Log
Arista 7808 status
PF-2.3: Multiple VRFs and GUE DECAP in Default VRF
Cisco 8808 status
PF-2.3: Multiple VRFs and GUE DECAP in Default VRF
Juniper PTX10008 status
PF-2.3: Multiple VRFs and GUE DECAP in Default VRF
Nokia 7250 IXR-10e status
PF-2.3: Multiple VRFs and GUE DECAP in Default VRF

OpenConfigBot avatar Sep 06 '25 14:09 OpenConfigBot

Pull Request Test Coverage Report for Build 19771224700

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 7 (0.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 10.112%

Changes Missing Coverage Covered Lines Changed/Added Lines %
internal/cfgplugins/policyforwarding.go 0 7 0.0%
Total: 0 7
Totals Coverage Status
Change from base Build 19754862248: 0.0%
Covered Lines: 2227
Relevant Lines: 22023

💛 - Coveralls

coveralls avatar Sep 08 '25 13:09 coveralls

Initial review done

@nupkanoi I have addressed the initial review comments. Please check the latest log here: https://partnerissuetracker.corp.google.com/issues/415458482

desaimg1 avatar Sep 22 '25 18:09 desaimg1

2nd review - Left comments

Addressed the comments

desaimg1 avatar Sep 26 '25 14:09 desaimg1

Pre-merge action: @ram-mac, kindly execute these tests within our environment before merging. We can proceed with the merge once the tests complete successfully.

ksgireesha avatar Nov 18 '25 04:11 ksgireesha

@desaimg1 - I have validated the test and the test is failing. Is there any TCAM profile used during the test? Please use the logs for reference.

https://partnerissuetracker.corp.google.com/issues/432056696#comment117

ram-mac avatar Nov 18 '25 10:11 ram-mac

Hi @ram-mac, I am not able to open the log. Could you please share the correct link?

desaimg1 avatar Nov 18 '25 11:11 desaimg1

@desaimg1 - Please find the log here, https://partnerissuetracker.corp.google.com/issues/415458482#comment117

ksgireesha avatar Nov 18 '25 13:11 ksgireesha

Hi @ram-mac, @ksgireesha, I have executed the test again in our testbed and it passes. Please find the log here: https://partnerissuetracker.corp.google.com/issues/415458482#comment118

I see that route leaking is not happening in the Google environment. We use a TCAM profile in our environment as well. Could you please copy all the required files into the Google environment and verify? My guess is that the metadata.proto file was not copied, so the deviations are skipped and the test fails.

Google log snippet:
third_party/openconfig/featureprofiles/feature/policy_forwarding/vrf_selection/otg_tests/multiple_vrfs_and_gue_decap/multiple_vrfs_and_gue_decap_test.go:163: Route 198.51.100.1/32 was not leaked into B2_VRF unexpectedly
third_party/openconfig/featureprofiles/feature/policy_forwarding/vrf_selection/otg_tests/multiple_vrfs_and_gue_decap/multiple_vrfs_and_gue_decap_test.go:163: Verifying leaked route 198.51.100.2/32 in B2_VRF
third_party/openconfig/featureprofiles/feature/policy_forwarding/vrf_selection/otg_tests/multiple_vrfs_and_gue_decap/multiple_vrfs_and_gue_decap_test.go:163: Route 198.51.100.2/32 was not leaked into B2_VRF unexpectedly

desaimg1 avatar Nov 18 '25 16:11 desaimg1

Any use of vendor specific hardware configurations must be included in _test.go by calling and/or updating the NewDUTHardwareInit function.

ref: https://github.com/openconfig/featureprofiles/blob/7965d4db43ab3669db817044aa82104803538378/internal/cfgplugins/dut_initialize.go#L424

dplore avatar Nov 18 '25 18:11 dplore

Added the hardware TCAM profile and re-executed. Please find the latest log here: https://partnerissuetracker.corp.google.com/issues/415458482#comment119

desaimg1 avatar Nov 19 '25 06:11 desaimg1

@desaimg1 - metadata.textproto has the right changes (see below), so it is strange that this test is not passing at Google. Any specific configuration that you use on your device needs to be checked.

plan_id: "PF-2.3"
description: "Multiple VRFs and GUE DECAP in Default VRF"
testbed: TESTBED_DUT_ATE_2LINKS

platform_exceptions: {
  platform: {
    vendor: ARISTA
  }
  deviations: {
    interface_enabled: true
    default_network_instance: "default"
    gue_gre_decap_unsupported: true
    network_instance_import_export_policy_oc_unsuppored: true
  }
}

ram-mac avatar Nov 20 '25 09:11 ram-mac

Hi @ram-mac, thanks for taking the time to execute again. I have now added more TCAM profiles to the script and committed it. We are also using the following version: keng-controller: 1.24.0-15

I request that you re-execute one last time on your DUT. If it fails again, we will update our DUT to your DUT image version and execute from our end. Thanks.

desaimg1 avatar Nov 20 '25 13:11 desaimg1

@desaimg1 - Will check and update you soon.

ram-mac avatar Nov 20 '25 14:11 ram-mac

@desaimg1 - I have validated with the recent changes and the test still fails. Based on my debugging, I think there are two issues:

  1. The policy ALLOW() applied on the device does not exist.
  2. Since this is iBGP, by default no routes are installed even though BGP is learning them, and hence the routes are not present on the device. See below for reference.
xxxxxxxx#show ip route vrf all
VRF: default
Source Codes:
Gateway of last resort is not set
 C        xxxxxxxx
           directly connected, Loopback0
 C        192.0.2.0/30
           directly connected, Ethernet9/1
 C        192.0.2.4/30
           directly connected, Ethernet3/1

VRF: B2_VRF
Source Codes:
Gateway of last resort is not set
 C L      xxxxxxxx (source VRF default)
           directly connected, Loopback0 (egress VRF default)
 C L      192.0.2.0/30 (source VRF default)
           directly connected, Ethernet9/1 (egress VRF default)
 C L      192.0.2.4/30 (source VRF default)
           directly connected, Ethernet3/1 (egress VRF default)

VRF: MGMT
Source Codes:
Gateway of last resort is not set
VRF: mgmt
Source Codes:
Gateway of last resort:
 S        0.0.0.0/0 [1/0]
           via xxxxxxxx, Port-Channel2
 C        xxxxxxxx
           directly connected, Loopback100
 C        xxxxxxxx
           directly connected, Port-Channel2

xxxxxxxx#show bgp summary 
BGP summary information for VRF default
Router identifier 192.0.2.5, local AS number 65001
Neighbor             AS Session State AFI/SAFI                AFI/SAFI State   NLRI Rcd   NLRI Acc   NLRI Adv
----------- ----------- ------------- ----------------------- -------------- ---------- ---------- ----------
192.0.2.2         65001 Established   IPv4 Unicast            Negotiated              5          0          0
192.0.2.2         65001 Established   IPv6 Unicast            Negotiated              0          0          0
192.0.2.6         65003 Established   IPv4 Unicast            Negotiated              5          0          0
192.0.2.6         65003 Established   IPv6 Unicast            Negotiated              0          0          0
2001:db8::2       65001 Established   IPv4 Unicast            Negotiated              0          0          0
2001:db8::2       65001 Established   IPv6 Unicast            Negotiated              5          0          0
2001:db8::6       65003 Established   IPv4 Unicast            Negotiated              0          0          0
2001:db8::6       65003 Established   IPv6 Unicast            Negotiated              5          0          0

xxxxxxxx#show ip route vrf all 

VRF: default
Source Codes:
Gateway of last resort is not set
 C        xxxxxxxx
           directly connected, Loopback0
 C        192.0.2.0/30
           directly connected, Ethernet9/1
 C        192.0.2.4/30
           directly connected, Ethernet3/1

VRF: B2_VRF
Source Codes:
Gateway of last resort is not set
 C L      xxxxxxxx (source VRF default)
           directly connected, Loopback0 (egress VRF default)
 C L      192.0.2.0/30 (source VRF default)
           directly connected, Ethernet9/1 (egress VRF default)
 C L      192.0.2.4/30 (source VRF default)
           directly connected, Ethernet3/1 (egress VRF default)

VRF: MGMT
Source Codes:
Gateway of last resort is not set
VRF: mgmt
Source Codes:
Gateway of last resort:
 S        0.0.0.0/0 [1/0]
           via xxxxxxxx, Port-Channel2
 C        xxxxxxxx
           directly connected, Loopback100
 C        xxxxxxxx
           directly connected, Port-Channel2
xxxxxxxx# 
xxxxxxxx#show ip route 198.51.200.1
VRF: default
Source Codes:
Gateway of last resort is not set

The full test logs and the device outputs are uploaded here: https://partnerissuetracker.corp.google.com/issues/415458482

ram-mac avatar Nov 24 '25 08:11 ram-mac

Hi Ram (@ram-mac), I have executed the script again in my local testbed and it is passing. Please find the snapshot here. Can we add the Arista team here and get this clarified, because with the following version it is passing? I am also updating my logs here: https://partnerissuetracker.corp.google.com/issues/415458482

desaimg1 avatar Nov 24 '25 09:11 desaimg1

@desaimg1 - The problem is that the script is not configuring the policy properly. That is why it is failing for us. I am trying to understand how it is passing for you. Can you please attach the full configuration here? https://partnerissuetracker.corp.google.com/issues/415458482

Let's have a call to sort it out. Can you ping me in chat?

ram-mac avatar Nov 25 '25 06:11 ram-mac

Hi @ram-mac, as per our last conversation, you will be raising a bug with Arista. Could you please post an update here?

desaimg1 avatar Nov 27 '25 04:11 desaimg1

@desaimg1 - Yes, I have raised the bug with Arista, and they confirmed there is a behavioral change between the two releases. It is advised to use the current image.

Response from Arista:

This routing policy change is expected. Route-maps now have been deprecated and have been replaced with RCF. The functionality should be the same. RCF is enabled in EOS-image2 and it explains why the following configuration exists:

My question (from Google) to Arista

Why does it configure the route-map as ALLOW(), with an additional ()? In the older versions it was configured correctly as ALLOW. This seems to be an issue too. Looking for an explanation on this.

Case 1: The test passes in "EOS-image1" because the prefixes are learnt/received by BGP and installed in the RIB/FIB.
Case 2: The test fails in "EOS-image2" because the prefixes are learnt/received by BGP but not installed in the RIB/FIB, as seen in the attached logs.

In both of the above cases, I can see that the ROUTE-MAP is not configured on the device. My question is why prefix learning behaves differently. Is it that in case 2 (which is an eBGP case) the prefixes are not installed unless a route-map is configured on the device? Looking for an explanation on this.

Response from Arista for question above

This is a matter of implementation. RCF models policies as programming language functions (this is apparent with manually written RCF text, not OpenConfig) and requires () at the point of application. The OpenConfig policy-definition does not need to be defined with ().
A missing route-map is erroneously treated as a permit in EOS. RCF properly treats this as a deny, as it's a misconfiguration.
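
The difference Arista describes can be sketched in a few lines of Go. This is only an illustrative model, not EOS or featureprofiles code: the `Policy`, `Mode`, and `Evaluate` names are hypothetical, and the only behavior taken from the response above is that an undefined policy reference is treated as a permit under legacy route-maps and as a deny under RCF.

```go
package main

import "fmt"

// Verdict is the outcome of applying a policy reference to a route.
type Verdict string

const (
	Permit Verdict = "permit"
	Deny   Verdict = "deny"
)

// Policy is a hypothetical stand-in for a routing policy definition.
type Policy func(prefix string) Verdict

// Mode selects how a reference to an undefined policy is handled.
type Mode int

const (
	// LegacyRouteMap treats a missing policy as permit (older image behavior).
	LegacyRouteMap Mode = iota
	// RCF treats a missing policy as deny (newer image behavior).
	RCF
)

// Evaluate applies the named policy if it is defined; otherwise it falls
// back to the mode-specific default for a missing policy.
func Evaluate(mode Mode, defined map[string]Policy, name, prefix string) Verdict {
	if p, ok := defined[name]; ok {
		return p(prefix)
	}
	if mode == LegacyRouteMap {
		return Permit
	}
	return Deny
}

func main() {
	prefix := "198.51.100.1/32"
	undefined := map[string]Policy{}
	withAllow := map[string]Policy{
		"ALLOW": func(string) Verdict { return Permit },
	}

	fmt.Println(Evaluate(LegacyRouteMap, undefined, "ALLOW", prefix)) // permit
	fmt.Println(Evaluate(RCF, undefined, "ALLOW", prefix))            // deny
	fmt.Println(Evaluate(RCF, withAllow, "ALLOW", prefix))            // permit
}
```

Under this model, a test that references ALLOW without defining it would install routes on the older image and drop them on the newer one, which matches the two cases reported above.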

ram-mac avatar Nov 27 '25 05:11 ram-mac

@desaimg1 - Could you please use the image that Google is using and rerun your test? You will see the test fail accordingly. The route-map policy needs to be defined properly; only then will the routes be learnt on the device.

ram-mac avatar Nov 27 '25 05:11 ram-mac

@ram-mac: I have updated the test to map the route-map policy. Can you please run the test and let me know if you are still facing the issue?

ANISH-GOTTAPU avatar Nov 28 '25 07:11 ANISH-GOTTAPU

@ANISH-GOTTAPU - The test is now passing with the recent changes

ram-mac avatar Nov 28 '25 08:11 ram-mac

@ANISH-GOTTAPU @desaimg1 - ip_guev1_static_decap_subnet_range is still failing. I have requested changes to the deviation so that the unsupported part is skipped under the deviation and the test can pass.

--- FAIL: TestIpGue1StaticDecapsulation (519.03s)
    --- FAIL: TestIpGue1StaticDecapsulation/PF-1.4.1:_GUE_Decapsulation_of_inner_IPv4_traffic_over_DECAP_subnet_range (95.08s)
    --- FAIL: TestIpGue1StaticDecapsulation/PF-1.4.2:_GUE_Decapsulation_of_inner_IPv6_traffic_over_DECAP_subnet_range (87.93s)
    --- FAIL: TestIpGue1StaticDecapsulation/PF-1.4.3:_GUE_Decapsulation_of_inner_IPv4_traffic_using_non-default_and_unconfigured_GUE_UDP_port_(Negative). (75.88s)
    --- FAIL: TestIpGue1StaticDecapsulation/PF-1.4.4:_GUE_Decapsulation_of_inner_IPv6_traffic_using_non-default_and_unconfigured_GUE_UDP_port_(Negative). (74.36s)
    --- PASS: TestIpGue1StaticDecapsulation/PF-1.4.5:_Inner_IPV4_GUE_Pass-through_(Negative) (86.37s)
    --- PASS: TestIpGue1StaticDecapsulation/PF-1.4.6:_Inner_IPV6_GUE_Pass-through_(Negative) (84.87s)

ram-mac avatar Nov 28 '25 09:11 ram-mac

I would recommend splitting the tests so that at least the passing test and the new ones can be merged. As it stands, all the tests have to pass before this can be merged.

ram-mac avatar Nov 28 '25 09:11 ram-mac

I have split the PR; this PR now has only the PF-2.3 related changes. The other PR is https://github.com/openconfig/featureprofiles/pull/4865

ANISH-GOTTAPU avatar Nov 28 '25 18:11 ANISH-GOTTAPU

Thanks, this will help to identify the issues separately and to merge this test so that it is part of the workflow. Since this test is already validated, I have approved this PR.

ram-mac avatar Nov 29 '25 01:11 ram-mac