Adding ACL-1.3: Large Scale ACL with TCAM profile

Open ASHNA-AGGARWAL-KEYSIGHT opened this issue 6 months ago • 22 comments

Readme Location: https://github.com/openconfig/featureprofiles/blob/main/feature/acl/otg_tests/acl_large_scale/README.md

I have raised the issues below to add a deviation in the script: https://partnerissuetracker.corp.google.com/issues/422165468 https://partnerissuetracker.corp.google.com/issues/423896542

For other issues: https://partnerissuetracker.corp.google.com/issues/416164360

Logs attached: https://partnerissuetracker.corp.google.com/issues/415458482

ASHNA-AGGARWAL-KEYSIGHT avatar Jun 18 '25 11:06 ASHNA-AGGARWAL-KEYSIGHT

Pull Request Functional Test Report for #4306 / f9428bc9c710c5b55aaf30d8f77be39c41c84953

Virtual Devices

Device Test Test Documentation Job Raw Log
Arista cEOS status
ACL-1.3: Large Scale ACL with TCAM profile
Cisco 8000E status
ACL-1.3: Large Scale ACL with TCAM profile
Cisco XRd status
ACL-1.3: Large Scale ACL with TCAM profile
Juniper ncPTX status
ACL-1.3: Large Scale ACL with TCAM profile
Nokia SR Linux status
ACL-1.3: Large Scale ACL with TCAM profile
Openconfig Lemming status
ACL-1.3: Large Scale ACL with TCAM profile

Hardware Devices

Device Test Test Documentation Raw Log
Arista 7808 status
ACL-1.3: Large Scale ACL with TCAM profile
Cisco 8808 status
ACL-1.3: Large Scale ACL with TCAM profile
Juniper PTX10008 status
ACL-1.3: Large Scale ACL with TCAM profile
Nokia 7250 IXR-10e status
ACL-1.3: Large Scale ACL with TCAM profile

Help

OpenConfigBot avatar Jun 18 '25 11:06 OpenConfigBot

Pull Request Test Coverage Report for Build 20102132183

Details

  • 0 of 98 (0.0%) changed or added relevant lines in 4 files are covered.
  • 4 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.01%) to 10.034%

Changes Missing Coverage | Covered Lines | Changed/Added Lines | %
internal/deviations/deviations.go | 0 | 9 | 0.0%
internal/cfgplugins/bgp.go | 0 | 16 | 0.0%
proto/metadata_go_proto/metadata.pb.go | 0 | 33 | 0.0%
internal/cfgplugins/policyforwarding.go | 0 | 40 | 0.0%
Total | 0 | 98 |

Files with Coverage Reduction | New Missed Lines | %
proto/metadata_go_proto/metadata.pb.go | 4 | 0.0%
Total | 4 |

Totals
Change from base Build 20090883156: -0.01%
Covered Lines: 2227
Relevant Lines: 22195

💛 - Coveralls

coveralls avatar Jun 19 '25 02:06 coveralls

@ASHNA-AGGARWAL-KEYSIGHT - Can you please fix the CI/CD failures "Go/static analysis failures" so it can be validated?

ram-mac avatar Aug 20 '25 06:08 ram-mac

Fixed the issues

ASHNA-AGGARWAL-KEYSIGHT avatar Aug 20 '25 12:08 ASHNA-AGGARWAL-KEYSIGHT

I still see the issue: the CI/CD checks are not yet fully passing. Go / test (pull_request) is failing.

ram-mac avatar Aug 21 '25 02:08 ram-mac

@ASHNA-AGGARWAL-KEYSIGHT - Validation has failed in the Google environment, and I have attached the test logs to the bug https://partnerissuetracker.corp.google.com/issues/415458482. There are a couple of issues.

--- FAIL: TestAclLargeScale (6135.16s)
    --- PASS: TestAclLargeScale/ACL-1.1.1_-_ACL_IPv4_Address_scale (2167.23s)
    --- PASS: TestAclLargeScale/ACL-1.1.2_-_ACL_IPv6_Address_scale (2193.16s)
    --- FAIL: TestAclLargeScale/ACL-1.2.1_-_ACL_IPv4_Address_scale_using_prefix-list (878.78s)
    --- FAIL: TestAclLargeScale/ACL-1.2.2_-_ACL_IPv6_Address_scale_using_prefix-list (863.79s)
  1. The tests with prefix-lists are failing for both the IPv4 and IPv6 address families.
  2. The IPv4 access list contains only "permit ip any any", which should not be the case; it needs to have some IP addresses configured. I have attached to the bug the access list that is created on the device while the test runs.
  3. The test is taking too long to complete, and this needs to be debugged. How long did it take you to run it end to end?

ram-mac avatar Aug 21 '25 05:08 ram-mac
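To make the runtime concern in the comment above easier to quantify, the `--- PASS` / `--- FAIL` lines can be summarized per subtest. The following is a minimal sketch, not part of the PR; the names `result`, `parseResults`, and `sample` are hypothetical, and the sample text is copied from the log quoted above.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// result holds one "--- PASS/FAIL: name (12.34s)" line from `go test -v` output.
type result struct {
	Name    string
	Passed  bool
	Seconds float64
}

// resultLine matches the verdict lines emitted by `go test -v`, at any indentation.
var resultLine = regexp.MustCompile(`^\s*--- (PASS|FAIL): (\S+) \(([0-9.]+)s\)`)

// parseResults extracts every PASS/FAIL verdict line from the given output.
func parseResults(out string) []result {
	var rs []result
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		m := resultLine.FindStringSubmatch(sc.Text())
		if m == nil {
			continue // not a verdict line
		}
		secs, err := strconv.ParseFloat(m[3], 64)
		if err != nil {
			continue // malformed duration, skip the line
		}
		rs = append(rs, result{Name: m[2], Passed: m[1] == "PASS", Seconds: secs})
	}
	return rs
}

// sample is the log excerpt quoted in the thread.
const sample = `--- FAIL: TestAclLargeScale (6135.16s)
    --- PASS: TestAclLargeScale/ACL-1.1.1_-_ACL_IPv4_Address_scale (2167.23s)
    --- PASS: TestAclLargeScale/ACL-1.1.2_-_ACL_IPv6_Address_scale (2193.16s)
    --- FAIL: TestAclLargeScale/ACL-1.2.1_-_ACL_IPv4_Address_scale_using_prefix-list (878.78s)
    --- FAIL: TestAclLargeScale/ACL-1.2.2_-_ACL_IPv6_Address_scale_using_prefix-list (863.79s)`

func main() {
	for _, r := range parseResults(sample) {
		status := "FAIL"
		if r.Passed {
			status = "PASS"
		}
		// Print the duration in minutes so long-running subtests stand out.
		fmt.Printf("%s %6.1f min  %s\n", status, r.Seconds/60, r.Name)
	}
}
```

Run against a full log, this kind of summary shows which subtests dominate the total runtime.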

@ram-mac Could you please share which Arista image you used? In my setup, I am not seeing the "CLI error msg". For 2 and 3, I will get back to you; I need to check how we can reduce the runtime.

ASHNA-AGGARWAL-KEYSIGHT avatar Aug 26 '25 17:08 ASHNA-AGGARWAL-KEYSIGHT

@ASHNA-AGGARWAL-KEYSIGHT - I have attached the logs to bug 415458482; you can check the version and other details there.

ram-mac avatar Aug 28 '25 04:08 ram-mac

Hi Ram, could you please let me know which TCAM profile you are using? With traffic-policy enabled in my TCAM profile, I can apply the policy to the interface and do not see the "Failed to apply policy on Ethernet2/1" issue.

interface Ethernet1/1
   description DUT to ATE Port1
   traffic-policy input ACL_IPV4_Match_using_prefix_list_prfxv4-1

Also, could you please advise on "ACL_IPV4_Match_high_scale_statements": should its IP be configured from one of the prefix blocks?

ASHNA-AGGARWAL-KEYSIGHT avatar Sep 04 '25 07:09 ASHNA-AGGARWAL-KEYSIGHT
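The interface stanza quoted in the comment above can be rendered from parameters when the test generates CLI configuration. This is only a sketch under assumptions: `interfacePolicyCLI` is a hypothetical helper, not a function from featureprofiles, and it reproduces only the three lines shown in the thread.

```go
package main

import (
	"fmt"
	"strings"
)

// interfacePolicyCLI renders an interface stanza that attaches an ingress
// traffic-policy, mirroring the lines quoted in the thread. The description
// line is omitted when description is empty.
func interfacePolicyCLI(intf, description, policy string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "interface %s\n", intf)
	if description != "" {
		fmt.Fprintf(&b, "   description %s\n", description)
	}
	fmt.Fprintf(&b, "   traffic-policy input %s\n", policy)
	return b.String()
}

func main() {
	fmt.Print(interfacePolicyCLI(
		"Ethernet1/1",
		"DUT to ATE Port1",
		"ACL_IPV4_Match_using_prefix_list_prfxv4-1",
	))
}
```

Keeping the rendering in one small function makes it straightforward to log the exact text pushed to the device when a policy fails to apply.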

@ASHNA-AGGARWAL-KEYSIGHT: The TCAM profile used is as below. Can you share which TCAM profile is in use?

TCAM PROFILE:
   hardware counter feature ecn out
   hardware counter feature ip out layer3
   hardware counter feature ip in layer3
   !
   hardware access-list mechanism tcam
   !

ram-mac avatar Sep 09 '25 13:09 ram-mac

Attached the tcam profile fp_config_tcam.txt

ASHNA-AGGARWAL-KEYSIGHT avatar Sep 10 '25 05:09 ASHNA-AGGARWAL-KEYSIGHT

OK, I ran the test and it fails again with the same issue:

--- FAIL: TestAclLargeScale (5575.62s)
    --- PASS: TestAclLargeScale/ACL-1.1.1_-_ACL_IPv4_Address_scale (1888.81s)
    --- PASS: TestAclLargeScale/ACL-1.1.2_-_ACL_IPv6_Address_scale (1910.71s)
    --- FAIL: TestAclLargeScale/ACL-1.2.1_-_ACL_IPv4_Address_scale_using_prefix-list (879.02s)
    --- FAIL: TestAclLargeScale/ACL-1.2.2_-_ACL_IPv6_Address_scale_using_prefix-list (865.28s)

The attached TCAM profile has many features enabled besides the ACL ones. We need to figure out which one is the right one to add, and then add that configuration to the test itself via CLI. I think that after adding the TCAM profile you also need to restart the device for it to take effect. Can you add these changes to this PR and let me know, so I can validate it once again?

ram-mac avatar Sep 10 '25 07:09 ram-mac
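One way to narrow down which lines of the attached TCAM profile matter is to list the lines it has that the validation environment's profile does not. The sketch below assumes both profiles are available as text; `configLines` and `extraLines` are hypothetical helpers, and the two `placeholder-feature-line` entries are stand-ins, not real CLI commands (the actual contents of fp_config_tcam.txt are not shown in the thread).

```go
package main

import (
	"fmt"
	"strings"
)

// configLines splits a config into trimmed lines, skipping blank lines and
// "!" separator lines.
func configLines(cfg string) []string {
	var out []string
	for _, l := range strings.Split(cfg, "\n") {
		l = strings.TrimSpace(l)
		if l == "" || l == "!" {
			continue
		}
		out = append(out, l)
	}
	return out
}

// extraLines returns the lines of candidate that do not appear in baseline,
// preserving the order in which they occur in candidate.
func extraLines(baseline, candidate string) []string {
	seen := make(map[string]bool)
	for _, l := range configLines(baseline) {
		seen[l] = true
	}
	var extra []string
	for _, l := range configLines(candidate) {
		if !seen[l] {
			extra = append(extra, l)
		}
	}
	return extra
}

// baseline is the TCAM-related configuration quoted in the thread.
const baseline = `hardware counter feature ecn out
hardware counter feature ip out layer3
hardware counter feature ip in layer3
!
hardware access-list mechanism tcam
!`

// candidate stands in for the attached profile. The placeholder lines are
// NOT real commands; they only illustrate the comparison.
const candidate = baseline + `
placeholder-feature-line-1
placeholder-feature-line-2`

func main() {
	for _, l := range extraLines(baseline, candidate) {
		fmt.Println("only in candidate:", l)
	}
}
```

The resulting list is a starting point for deciding which additions are ACL-related and which can be dropped from the test.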

Have added the changes in the PR

ASHNA-AGGARWAL-KEYSIGHT avatar Sep 18 '25 12:09 ASHNA-AGGARWAL-KEYSIGHT

@ASHNA-AGGARWAL-KEYSIGHT - Is the test passing with the new changes? Please share the pass log with the changes. Also, can you please resolve the conflicts for this PR?

ram-mac avatar Oct 08 '25 07:10 ram-mac

Logs location (latestLogsACL1.3): https://partnerissuetracker.corp.google.com/issues/415458482

ASHNA-AGGARWAL-KEYSIGHT avatar Oct 09 '25 04:10 ASHNA-AGGARWAL-KEYSIGHT

@ASHNA-AGGARWAL-KEYSIGHT - With the latest changes, it looks like we are losing connectivity, including gNMI connectivity, with the device. I think we will have to check why connectivity is getting lost. Logs are attached here: https://partnerissuetracker.corp.google.com/issues/415458482#91

ram-mac avatar Oct 16 '25 15:10 ram-mac

@ram-mac Are we enabling the timeout while executing the test?

ASHNA-AGGARWAL-KEYSIGHT avatar Oct 22 '25 12:10 ASHNA-AGGARWAL-KEYSIGHT

@ASHNA-AGGARWAL-KEYSIGHT - Yes, I have given the test a timeout of 2 hours. But I suspect the gNMI connectivity is being affected by the new changes; earlier, at least the connectivity was stable throughout the test.

ram-mac avatar Oct 27 '25 03:10 ram-mac

Hi Ram,

I have run the scripts multiple times in our setup without encountering any issues. From the attached logs, I noticed that gNMI disconnected after the CLI configuration. In my tests, however, I used the existing CLI helper function for configuration. I suspect the issue may occur when the test is run from your internal framework, which could be causing the connection loss to the DUT. We execute the tests directly from featureprofiles.

I suggest we schedule a call so that I can demonstrate how I executed the test, and we can discuss this further.

ASHNA-AGGARWAL-KEYSIGHT avatar Oct 29 '25 10:10 ASHNA-AGGARWAL-KEYSIGHT

@ASHNA-AGGARWAL-KEYSIGHT - I know that your environment is different, but there is definitely some issue with the ACLs being configured that causes the gNMI connectivity to be lost. I had also pointed out another issue: a delay in applying the configuration. Let's have a call sometime tomorrow.

ram-mac avatar Oct 29 '25 12:10 ram-mac

@ASHNA-AGGARWAL-KEYSIGHT - I have sent an invite to debug this issue on our setup. Let's identify the issue today.

ram-mac avatar Nov 06 '25 01:11 ram-mac

@ram-mac as discussed, added the changes and rebased the PR

ASHNA-AGGARWAL-KEYSIGHT avatar Nov 06 '25 11:11 ASHNA-AGGARWAL-KEYSIGHT

Thanks @ASHNA-AGGARWAL-KEYSIGHT - I will validate it today and see if there is any progress.

ram-mac avatar Nov 10 '25 02:11 ram-mac

@ASHNA-AGGARWAL-KEYSIGHT - The test fails again. I think the issue might be that gNMI client connectivity is lost when we apply the HW TCAM profile. We can also ask the vendor to help with debugging.

ram-mac avatar Nov 10 '25 13:11 ram-mac

@ASHNA-AGGARWAL-KEYSIGHT - This TCAM configuration cannot be used for testing in the Google environment. With it, the port-channel interface stays down and connectivity is lost. Please remove it from the test. If there are failures, we need to check with Arista on them.

ram-mac avatar Nov 12 '25 10:11 ram-mac

@ASHNA-AGGARWAL-KEYSIGHT - Can you please make the necessary changes, as per the discussion on CLI-based traffic-policy configuration generation, and then run the test?

ram-mac avatar Dec 04 '25 08:12 ram-mac

@ram-mac Added the changes. Logs attached: NewChangesACL1.3 (https://partnerissuetracker.corp.google.com/issues/415458482). Please let me know if any other changes are required.

ASHNA-AGGARWAL-KEYSIGHT avatar Dec 08 '25 05:12 ASHNA-AGGARWAL-KEYSIGHT