
Add new test cases to verify route consistency before and after critical processes crash

Open BYGX-wcr opened this issue 1 year ago • 9 comments

Description of PR

Summary: Fixes https://github.com/sonic-net/sonic-mgmt/issues/14983

Type of change

  • [ ] Bug fix
  • [ ] Testbed and Framework(new/improvement)
  • [X] Test case(new/improvement)

Back port request

  • [ ] 202012
  • [ ] 202205
  • [ ] 202305
  • [ ] 202311
  • [X] 202405

Approach

What is the motivation for this PR?

To address the test gap raised by the issue.

How did you do it?

I added three test cases, each of which checks the route consistency of the DUT before and after a crash of one type of critical process. The tests now cover the following critical processes:

  • bgpd in bgp container
  • syncd in syncd container
  • orchagent in swss container
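The before/after comparison described above can be sketched roughly as follows. This is a minimal illustration and not the code in tests/route/test_route_consistency.py: the function names and the choice to represent a route snapshot as a set of prefix strings are assumptions made for the example.

```python
def diff_route_snapshots(before, after):
    """Compare two route snapshots (iterables of prefix strings).

    Returns a tuple (missing, extra):
      missing - prefixes present before the crash but absent afterwards
      extra   - prefixes present afterwards that were not there before
    Both lists are sorted so that failures are reported deterministically.
    """
    before_set, after_set = set(before), set(after)
    missing = sorted(before_set - after_set)
    extra = sorted(after_set - before_set)
    return missing, extra


def routes_consistent(before, after):
    """Return True when both snapshots contain exactly the same prefixes."""
    missing, extra = diff_route_snapshots(before, after)
    return not missing and not extra
```

A test built on this would take one snapshot before killing bgpd, syncd, or orchagent, take another after recovery, and fail with the missing and extra prefixes if the two differ.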

How did you verify/test it?

I verified it on t0, t1 and t2 testbeds in MSFT internal lab devices.

Any platform specific information?

Supported testbed topology if it's a new test case?

Any

Documentation

BYGX-wcr avatar Oct 24 '24 17:10 BYGX-wcr

The pre-commit check detected issues in the files touched by this pull request. The pre-commit check is a mandatory check, please fix detected issues.

Detailed pre-commit check results:
trim trailing whitespace.................................................Failed
- hook id: trailing-whitespace
- exit code: 1
- files were modified by this hook

Fixing tests/route/test_route_consistency.py

fix end of files.........................................................Passed
check yaml...........................................(no files to check)Skipped
check for added large files..............................................Passed
check python ast.........................................................Passed
flake8...................................................................Failed
- hook id: flake8
- exit code: 1

tests/route/test_route_consistency.py:17:1: E302 expected 2 blank lines, found 1
tests/route/test_route_consistency.py:208:23: E711 comparison to None should be 'if cond is None:'
tests/route/test_route_consistency.py:226:23: E711 comparison to None should be 'if cond is None:'
tests/route/test_route_consistency.py:260:23: E711 comparison to None should be 'if cond is None:'
tests/route/test_route_consistency.py:307:23: E711 comparison to None should be 'if cond is None:'
...
[truncated extra lines, please run pre-commit locally to view full check results]
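For context on the E711 findings: flake8 prefers an identity check (`is None`) over an equality check (`== None`) because `==` can be overridden by the object being compared. The short sketch below is only an illustration; the class and function names are hypothetical and do not come from the test file.

```python
class AlwaysEqual:
    """Toy class whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


def is_missing(value):
    # Identity check: true only for the None singleton itself,
    # regardless of how the object implements __eq__.
    return value is None
```

With this class, an equality comparison against None evaluates to True for an `AlwaysEqual` instance, while `is_missing` correctly reports that the instance is not None.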

To run the pre-commit checks locally, you can follow below steps:

  1. Ensure that default python is python3. In sonic-mgmt docker container, default python is python2. You can run the check by activating the python3 virtual environment in sonic-mgmt docker container or outside of sonic-mgmt docker container.
  2. Ensure that the pre-commit package is installed:
     sudo pip install pre-commit
  3. Go to repository root folder
  4. Install the pre-commit hooks:
     pre-commit install
  5. Use pre-commit to check staged file:
     pre-commit
  6. Alternatively, you can check committed files using:
     pre-commit run --from-ref <commit_id> --to-ref <commit_id>

mssonicbld avatar Oct 24 '24 17:10 mssonicbld

The pre-commit check detected issues in the files touched by this pull request. The pre-commit check is a mandatory check, please fix detected issues.

Detailed pre-commit check results:
trim trailing whitespace.................................................Failed
- hook id: trailing-whitespace
- exit code: 1
- files were modified by this hook

Fixing tests/route/test_route_consistency.py

fix end of files.........................................................Passed
check yaml...........................................(no files to check)Skipped
check for added large files..............................................Passed
check python ast.........................................................Passed
flake8...................................................................Passed
flake8...............................................(no files to check)Skipped
check conditional mark sort..........................(no files to check)Skipped

mssonicbld avatar Oct 24 '24 20:10 mssonicbld

@deepak-singhal0408 , can you please spend some time reviewing this PR?

BYGX-wcr avatar Oct 31 '24 17:10 BYGX-wcr

Thanks @BYGX-wcr. The overall flow looks fine. I have some comments; could you please check and respond to or address them? Also, for the recovery part, I see that you are waiting for sleep_interval. Have you tried this on a scaled setup to see whether the interval is enough?

deepak-singhal0408 avatar Oct 31 '24 18:10 deepak-singhal0408

What do you mean by scaled setup? A testbed with a lot of BGP routes? I tried on T0/T1/T2 topologies and encountered no problem. The maximum number of withdrawn routes was at X * 1e4 scale. The sleep interval is calculated with consideration of the number of routes, so theoretically it should scale with the number of BGP routes.
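A sleep interval that scales with the route count could look something like the sketch below. The formula, the function name, and both default constants are placeholders chosen for illustration; they are not the values or the calculation used in this PR.

```python
import math


def estimate_sleep_interval(num_routes, base_seconds=5.0, routes_per_second=1000.0):
    """Estimate how long to wait for routes to be reprogrammed.

    num_routes        - number of routes expected to be withdrawn/reinstalled
    base_seconds      - fixed floor applied regardless of scale (assumed value)
    routes_per_second - assumed route programming rate (assumed value)
    """
    if num_routes < 0:
        raise ValueError("num_routes must be non-negative")
    if routes_per_second <= 0:
        raise ValueError("routes_per_second must be positive")
    # Round the scale-dependent part up so a partial second still counts.
    return base_seconds + math.ceil(num_routes / routes_per_second)
```

Note that an estimate of this kind only covers route programming time. It does not include the time the crashed process or its container needs to restart.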

BYGX-wcr avatar Nov 01 '24 06:11 BYGX-wcr

The pre-commit check detected issues in the files touched by this pull request. The pre-commit check is a mandatory check, please fix detected issues.

Detailed pre-commit check results:
trim trailing whitespace.................................................Passed
fix end of files.........................................................Failed
- hook id: end-of-file-fixer
- exit code: 1
- files were modified by this hook

Fixing tests/route/test_route_consistency.py

check yaml...........................................(no files to check)Skipped
check for added large files..............................................Passed
check python ast.........................................................Passed
flake8...................................................................Failed
- hook id: flake8
- exit code: 1

tests/route/test_route_consistency.py:3:1: F401 'traceback' imported but unused

flake8...............................................(no files to check)Skipped
check conditional mark sort..........................(no files to check)Skipped

mssonicbld avatar Nov 01 '24 18:11 mssonicbld

The pre-commit check detected issues in the files touched by this pull request. The pre-commit check is a mandatory check, please fix detected issues.

Detailed pre-commit check results:
trim trailing whitespace.................................................Passed
fix end of files.........................................................Failed
- hook id: end-of-file-fixer
- exit code: 1
- files were modified by this hook

Fixing tests/route/test_route_consistency.py

check yaml...........................................(no files to check)Skipped
check for added large files..............................................Passed
check python ast.........................................................Passed
flake8...................................................................Passed
flake8...............................................(no files to check)Skipped
check conditional mark sort..........................(no files to check)Skipped

mssonicbld avatar Nov 01 '24 18:11 mssonicbld

Thanks @BYGX-wcr . Overall flow looks fine. Have some comments, please check and respond/Address them? Also for the recovery part, I see that you are waiting for sleep_interval.. have you tried this on scaled setup to see if the interval is enough?

What do you mean by scaled setup? A testbed with a lot of BGP routes? I tried on T0/T1/T2 topologies and encountered no problem. The maximum number of withdrawn routes was at X * 1e4 scale. The sleep interval is calculated with consideration of the number of routes, so theoretically it should scale with the number of BGP routes.

But it didn't factor in the crash recovery time, right?

You are doing config_reload before this. The current sleep_interval wait only accounts for the time needed to install the routes. I was thinking we should first ensure that the BGP sessions are established and then wait for sleep_interval?

deepak-singhal0408 avatar Nov 01 '24 22:11 deepak-singhal0408

I added a wait_until check for the BGP sessions to come up.
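The idea of polling for established BGP sessions before starting the route wait can be sketched as below. This is a self-contained stand-in, not the sonic-mgmt wait_until helper or the code added in this PR: `poll_until`, `all_sessions_established`, and the dictionary of neighbor states are all hypothetical names used only for illustration.

```python
import time


def poll_until(timeout, interval, condition, *args, **kwargs):
    """Call condition(*args, **kwargs) every `interval` seconds.

    Returns True as soon as the condition holds, or False once
    `timeout` seconds have elapsed without it holding.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition(*args, **kwargs):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def all_sessions_established(session_states):
    """True when there is at least one neighbor and every one is Established.

    session_states maps a neighbor address to its BGP state string
    (an assumed representation for this example).
    """
    return bool(session_states) and all(
        state == "Established" for state in session_states.values()
    )
```

In a real test the condition would query the DUT on each call rather than inspect a fixed dictionary, and the sleep_interval wait for route installation would start only after the poll returns True.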

BYGX-wcr avatar Nov 04 '24 22:11 BYGX-wcr

@StormLiangMS , can you help to review this PR?

BYGX-wcr avatar Nov 05 '24 17:11 BYGX-wcr