sonic-mgmt
Add new test cases to verify route consistency before and after critical processes crash
Description of PR
Summary: Fixes https://github.com/sonic-net/sonic-mgmt/issues/14983
Type of change
- [ ] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [x] Test case(new/improvement)
Back port request
- [ ] 202012
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [x] 202405
Approach
What is the motivation for this PR?
To address the test gap raised by the issue.
How did you do it?
I added three test cases, each of which verifies the route consistency of the DUT before and after one type of critical process crashes. The tests now cover the following critical processes:
- bgpd in bgp container
- syncd in syncd container
- orchagent in swss container
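The consistency check described above can be pictured as a set comparison between two route snapshots, one taken before the crash and one taken after recovery. The following is a minimal sketch, not the code in tests/route/test_route_consistency.py; the function names and the use of plain prefix strings as snapshot entries are assumptions made for illustration.

```python
from typing import Set, Tuple


def diff_route_snapshots(before: Set[str], after: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Compare two route snapshots (hypothetical helper).

    Returns a tuple (missing, unexpected):
      missing    - prefixes present before the crash but absent afterwards
      unexpected - prefixes absent before the crash but present afterwards
    """
    missing = before - after
    unexpected = after - before
    return missing, unexpected


def assert_routes_consistent(before: Set[str], after: Set[str]) -> None:
    """Fail with a readable message if the two snapshots differ."""
    missing, unexpected = diff_route_snapshots(before, after)
    assert not missing and not unexpected, (
        "Route mismatch after recovery: missing={}, unexpected={}".format(
            sorted(missing), sorted(unexpected)
        )
    )


if __name__ == "__main__":
    # Illustrative prefixes only; a real test would read them from the DUT.
    snapshot_before = {"10.0.0.0/24", "192.168.0.0/16"}
    snapshot_after = {"10.0.0.0/24", "192.168.0.0/16"}
    assert_routes_consistent(snapshot_before, snapshot_after)
```

Reporting both the missing and the unexpected prefixes makes a failure easier to triage than a plain equality check would.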
How did you verify/test it?
I verified it on t0, t1, and t2 testbeds on MSFT internal lab devices.
Any platform specific information?
Supported testbed topology if it's a new test case?
Any
Documentation
The pre-commit check detected issues in the files touched by this pull request. The pre-commit check is a mandatory check, please fix detected issues.
Detailed pre-commit check results:
trim trailing whitespace.................................................Failed
- hook id: trailing-whitespace
- exit code: 1
- files were modified by this hook
Fixing tests/route/test_route_consistency.py
fix end of files.........................................................Passed
check yaml...........................................(no files to check)Skipped
check for added large files..............................................Passed
check python ast.........................................................Passed
flake8...................................................................Failed
- hook id: flake8
- exit code: 1
tests/route/test_route_consistency.py:17:1: E302 expected 2 blank lines, found 1
tests/route/test_route_consistency.py:208:23: E711 comparison to None should be 'if cond is None:'
tests/route/test_route_consistency.py:226:23: E711 comparison to None should be 'if cond is None:'
tests/route/test_route_consistency.py:260:23: E711 comparison to None should be 'if cond is None:'
tests/route/test_route_consistency.py:307:23: E711 comparison to None should be 'if cond is None:'
...
[truncated extra lines, please run pre-commit locally to view full check results]
To run the pre-commit checks locally, follow the steps below:
- Ensure that the default python is python3. In the sonic-mgmt docker container, the default python is python2. You can run the check either by activating the python3 virtual environment in the sonic-mgmt docker container or by running it outside the sonic-mgmt docker container.
- Ensure that the pre-commit package is installed:
sudo pip install pre-commit
- Go to repository root folder
- Install the pre-commit hooks:
pre-commit install
- Use pre-commit to check staged files:
pre-commit
- Alternatively, you can check committed files using:
pre-commit run --from-ref <commit_id> --to-ref <commit_id>
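For the E711 findings reported by flake8 above, the usual fix is to compare against None by identity rather than by equality. The snippet below is a small illustration of the difference, not the code from the PR; the function and class names are hypothetical.

```python
def describe_route_lookup(result):
    """Describe a lookup result, using the identity test that E711 asks for."""
    # Flagged form:   if result == None:
    # Preferred form: if result is None:
    if result is None:
        return "no route"
    return "route via {}".format(result)


class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


if __name__ == "__main__":
    print(describe_route_lookup(None))        # no route
    print(describe_route_lookup("10.0.0.1"))  # route via 10.0.0.1

    obj = AlwaysEqual()
    # Equality can be overridden, so it may report True for a non-None object.
    print(obj == None)  # noqa: E711  -> True
    # Identity cannot be overridden, so it is the reliable None check.
    print(obj is None)  # False
```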
The pre-commit check detected issues in the files touched by this pull request. The pre-commit check is a mandatory check, please fix detected issues.
Detailed pre-commit check results:
trim trailing whitespace.................................................Failed
- hook id: trailing-whitespace
- exit code: 1
- files were modified by this hook
Fixing tests/route/test_route_consistency.py
fix end of files.........................................................Passed
check yaml...........................................(no files to check)Skipped
check for added large files..............................................Passed
check python ast.........................................................Passed
flake8...................................................................Passed
flake8...............................................(no files to check)Skipped
check conditional mark sort..........................(no files to check)Skipped
@deepak-singhal0408 , can you please spend some time reviewing this PR?
Thanks @BYGX-wcr. Overall flow looks fine. I have some comments; could you please check and respond to or address them? Also, for the recovery part, I see that you are waiting for sleep_interval. Have you tried this on a scaled setup to see whether the interval is enough?
What do you mean by scaled setup? A testbed with a lot of BGP routes? I tried on T0/T1/T2 topologies and encountered no problem. The maximum number of withdrawn routes was at X * 1e4 scale. The sleep interval is calculated with consideration of the number of routes, so theoretically it should scale with the number of BGP routes.
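The thread does not show how the test derives its sleep interval, so the following is only a sketch of one way an interval could scale with the number of routes. The function name, the fixed base wait, and the assumed processing rate are all hypothetical values chosen for illustration.

```python
import math


def compute_sleep_interval(route_count, routes_per_second=1000.0, base_seconds=30.0):
    """Return a wait time in seconds that grows with the number of routes.

    routes_per_second and base_seconds are assumed values, not measurements.
    """
    if route_count < 0:
        raise ValueError("route_count must be non-negative")
    if routes_per_second <= 0:
        raise ValueError("routes_per_second must be positive")
    # Round the route-dependent part up so the wait is never underestimated.
    return base_seconds + math.ceil(route_count / routes_per_second)


if __name__ == "__main__":
    for count in (0, 1500, 10000):
        print(count, compute_sleep_interval(count))
```

A formula of this shape covers route programming time only; it does not by itself account for how long the crashed process takes to restart.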
The pre-commit check detected issues in the files touched by this pull request. The pre-commit check is a mandatory check, please fix detected issues.
Detailed pre-commit check results:
trim trailing whitespace.................................................Passed
fix end of files.........................................................Failed
- hook id: end-of-file-fixer
- exit code: 1
- files were modified by this hook
Fixing tests/route/test_route_consistency.py
check yaml...........................................(no files to check)Skipped
check for added large files..............................................Passed
check python ast.........................................................Passed
flake8...................................................................Failed
- hook id: flake8
- exit code: 1
tests/route/test_route_consistency.py:3:1: F401 'traceback' imported but unused
flake8...............................................(no files to check)Skipped
check conditional mark sort..........................(no files to check)Skipped
The pre-commit check detected issues in the files touched by this pull request. The pre-commit check is a mandatory check, please fix detected issues.
Detailed pre-commit check results:
trim trailing whitespace.................................................Passed
fix end of files.........................................................Failed
- hook id: end-of-file-fixer
- exit code: 1
- files were modified by this hook
Fixing tests/route/test_route_consistency.py
check yaml...........................................(no files to check)Skipped
check for added large files..............................................Passed
check python ast.........................................................Passed
flake8...................................................................Passed
flake8...............................................(no files to check)Skipped
check conditional mark sort..........................(no files to check)Skipped
But it didn't factor in the crash recovery time, right?
You are doing config_reload before this. The current sleep_interval only accounts for the time needed to install the routes. I was wondering whether we should first ensure that the BGP sessions are established and then wait for sleep_interval?
I added a wait_until check for the BGP sessions to come up.
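The ordering discussed in this thread (first confirm that the BGP sessions are established, then wait for the route-dependent interval) can be sketched as follows. This is an illustration under stated assumptions, not the change made in the PR: the wait_until below is a local stand-in written for this example, and get_session_states is a hypothetical callable that returns a mapping of neighbor to session state.

```python
import time


def wait_until(timeout, interval, condition, *args, **kwargs):
    """Local stand-in poller: call condition until it is true or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        if condition(*args, **kwargs):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def all_bgp_sessions_established(get_session_states):
    """Return True only if at least one session exists and all are Established."""
    states = get_session_states()
    return bool(states) and all(s == "Established" for s in states.values())


def wait_for_recovery(get_session_states, sleep_interval,
                      timeout=300, poll_interval=10, sleep=time.sleep):
    """Wait for BGP sessions first, then for the route-dependent interval."""
    if not wait_until(timeout, poll_interval,
                      all_bgp_sessions_established, get_session_states):
        raise TimeoutError(
            "BGP sessions did not come up within {} s".format(timeout)
        )
    # Sessions are up; now give the routes time to be re-learned and installed.
    sleep(sleep_interval)


if __name__ == "__main__":
    # Fake state source: the session becomes Established on the third poll.
    polls = {"count": 0}

    def fake_states():
        polls["count"] += 1
        state = "Established" if polls["count"] >= 3 else "Active"
        return {"10.0.0.1": state}

    wait_for_recovery(fake_states, sleep_interval=0.1,
                      timeout=5, poll_interval=0.05)
    print("recovered after", polls["count"], "polls")
```

Separating the two waits means the session check absorbs the variable restart time, so the fixed interval only has to cover route installation.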
@StormLiangMS , can you help to review this PR?