Add debuggability for reboot function

Open JibinBao opened this issue 1 year ago • 21 comments

Description of PR

  1. Add a function to collect the console log from the start of the reboot until the DUT is up
  2. When the DUT does not come up, check whether it is pingable and collect the mgmt interface config

Summary: Fixes # (issue)

Type of change

  • [x] Bug fix
  • [ ] Testbed and Framework(new/improvement)
  • [ ] Test case(new/improvement)

Back port request

  • [ ] 202012
  • [ ] 202205
  • [ ] 202305
  • [x] 202311
  • [x] 202405

Approach

What is the motivation for this PR?

Improve the debuggability of the reboot function

How did you do it?

  1. Add a function to collect the console log from the start of the reboot until the DUT is up
  2. When the DUT does not come up, check whether it is pingable and collect the mgmt interface config
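As a rough illustration of step 2 (not the code in this PR), the check could look like the sketch below. The names `is_pingable`, `collect_mgmt_debug_info`, `MGMT_COMMANDS` and the `send_console_command` callable are hypothetical, and the sketch assumes a Linux `ping` on the test server and that the DUT's management interface is `eth0`.

```python
import subprocess
from typing import Callable, Dict, Optional

# Assumption: standard Linux commands run on the DUT; eth0 is the mgmt interface.
MGMT_COMMANDS = ["ip addr show eth0", "ip route show"]


def is_pingable(host: str, count: int = 3, timeout_s: int = 2,
                run: Callable = subprocess.run) -> bool:
    """Return True if `host` answers at least one ICMP echo request."""
    cmd = ["ping", "-c", str(count), "-W", str(timeout_s), host]
    try:
        result = run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except OSError:
        # The ping binary is missing or could not be executed.
        return False
    return result.returncode == 0


def collect_mgmt_debug_info(host: str,
                            send_console_command: Optional[Callable[[str], str]] = None,
                            ping: Callable[[str], bool] = is_pingable) -> Dict:
    """Gather debug data for a DUT that did not come back after reboot."""
    info = {"pingable": ping(host), "mgmt_config": {}}
    if send_console_command is None:
        # No console available: report reachability only.
        return info
    for cmd in MGMT_COMMANDS:
        try:
            info["mgmt_config"][cmd] = send_console_command(cmd)
        except Exception as exc:
            # Debug collection must never raise into the test itself.
            info["mgmt_config"][cmd] = "<failed: {}>".format(exc)
    return info
```

Passing the console sender in as a callable keeps the sketch independent of any particular console type; a caller without a console still gets the ping result.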

How did you verify/test it?

Run reboot cases

Any platform specific information?

Any

Supported testbed topology if it's a new test case?

Documentation

JibinBao avatar Aug 30 '24 07:08 JibinBao

/azpw run Azure.sonic-mgmt

JibinBao avatar Sep 10 '24 01:09 JibinBao

/AzurePipelines run Azure.sonic-mgmt

mssonicbld avatar Sep 10 '24 01:09 mssonicbld

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar Sep 10 '24 01:09 azure-pipelines[bot]

/azpw run Azure.sonic-mgmt

JibinBao avatar Sep 10 '24 05:09 JibinBao

/AzurePipelines run Azure.sonic-mgmt

mssonicbld avatar Sep 10 '24 05:09 mssonicbld

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar Sep 10 '24 05:09 azure-pipelines[bot]

/azpw run Azure.sonic-mgmt

JibinBao avatar Sep 11 '24 01:09 JibinBao

/AzurePipelines run Azure.sonic-mgmt

mssonicbld avatar Sep 11 '24 01:09 mssonicbld

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar Sep 11 '24 01:09 azure-pipelines[bot]

/azpw run Azure.sonic-mgmt

JibinBao avatar Sep 18 '24 02:09 JibinBao

/AzurePipelines run Azure.sonic-mgmt

mssonicbld avatar Sep 18 '24 02:09 mssonicbld

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar Sep 18 '24 02:09 azure-pipelines[bot]

The concern with this change is that we may see test failures when the console connection is misconfigured or not working properly, even though that is unrelated to the reboot feature. @prgeor Can you please help review as well?

bingwang-ms avatar Sep 23 '24 17:09 bingwang-ms

When the console connection is not working, the reboot function is not affected; the console logs are simply not collected.

JibinBao avatar Sep 24 '24 01:09 JibinBao

Hi @JibinBao, I still have a concern about this change. There are various console types and some of them are not stable enough. We don't want to fail the reboot test because of a console connection issue.

bingwang-ms avatar Oct 14 '24 18:10 bingwang-ms

Hi @JibinBao, I still have a concern about this change. There are various console types and some of them are not stable enough. We don't want to fail the reboot test because of a console connection issue.

Hi @bingwang-ms, you are right, I have the same concern. That is why we only try to create the dutconsole. If creating the dutconsole fails, we do not collect the logs, so the original test logic is not affected. You can check the function try_create_dut_console.
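The fallback described here can be sketched roughly as follows. This is an illustration under stated assumptions, not the body of try_create_dut_console from the PR: `try_create_console`, `reboot_with_console_log`, and the three callables passed in are hypothetical stand-ins.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def try_create_console(factory: Callable[[], object]) -> Optional[object]:
    """Return a console object, or None if it cannot be created."""
    try:
        return factory()
    except Exception as exc:
        # A broken console only disables log collection; it is not a failure.
        logger.warning("Console unavailable, skipping console logs: %s", exc)
        return None


def reboot_with_console_log(do_reboot: Callable[[], None],
                            console_factory: Callable[[], object],
                            collect_log: Callable[[object], None]) -> bool:
    """Run the reboot; collect console logs only if a console exists.

    Returns True when a console was available for log collection.
    """
    console = try_create_console(console_factory)
    try:
        do_reboot()  # original reboot logic, unchanged by the console state
    finally:
        if console is not None:
            try:
                collect_log(console)
            except Exception as exc:
                # Errors while saving logs must not mask the reboot result.
                logger.warning("Failed to collect console log: %s", exc)
    return console is not None
```

In this shape a console error can only produce a warning, while an exception raised by `do_reboot` itself still propagates to the test.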

JibinBao avatar Oct 15 '24 01:10 JibinBao

@JibinBao please resolve the conflicts

prgeor avatar Oct 15 '24 06:10 prgeor

@JibinBao please resolve the conflicts

Will handle it. Thanks

JibinBao avatar Oct 15 '24 06:10 JibinBao


Hi @prgeor, the conflict has been fixed. Can you help review it again?

JibinBao avatar Oct 16 '24 05:10 JibinBao

Hi @bingwang-ms, I pushed a fix to resolve the conflict, but the PR is stuck in the status shown in the picture below. Can you help check it? What do I need to do next? [image]

JibinBao avatar Oct 31 '24 01:10 JibinBao

Hi @JibinBao, I don't agree to merge this change. Unstable console connection can lead to test_reboot failure. Can you keep the change in your internal repo?

Hi @bingwang-ms, the test will not be affected by console issues, for these reasons:

  1. If it fails to create a console, it does not collect console logs and executes the code with the original logic
  2. We have run the tests with this fix for more than two months, and no failure has been caused by it
  3. If we keep it in our internal repo, it will always conflict when we pull updates from the community
  4. Can you merge it? If it turns out to be unstable, we can revert it.

Thanks

JibinBao avatar Nov 01 '24 01:11 JibinBao

And let's please resolve the conflict so we can approve and merge.

liat-grozovik avatar Dec 03 '24 18:12 liat-grozovik

cannot reopen

JibinBao avatar Dec 04 '24 03:12 JibinBao

reopen it

JibinBao avatar Dec 04 '24 03:12 JibinBao

After pushing the new fix, this PR hung in the "processing-updates" status, so I am closing it and have opened a new PR: https://github.com/sonic-net/sonic-mgmt/pull/15868

JibinBao avatar Dec 04 '24 03:12 JibinBao