sonic-buildimage icon indicating copy to clipboard operation
sonic-buildimage copied to clipboard

To fix the issue: show_techsupport & saidump errors during testbed testing by replacing redis-rdb-tool with rdb-cli

Open JunhongMao opened this issue 1 year ago • 20 comments

Why I did it

Fix issue: https://github.com/sonic-net/sonic-sairedis/issues/1387 The latest redis-rdb-tools-0.1.15 doesn't support Redis 7.0. Redis 7.0 was released in 2020 and adopted by SONiC's latest version. So, this issue turned out. https://github.com/sripathikrishnan/redis-rdb-tools

I.e., the rdb-tools is far behind the Redis 7.0. The librdb can perfectly fix this issue. Please see quote from https://github.com/redis/librdb.

Motivation behind this project There is a genuine need by the Redis community for a versatile RDB file parser that can export data, perform data analysis, or merely extract raw data from RDB and RESTORE it against a live Redis server. However, available parsers have shortcomings in some aspects such as lack of long-term support, lagging far behind the latest Redis release, and usually not being optimized for memory, performance, or high-traffic streaming for production environments. Additionally, most of them are not written in C, which limits the reuse of Redis components and the potential to contribute back to Redis repo. To address these issues, it is worthwhile to develop a new parser with a modern architecture, that maybe can also challenge the current integrated RDB parser of Redis and even replace it in the future.

So, the below PRS are to replace rdbtools with librdb's tool rdb-cli. https://github.com/sonic-net/sonic-buildimage/pull/19268 https://github.com/sonic-net/sonic-sairedis/pull/1391

The above PRs are based on the below PRs. https://github.com/sonic-net/sonic-sairedis/pull/1288 https://github.com/sonic-net/sonic-sairedis/pull/1298 https://github.com/sonic-net/sonic-buildimage/pull/16466 https://github.com/sonic-net/sonic-utilities/pull/2972

How I did it

To use the Redis-db SAVE option to save the snapshot of DB each time and recover later, instead of looping through each entry in the table and saving it.

(1) Updated sonic-buildimage repo's platform/broadcom/docker-syncd-brcm-dnx/Dockerfile.j2, 
 install rdb-cli into the syncd containter.
(2) Updated sonic-sairedis repo's script file: files/scripts/saidump.sh, replace rdbtools with rdb-cli.
(3) Updated sonic-sairedis repo's saidump/saidump.cpp, to process the rdb-cli's ouput json file.

How to verify it

Method 1: On T2 setup with more than 96K routes, execute CLI command -- generate_dump No error should be shown

Method 2: Go into syncd0 container. admin@ixre-egl-board30:~$ docker exec -it syncd0 bash Run below command, get saidump's sha1sum value. root@ixre-egl-board30:/# saidump | sha1sum 1893097ecaeae72383432a54bfe4b285d05b23e5 - Run below command, get saidump.sh's sha1sum value. root@ixre-egl-board30:/# saidump.sh | sha1sum 1893097ecaeae72383432a54bfe4b285d05b23e5 - The two sha1sum values should be the same and no error should be shown. Method 3: To run the testbed show_techsupport test case.

Which release branch to backport (provide reason below if selected)

  • [ ] 201811
  • [ ] 201911
  • [ ] 202006
  • [ ] 202012
  • [ ] 202106
  • [ ] 202111
  • [ ] 202205
  • [ ] 202211
  • [ ] 202305

Tested branch (Please provide the tested image version)

JunhongMao avatar Jun 11 '24 02:06 JunhongMao

@mlok-nokia @kcudnik, please help to review, thanks.

JunhongMao avatar Jun 11 '24 03:06 JunhongMao

@qiluo-msft, could you please review and approve this PR? Thanks.

JunhongMao avatar Jul 15 '24 20:07 JunhongMao

@qiluo-msft, could you please review and approve this PR? Thanks.

JunhongMao avatar Jul 25 '24 13:07 JunhongMao

@qiluo-msft reminder to review/approve.

rlhui avatar Aug 14 '24 18:08 rlhui

Hello, @qiluo-msft reminder to review/approve.

JunhongMao avatar Aug 29 '24 13:08 JunhongMao

Hello, @qiluo-msft. Please help to review and approve. It's been a long time since this PR was created.

JunhongMao avatar Sep 16 '24 13:09 JunhongMao

@qiluo-msft @lguohan. Please help to review and approve. Thanks.

JunhongMao avatar Oct 09 '24 13:10 JunhongMao

@saiarcot895

abdosi avatar Oct 09 '24 18:10 abdosi

MSFT ADO: 29975556

deepak-singhal0408 avatar Oct 22 '24 16:10 deepak-singhal0408

/azpw run Azure.sonic-buildimage

JunhongMao avatar Oct 22 '24 21:10 JunhongMao

/AzurePipelines run Azure.sonic-buildimage

mssonicbld avatar Oct 22 '24 21:10 mssonicbld

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar Oct 22 '24 21:10 azure-pipelines[bot]

/azpw run Azure.sonic-buildimage

JunhongMao avatar Oct 23 '24 13:10 JunhongMao

/AzurePipelines run Azure.sonic-buildimage

mssonicbld avatar Oct 23 '24 13:10 mssonicbld

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar Oct 23 '24 13:10 azure-pipelines[bot]

@deepak-singhal0408 Please help to merge this PR since its related PR has been merged already. Please also mark it is needed for 202405. Thanks

mlok-nokia avatar Oct 28 '24 17:10 mlok-nokia

@arlakshm @abdosi could you help with the merge? Thanks,

deepak-singhal0408 avatar Oct 28 '24 17:10 deepak-singhal0408

@qiluo-msft : can you please merge this

abdosi avatar Oct 30 '24 17:10 abdosi

@lguohan @xumia @qiluo-msft Please help to merge this PR since related PR has been merge already

mlok-nokia avatar Nov 01 '24 16:11 mlok-nokia

Hi @rlhui ,@abdosi , for your viz. This impacts some test cases on 202405 on T2. Thanks

anamehra avatar Nov 04 '24 20:11 anamehra

Cherry-pick PR to 202405: https://github.com/sonic-net/sonic-buildimage/pull/20728

mssonicbld avatar Nov 07 '24 22:11 mssonicbld

@JunhongMao , @qiluo-msft , this PR breaks broadcom build. Log Link: https://dev.azure.com/mssonic/build/_build/results?buildId=692197&view=results

rax.c: In function 'raxRemove':
rax.c:1064:28: error: pointer 'h' may be used after 'free' [-Werror=use-after-free]
 1064 |             raxNode *new = raxRemoveChild(h,child);
      |                            ^~~~~~~~~~~~~~~~~~~~~~~
In file included from rax.c:45:
rax_malloc.h:44:18: note: call to 'free' here
   44 | #define rax_free free
rax.c:1054:13: note: in expansion of macro 'rax_free'
 1054 |             rax_free(child);
      |             ^~~~~~~~
listpack.c: In function 'lpDeleteRangeWithEntry':
listpack.c:937:31: error: pointer 'lp' used after 'realloc' [-Werror=use-after-free]
  937 |     unsigned long poff = first-lp;
      |                          ~~~~~^~~
In file included from listpack.c:45:
In function 'lpShrinkToFit',
    inlined from 'lpDeleteRangeWithEntry' at listpack.c:945:10:
listpack_malloc.h:47:28: note: call to 'realloc' here
   47 | #define lp_realloc(ptr,sz) realloc(ptr,sz)
      |                            ^~~~~~~~~~~~~~~
listpack.c:177:16: note: in expansion of macro 'lp_realloc'
  177 |         return lp_realloc(lp, size);
      |                ^~~~~~~~~~
cc -MM -fPIC -O3 -std=c99 -Wall -Wextra -pedantic -Werror -fvisibility=hidden zipmap.c > zipmap.d
cc1: all warnings being treated as errors
make[4]: *** [Makefile:15: listpack.o] Error 1

liushilongbuaa avatar Nov 11 '24 09:11 liushilongbuaa

I have created a new PR below to fix it. Please help to review it. Thanks. https://github.com/sonic-net/sonic-buildimage/pull/20759

JunhongMao avatar Nov 11 '24 17:11 JunhongMao