sonic-swss
sonic-swss copied to clipboard
[mlagsyncd]Fix mlag socket read fail
What I did fix mclagysncd server socket read fail Why I did it if socket read fail, mclagsyncd can't communicate with iccpd, the whole mlag can't work How I verified it check the socket state Details if related
issue description: from below strace log, the read size for fd 34 is 0, which means the initiazation of this variable is not correct. MSG_BATCH_SIZE should be a macro, otherwise it may be inited after size and cause size 0 .
root@sonic:/home/admin# strace -f -p 6681 (only show 13(server fd)and 34(conn fd))
[pid 6681] setsockopt(13, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
[pid 6681] setsockopt(13, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
[pid 6681] bind(13, {sa_family=AF_INET, sin_port=htons(2626), sin_addr=inet_addr("127.0.0.6")}, 16) = 0
[pid 6681] listen(13, 2)
[pid 6681] accept(13, {sa_family=AF_INET, sin_port=htons(40028), sin_addr=inet_addr("127.0.0.1")}, [32624->16]) = 34
[pid 6681] write(1, "Connected!\n", 11) = 11
[pid 6681] epoll_ctl(12, EPOLL_CTL_ADD, 34, {EPOLLIN, {u32=34, u64=34}}) = 0
[pid 6681] epoll_wait(12, [], 2, 0) = 0
[pid 6681] epoll_wait(12,
[{EPOLLIN, {u32=34, u64=34}}], 2, -1) = 1
[pid 6681] read(34, "", 0) = 0 <<====here size is empty, which is confirmed in log, both m_bufsize and m_pos is 0.
[pid 6681] sendto(9, "<13>Jul 21 06:50:44 mclagsyncd: "..., 65, MSG_NOSIGNAL, NULL, 0) = 65
[pid 6681] futex(0x7f707811d1a0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
[pid 6681] close(13) = 0
root@sonic:/home/admin# netstat -ap | grep 2626
tcp 0 0 127.0.0.6:2626 0.0.0.0:* LISTEN 4817/mclagsyncd
tcp 946 0 127.0.0.6:2626 localhost:33166 ESTABLISHED 6681/mclagsyncd //receive queue is not empty since read failed
tcp 0 0 localhost:33166 127.0.0.6:2626 ESTABLISHED 6682/iccpd
below show a normal read strace log:
[pid 15027] read(34, "\1\2\24\0\3\0\f\0test_sync_fd", 1048576) = 20
@lguohan @qiluo-msft @Praveen-Brcm help review this, is a sometime issue, thanks!
@lguohan @qiluo-msft @Praveen-Brcm help review this, is a sometime issue, thanks!
@pettershao-ragilenetworks Thanks for the change. Am not added as reviwer, if someone can add me will provide approals from my end. Thanks- Praveen.
@lguohan @qiluo-msft help forward this, thanks!
@lguohan help merge this, thanks!
@lguohan @qiluo-msft help merge this, thanks!
@lguohan @qiluo-msft help forward this, thanks!
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
@lguohan @qiluo-msft help forward this, thanks!
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
@lguohan @qiluo-msft As it is approved long time ago, help merge this, thanks!
@pettershao-ragilenetworks The issue you tried to fix here is already fixed in https://github.com/Azure/sonic-swss/pull/2112 by fixing the initialization order.