ltp icon indicating copy to clipboard operation
ltp copied to clipboard

tirpc_rpc_broadcast_exp 10.0.0.2 536875000 1 2 failed unexpectedly

Open HWCODEING opened this issue 8 months ago • 7 comments

Hi uname -a Linux eLxr 6.1.0-29-amd64 #1 SMP PREEMPT_DYNAMIC eLxr 6.1.123-elxr1-1 (2025-01-20) x86_64 GNU/Linux

LTP Version : 20240129

got a failed test case when i executed './runltp -f net.tirpc_tests -s tirpc_rpc_broadcast' (contians all of about rpc broadcast test that is all failed.)

I already take a lot of time on this issue.

Command: ./runltp -f net.tirpc_tests -s tirpc_rpc_broadcast

error log: rpc_test 1 TINFO: using libtirpc: yes rpc_test 1 TFAIL: tirpc_rpc_broadcast_exp 10.0.0.2 536875000 1 2 failed unexpectedly

if you have any methods for this please tell me

Full Logs: ` <<execution_status>>> initiation_status="ok" duration=1 termination_type=exited termination_id=0 corefile=no cutime=75 cstime=53 <<<test_end>>> <<<test_start>>> tag=tirpc_rpc_broadcast stime=1742289611 cmdline="rpc_test.sh -s tirpc_svc_1 -c tirpc_rpc_broadcast" contacts="" analysis=exit <<<test_output>>> rpc_test 1 TINFO: Running: rpc_test.sh -s tirpc_svc_1 -c tirpc_rpc_broadcast rpc_test 1 TINFO: initialize 'lhost' 'ltp_ns_veth2' interface rpc_test 1 TINFO: add local addr 10.0.0.2/24 rpc_test 1 TINFO: add local addr fd00:1:1:1::2/64 rpc_test 1 TINFO: initialize 'rhost' 'ltp_ns_veth1' interface rpc_test 1 TINFO: add remote addr 10.0.0.1/24 rpc_test 1 TINFO: add remote addr fd00:1:1:1::1/64 rpc_test 1 TINFO: Network config (local -- remote): rpc_test 1 TINFO: ltp_ns_veth2 -- ltp_ns_veth1 rpc_test 1 TINFO: 10.0.0.2/24 -- 10.0.0.1/24 rpc_test 1 TINFO: fd00:1:1:1::2/64 -- fd00:1:1:1::1/64 rpc_test 1 TINFO: timeout per run is 0h 5m 0s rpc_test 1 TINFO: check registered RPC with rpcinfo rpc_test 1 TINFO: registered RPC: program vers proto port service 100000 4 tcp 111 portmapper 100000 3 tcp 111 portmapper 100000 2 tcp 111 portmapper 100000 4 udp 111 portmapper 100000 3 udp 111 portmapper 100000 2 udp 111 portmapper 10 1 udp 59372 rpc_test 1 TINFO: using libtirpc: yes rpc_test 1 TFAIL: tirpc_rpc_broadcast 10.0.0.2 536875000 failed unexpectedly 1 rpc_test 2 TINFO: AppArmor enabled, this may affect test results rpc_test 2 TINFO: it can be disabled with TST_DISABLE_APPARMOR=1 (requires super/root) rpc_test 2 TINFO: loaded AppArmor profiles: none

Summary: passed 0 failed 1 broken 0 skipped 0 warnings 0 <<<execution_status>>> initiation_status="ok" duration=14 termination_type=exited termination_id=1 corefile=no cutime=76 cstime=64 <<<test_end>>> <<<test_start>>> tag=tirpc_rpc_broadcast_exp stime=1742289625 cmdline="rpc_test.sh -s tirpc_svc_1 -c tirpc_rpc_broadcast_exp -e "1,2"" contacts="" analysis=exit <<<test_output>>> rpc_test 1 TINFO: Running: rpc_test.sh -s tirpc_svc_1 -c tirpc_rpc_broadcast_exp -e 1,2 rpc_test 1 TINFO: initialize 'lhost' 'ltp_ns_veth2' interface rpc_test 1 TINFO: add local addr 10.0.0.2/24 rpc_test 1 TINFO: add local addr fd00:1:1:1::2/64 rpc_test 1 TINFO: initialize 'rhost' 'ltp_ns_veth1' interface rpc_test 1 TINFO: add remote addr 10.0.0.1/24 rpc_test 1 TINFO: add remote addr fd00:1:1:1::1/64 rpc_test 1 TINFO: Network config (local -- remote): rpc_test 1 TINFO: ltp_ns_veth2 -- ltp_ns_veth1 rpc_test 1 TINFO: 10.0.0.2/24 -- 10.0.0.1/24 rpc_test 1 TINFO: fd00:1:1:1::2/64 -- fd00:1:1:1::1/64 rpc_test 1 TINFO: timeout per run is 0h 5m 0s rpc_test 1 TINFO: check registered RPC with rpcinfo rpc_test 1 TINFO: registered RPC: program vers proto port service 100000 4 tcp 111 portmapper 100000 3 tcp 111 portmapper 100000 2 tcp 111 portmapper 100000 4 udp 111 portmapper 100000 3 udp 111 portmapper 100000 2 udp 111 portmapper 10 1 udp 59372 rpc_test 1 TINFO: using libtirpc: yes rpc_test 1 TFAIL: tirpc_rpc_broadcast_exp 10.0.0.2 536875000 1 2 failed unexpectedly 1 rpc_test 2 TINFO: AppArmor enabled, this may affect test results rpc_test 2 TINFO: it can be disabled with TST_DISABLE_APPARMOR=1 (requires super/root) rpc_test 2 TINFO: loaded AppArmor profiles: none

Summary: passed 0 failed 1 broken 0 skipped 0 warnings 0`

HWCODEING avatar Mar 24 '25 07:03 HWCODEING

Update: The server RCP service program "tirpc_svc_1" looks can't to listen UDP port 111, even though the 111 port has received some data when I used "tcpdump -i enp0s3 udp port 111 -n ". i consider phenomenon above may be Linux environment (such as firewall i guess )

HWCODEING avatar Mar 31 '25 03:03 HWCODEING

Update :

The ltp_ns_veth1 and ltp_ns_veth2 broadcast is 0.0.0.0 that seems not normal. ltp_ns_veth2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 10.0.0.2 netmask 255.255.255.0 broadcast 0.0.0.0 inet6 fd00:1:1:1::2 prefixlen 64 scopeid 0x0 inet6 fe80::80d1:d2ff:fead:f770 prefixlen 64 scopeid 0x20 ether 82:d1:d2:ad:f7:70 txqueuelen 1000 (Ethernet) RX packets 71 bytes 6894 (6.7 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 41 bytes 3066 (2.9 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

HWCODEING avatar Apr 14 '25 05:04 HWCODEING

update: After Ubuntu 20.04 (contain 20.04), localhost can't get any response when running the rpcinfo -b 100000 4 (whatever Pronumber). i guess the kernel didn't allow these actions (broadcast)

HWCODEING avatar Apr 18 '25 06:04 HWCODEING

FWIW I could reproduce on Debian Buster (linux-4.19.316 + rpcbind-1.2.5 + libtirpc-1.1.4):

root@debian:~/rpcbind# LTPROOT=/opt/ltp PATH="/opt/ltp/testcases/bin:$PATH" rpc_test.sh -s tirpc_svc_1 -c tirpc_rpc_broadcast
rpc_test 1 TINFO: Running: rpc_test.sh -s tirpc_svc_1 -c tirpc_rpc_broadcast
rpc_test 1 TINFO: Tested kernel: Linux debian 4.19.0-27-amd64 #1 SMP Debian 4.19.316-1 (2024-06-25) x86_64 GNU/Linux
rpc_test 1 TINFO: initialize 'lhost' 'ltp_ns_veth2' interface
rpc_test 1 TINFO: add local addr 10.0.0.2/24
rpc_test 1 TINFO: add local addr fd00:1:1:1::2/64
rpc_test 1 TINFO: initialize 'rhost' 'ltp_ns_veth1' interface
rpc_test 1 TINFO: add remote addr 10.0.0.1/24
rpc_test 1 TINFO: add remote addr fd00:1:1:1::1/64
rpc_test 1 TINFO: Network config (local -- remote):
rpc_test 1 TINFO: ltp_ns_veth2 -- ltp_ns_veth1
rpc_test 1 TINFO: 10.0.0.2/24 -- 10.0.0.1/24
rpc_test 1 TINFO: fd00:1:1:1::2/64 -- fd00:1:1:1::1/64
rpc_test 1 TINFO: timeout per run is 0h 5m 0s
rpc_test 1 TINFO: check registered RPC with rpcinfo
rpcbind: PMAPPROC_DUMP

rpc_test 1 TINFO: registered RPC:
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
rpc_test 1 TINFO: using libtirpc: yes
rpcbind: RPCB_UNSET request for (536875000, 1, ) : 
rpcbind: RPCB_UNSET: succeeded
rpcbind: RPCB_GETADDR req for (100000, 2, tcp) from 127.0.0.1.2.106: 
mergeaddr: contact uaddr = 127.0.0.1.0.111
addrmerge(caller, 0.0.0.0.0.111, 127.0.0.1.0.111, tcp
addrmerge: hint 127.0.0.1.0.111
addrmerge: returning 127.0.0.1.0.111
mergeaddr: uaddr = 0.0.0.0.0.111, merged uaddr = 127.0.0.1.0.111
rpcbind: getaddr: 127.0.0.1.0.111
rpcbind: PMAPPROC_DUMP

rpcbind: RPCB_UNSET request for (536875000, 1, udp) : 
rpcbind: RPCB_UNSET: succeeded
rpcbind: RPCB_SET request for (536875000, 1, udp, 0.0.0.0.213.54) : 
rpcbind: RPCB_SET: succeeded
rpcbind: RPCB_UNSET request for (536875000, 1, tcp) : 
rpcbind: RPCB_UNSET: succeeded
rpcbind: RPCB_SET request for (536875000, 1, tcp, 0.0.0.0.242.31) : 
rpcbind: RPCB_SET: succeeded
rpcbind: RPCB_UNSET request for (536875000, 1, udp6) : 
rpcbind: RPCB_UNSET: succeeded
rpcbind: RPCB_SET request for (536875000, 1, udp6, ::.225.250) : 
rpcbind: RPCB_SET: succeeded
rpcbind: RPCB_UNSET request for (536875000, 1, tcp6) : 
rpcbind: RPCB_UNSET: succeeded
rpcbind: RPCB_SET request for (536875000, 1, tcp6, ::.225.76) : 
rpcbind: RPCB_SET: succeeded
rpcbind: RPCB_GETADDR req for (100000, 2, tcp) from 127.0.0.1.2.109: 
mergeaddr: contact uaddr = 127.0.0.1.0.111
addrmerge(caller, 0.0.0.0.0.111, 127.0.0.1.0.111, tcp
addrmerge: hint 127.0.0.1.0.111
addrmerge: returning 127.0.0.1.0.111
mergeaddr: uaddr = 0.0.0.0.0.111, merged uaddr = 127.0.0.1.0.111
rpcbind: getaddr: 127.0.0.1.0.111
rpcbind: PMAPPROC_DUMP

rpc_test 1 TFAIL: tirpc_rpc_broadcast 10.0.0.2 536875000 failed unexpectedly
1
rpcbind: RPCB_UNSET request for (536875000, 1, ) : 
rpcbind: Suppression RPC_UNSET(map_unset) 
rpcbind: rbl->rpcb_map.r_owner=superuser 
rpcbind: owner=superuser 
rpcbind: Suppression RPC_UNSET(map_unset) 
rpcbind: rbl->rpcb_map.r_owner=superuser 
rpcbind: owner=superuser 
rpcbind: Suppression RPC_UNSET(map_unset) 
rpcbind: rbl->rpcb_map.r_owner=superuser 
rpcbind: owner=superuser 
rpcbind: Suppression RPC_UNSET(map_unset) 
rpcbind: rbl->rpcb_map.r_owner=superuser 
rpcbind: owner=superuser 
rpcbind: RPCB_UNSET: succeeded
rpc_test 2 TINFO: AppArmor enabled, this may affect test results
rpc_test 2 TINFO: it can be disabled with TST_DISABLE_APPARMOR=1 (requires super/root)
rpc_test 2 TINFO: loaded AppArmor profiles: none

Summary:
passed   0
failed   1
broken   0
skipped  0
warnings 0

I was investigating rpc_pmap_rmtcall earlier today and realized remote calls are not enabled by default anymore, but it's not the case here (rpcbind -r did not help). Next step would be to test an even older kernel (e.g. Debian Stretch), as we need to have a TPASS to properly bisect the issue.

rbmarliere avatar May 29 '25 20:05 rbmarliere

I tested Stretch and Jessie (below). I was unable to get a TPASS, I guess this old test was only relevant and working in Linux 2.6 days...

root@debian:~/ltp# uname -a
Linux debian 3.16.0-6-amd64 #1 SMP Debian 3.16.56-1+deb8u1 (2018-05-08) x86_64 GNU/Linux

root@debian:~/ltp# apt show rpcbind
Version: 0.2.1-6+deb8u2

root@debian:~/ltp# apt show libtirpc1
Version: 0.2.5-1+deb8u1

root@debian:~/ltp# LTPROOT=/opt/ltp PATH="/opt/ltp/testcases/bin:$PATH" rpc_test.sh -s tirpc_svc_1 -c tirpc_rpc_broadcast
rpc_test 1 TINFO: initialize 'lhost' 'ltp_ns_veth2' interface
rpc_test 1 TINFO: add local addr 10.0.0.2/24
rpc_test 1 TINFO: add local addr fd00:1:1:1::2/64
rpc_test 1 TINFO: initialize 'rhost' 'ltp_ns_veth1' interface
rpc_test 1 TINFO: add remote addr 10.0.0.1/24
rpc_test 1 TINFO: add remote addr fd00:1:1:1::1/64
rpc_test 1 TINFO: Network config (local -- remote):
rpc_test 1 TINFO: ltp_ns_veth2 -- ltp_ns_veth1
rpc_test 1 TINFO: 10.0.0.2/24 -- 10.0.0.1/24
rpc_test 1 TINFO: fd00:1:1:1::2/64 -- fd00:1:1:1::1/64
rpc_test 1 TINFO: timeout per run is 0h 5m 0s
rpc_test 1 TINFO: check registered RPC with rpcinfo
rpc_test 1 TINFO: registered RPC:
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
rpc_test 1 TINFO: using libtirpc: yes
rpc_test 1 TFAIL: tirpc_rpc_broadcast 10.0.0.2 536875000 failed unexpectedly
1

Summary:
passed   0
failed   1
broken   0
skipped  0
warnings 0
root@debian:~/ltp# git show   # current version was not building, thus used an older commit as HEAD...
commit 360927080bfa0fe2a52761bf64fc5c86874795ff
Author: Petr Vorel <[email protected]>
Date:   Wed Oct 25 16:54:14 2023 +0200

    doc/support: Update the oldest tested distros

rbmarliere avatar Jun 03 '25 16:06 rbmarliere

After testing Squeeze with kernel 2.6.32 and hitting the same issue (albeit an older LTP version) I realized it was probably a regression in LTP. I went back to Jessie to bisect and found the offending commit: https://github.com/linux-test-project/ltp/commit/fcd4397233029e563b2a398daf9b98a8293a502e.

root@debian:~/ltp# uname -a
Linux debian 3.16.0-6-amd64 #1 SMP Debian 3.16.56-1+deb8u1 (2018-05-08) x86_64 GNU/Linux

root@debian:~/ltp# apt show rpcbind
Version: 0.2.1-6+deb8u2

root@debian:~/ltp# apt show libtirpc1
Version: 0.2.5-1+deb8u1

root@debian:~/ltp# git checkout fcd439723302^ && git clean -fdx && make autotools && ./configure && make -j8
 && rm -rf /opt/ltp && make install && LTPROOT=/opt/ltp PATH="/opt/ltp/testcases/bin:$PATH" rpc_test.sh -s t
irpc_svc_1 -c tirpc_rpc_broadcast                                                                           
(...)                                                                                                       
0                                                                                                           
tirpc_rpc_broadcast 1 \e[1;32mTPASS:\e[00m Test passed                                                      
Killed                                                                                                      
                                                                                                            
root@debian:~/ltp# git checkout fcd439723302 && git clean -fdx && make autotools && ./configure && make -j8 
&& rm -rf /opt/ltp && make install && LTPROOT=/opt/ltp PATH="/opt/ltp/testcases/bin:$PATH" rpc_test.sh -s ti
rpc_svc_1 -c tirpc_rpc_broadcast                                                                            
(...)                                                                                                       
tirpc_rpc_broadcast 1 \e[1;34mTINFO:\e[00m initialize 'lhost' 'ltp_ns_veth2' interface                      
tirpc_rpc_broadcast 1 \e[1;34mTINFO:\e[00m set local addr 10.0.0.2/24                                       
tirpc_rpc_broadcast 1 \e[1;34mTINFO:\e[00m set local addr fd00:1:1:1::2/64                                  
tirpc_rpc_broadcast 1 \e[1;34mTINFO:\e[00m initialize 'rhost' 'ltp_ns_veth1' interface                      
tirpc_rpc_broadcast 1 \e[1;34mTINFO:\e[00m set remote addr 10.0.0.1/24                                      
tirpc_rpc_broadcast 1 \e[1;34mTINFO:\e[00m set remote addr fd00:1:1:1::1/64                                 
tirpc_rpc_broadcast 1 \e[1;31mTBROK:\e[00m 'tirpc_rpc_broadcast 10.0.0.2 536875000 ' failed on '': '1'      

rbmarliere avatar Jun 04 '25 15:06 rbmarliere

@rbmarliere Thanks for a debugging. Could you please debug which commands are actually called prior fcd4397233029e563b2a398daf9b98a8293a502e change?

I would guess it would be something like (port obviously different):

tirpc_rpc_broadcast localhost 536875000

But now it's using netns IP:

tirpc_rpc_broadcast 10.0.0.2 536875000

Doing some more testing based on https://lore.kernel.org/linux-nfs/20250602133741.GA324895@pevik/ test is working when:

  1. Using rpcbind with --enable-rmtcalls (remote calls are probably mandatory as for other tests).
  2. Connecting directly on localhost work instead of using network namespaces (the default for tst_net.sh unless you configure 2 host side ssh based configuration https://github.com/linux-test-project/ltp/tree/master/testcases/network#two-host-configuration):
+++ testcases/network/rpc/rpc-tirpc/rpc_test.sh
@@ -87,7 +87,7 @@ do_test()
                done
        fi
 
-       EXPECT_RHOST_PASS $CLIENT $(tst_ipaddr) $PROGNUMNOSVC $CLIENT_EXTRA_OPTS
+       EXPECT_PASS $CLIENT $(tst_ipaddr) $PROGNUMNOSVC $CLIENT_EXTRA_OPTS
 }
 
 . rpc_lib.sh

The question is whether the problem is with network namespaces or it would break testing via remote host as well.

Obviously testing the latest libtirpc (>= 1.3.5), rpcbind (>= 1.2.7) and kernel (>= v6.5) with AF_UNIX abstract address enabled in rpcbind.socket does not help (as it failed on Tumbleweed where I did the testing with rpcbind compiled with --enable-rmtcalls): https://git.linux-nfs.org/?p=steved/rpcbind.git;a=commit;h=652aa9a28d6c4403fb93133079f8a7e07cbe9fb8 https://git.linux-nfs.org/?p=steved/libtirpc.git;a=commit;h=d68523a88ba0a60b949c3bbb2c246be3cfeb0eea https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.15&id=626590ea4c93814808a8c4e5ffd2aa0d27f05d4b https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.15&id=4388ce05fa38b17e7d9ddabffcb16ed778ee417c

/usr/lib/systemd/system/rpcbind.socket

...
[Socket]
ListenStream=/run/rpcbind.sock
ListenStream=@/run/rpcbind.sock

# RPC netconfig can't handle ipv6/ipv4 dual sockets
BindIPv6Only=ipv6-only
ListenStream=0.0.0.0:111
ListenDatagram=0.0.0.0:111
ListenStream=[::]:111
ListenDatagram=[::]:111

pevik avatar Jun 04 '25 16:06 pevik