packet03 seems to be causing huge packet loss

Open natesales opened this issue 3 years ago • 65 comments

I'm trying xdp_router from https://github.com/xdp-project/xdp-tutorial/blob/master/packet-solutions/xdp_prog_kern_03.c, but after loading the program onto 2 NICs, I can no longer ping through the router without massive packet loss (one ICMP response every few minutes).

I'm using a 2 port Agilio NIC and have ens5np0 facing the internet and ens5np1 to another linux box.

Distro: Ubuntu 20.04.1 LTS Kernel: 5.4.0-53-generic

mount -t bpf bpf /sys/fs/bpf/
./xdp_loader -d ens5np0 --progsec xdp_router -F
./xdp_loader -d ens5np1 --progsec xdp_router -F
./xdp_prog_user -d ens5np0
./xdp_prog_user -d ens5np1
# ./xdp_stats -d ens5np0
Collecting stats from BPF map
 - BPF map (bpf_map_type:6) id:10 name:xdp_stats_map key_size:4 value_size:16 max_entries:5
XDP-action  
XDP_ABORTED          250 pkts (         0 pps)          39 Kbytes (     0 Mbits/s) period:0.250332
XDP_DROP               0 pkts (         0 pps)           0 Kbytes (     0 Mbits/s) period:0.250286
XDP_PASS            9083 pkts (        56 pps)         570 Kbytes (     0 Mbits/s) period:0.250286
XDP_TX                 0 pkts (         0 pps)           0 Kbytes (     0 Mbits/s) period:0.250287
XDP_REDIRECT       18674 pkts (         8 pps)        1389 Kbytes (     0 Mbits/s) period:0.250287

# ./xdp_stats -d ens5np1
Collecting stats from BPF map
 - BPF map (bpf_map_type:6) id:14 name:xdp_stats_map key_size:4 value_size:16 max_entries:5
XDP-action  
XDP_ABORTED            1 pkts (         0 pps)           0 Kbytes (     0 Mbits/s) period:0.250282
XDP_DROP               0 pkts (         0 pps)           0 Kbytes (     0 Mbits/s) period:0.250285
XDP_PASS            5632 pkts (        52 pps)         345 Kbytes (     0 Mbits/s) period:0.250287
XDP_TX                 0 pkts (         0 pps)           0 Kbytes (     0 Mbits/s) period:0.250288
XDP_REDIRECT          10 pkts (         0 pps)           1 Kbytes (     0 Mbits/s) period:0.250288

Here's a ping to the linux box through the router. The BPF program was loaded onto the NIC around sequence 18, after which the ICMP replies stop coming through for the most part. There are a few that get through though, every few minutes (as you can see by the sequence numbers).

<truncated>
64 bytes from 192.0.2.130: icmp_seq=16 ttl=56 time=14.2 ms
64 bytes from 192.0.2.130: icmp_seq=17 ttl=56 time=21.9 ms
64 bytes from 192.0.2.130: icmp_seq=18 ttl=56 time=14.6 ms
64 bytes from 192.0.2.130: icmp_seq=76 ttl=56 time=15.3 ms
64 bytes from 192.0.2.130: icmp_seq=183 ttl=56 time=14.3 ms
^C

Any guidance would be greatly appreciated!

natesales avatar Nov 13 '20 03:11 natesales

Well, I suppose this corresponds to the XDP_ABORTED returns that you're seeing in the stats. Those are most likely coming from bpf_redirect_map() returning XDP_ABORTED, which should only happen if the map index doesn't exist.

You could try dumping the map contents (with bpftool) to check that the map is being populated correctly. You could also try adding a bpf_printk() with the ifindex before the call to bpf_redirect_map() to see what ifindex it is actually redirecting to...

tohojo avatar Nov 13 '20 11:11 tohojo

I have xdp/id:32 on ens5np0, which I've dumped with the following commands. I'm not quite sure what I'm looking for here, although map 11 seems to be problematic.

# bpftool prog show id 32
32: xdp  name xdp_router_func  tag c09aa7842e861c05  gpl
	loaded_at 2020-11-13T03:48:56+0000  uid 0
	xlated 1480B  jited 820B  memlock 4096B  map_ids 10,11
# bpftool map show id 10
10: percpu_array  name xdp_stats_map  flags 0x0
	key 4B  value 16B  max_entries 5  memlock 4096B
# bpftool map show id 11
11: devmap  name tx_port  flags 0x80
	key 4B  value 4B  max_entries 256  memlock 4096B

# bpftool map dump id 10
key:
00 00 00 00
value (CPU 00): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 01): 66 00 00 00 00 00 00 00  e5 43 00 00 00 00 00 00
value (CPU 02): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 03): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 04): 02 00 00 00 00 00 00 00  7a 00 00 00 00 00 00 00
value (CPU 05): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 06): 08 00 00 00 00 00 00 00  02 02 00 00 00 00 00 00
value (CPU 07): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 08): 06 00 00 00 00 00 00 00  92 01 00 00 00 00 00 00
value (CPU 09): 71 00 00 00 00 00 00 00  3f 4d 00 00 00 00 00 00
value (CPU 10): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 11): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 12): 04 00 00 00 00 00 00 00  f0 00 00 00 00 00 00 00
value (CPU 13): 0f 00 00 00 00 00 00 00  c8 04 00 00 00 00 00 00
value (CPU 14): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 15): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
key:
01 00 00 00
value (CPU 00): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 01): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 02): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 03): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 04): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 05): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 06): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 07): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 08): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 09): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 10): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 11): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 12): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 13): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 14): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 15): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
key:
02 00 00 00
value (CPU 00): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 01): 47 c9 00 00 00 00 00 00  c3 b9 df 00 00 00 00 00
value (CPU 02): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 03): ed 1b 01 00 00 00 00 00  cb ba d1 00 00 00 00 00
value (CPU 04): d5 40 37 00 00 00 00 00  e3 6c 31 0d 00 00 00 00
value (CPU 05): 30 00 00 00 00 00 00 00  ba 13 00 00 00 00 00 00
value (CPU 06): 99 3a 00 00 00 00 00 00  15 cd 29 00 00 00 00 00
value (CPU 07): 15 49 00 00 00 00 00 00  8d 9e 30 00 00 00 00 00
value (CPU 08): 30 3e 01 00 00 00 00 00  9f 45 ef 00 00 00 00 00
value (CPU 09): 68 95 00 00 00 00 00 00  40 c9 5f 00 00 00 00 00
value (CPU 10): ff 0f 00 00 00 00 00 00  57 92 05 00 00 00 00 00
value (CPU 11): c7 49 00 00 00 00 00 00  c6 89 19 00 00 00 00 00
value (CPU 12): 55 92 00 00 00 00 00 00  bd c6 49 00 00 00 00 00
value (CPU 13): dc 0b 00 00 00 00 00 00  e6 a7 04 00 00 00 00 00
value (CPU 14): 58 1d 00 00 00 00 00 00  fe 34 0a 00 00 00 00 00
value (CPU 15): d6 00 00 00 00 00 00 00  e7 50 00 00 00 00 00 00
key:
03 00 00 00
value (CPU 00): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 01): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 02): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 03): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 04): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 05): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 06): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 07): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 08): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 09): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 10): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 11): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 12): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 13): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 14): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 15): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
key:
04 00 00 00
value (CPU 00): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 01): d0 43 0a 00 00 00 00 00  60 53 0f 06 00 00 00 00
value (CPU 02): 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
value (CPU 03): 5f 57 08 00 00 00 00 00  a0 be f9 04 00 00 00 00
value (CPU 04): 5a f7 00 00 00 00 00 00  62 8c 42 00 00 00 00 00
value (CPU 05): 19 00 00 00 00 00 00 00  f7 05 00 00 00 00 00 00
value (CPU 06): e6 2f 00 00 00 00 00 00  bf b4 0c 00 00 00 00 00
value (CPU 07): 4d 90 00 00 00 00 00 00  ec c6 26 00 00 00 00 00
value (CPU 08): 4a f2 00 00 00 00 00 00  21 0e 41 00 00 00 00 00
value (CPU 09): c5 c9 02 00 00 00 00 00  93 30 9e 01 00 00 00 00
value (CPU 10): 9e 16 00 00 00 00 00 00  1c 23 07 00 00 00 00 00
value (CPU 11): d9 5e 00 00 00 00 00 00  a7 fd 19 00 00 00 00 00
value (CPU 12): 07 ff 00 00 00 00 00 00  aa af 45 00 00 00 00 00
value (CPU 13): cc 1b 00 00 00 00 00 00  4a 46 07 00 00 00 00 00
value (CPU 14): 6f 25 00 00 00 00 00 00  bc f3 09 00 00 00 00 00
value (CPU 15): 84 13 00 00 00 00 00 00  a1 87 0c 00 00 00 00 00
Found 5 elements
/home/eo-admin # bpftool map dump id 11
key:
00 00 00 00
value:
No such file or directory
key: 01 00 00 00  value: 01 00 00 00
key: 02 00 00 00  value: 02 00 00 00
key: 03 00 00 00  value: 03 00 00 00
key: 04 00 00 00  value: 04 00 00 00
key: 05 00 00 00  value: 05 00 00 00
key: 06 00 00 00  value: 06 00 00 00
key: 07 00 00 00  value: 07 00 00 00
key: 08 00 00 00  value: 08 00 00 00
key: 09 00 00 00  value: 09 00 00 00
key: 0a 00 00 00  value: 0a 00 00 00
key: 0b 00 00 00  value: 0b 00 00 00
<the following 4 lines are repeated from 0c to ff>
key:
0c 00 00 00
value:
No such file or directory

Found 11 elements
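
As a cross-check, the per-CPU rows for key 0 in the dump of map 10 can be decoded by hand and compared with the xdp_stats output. The following is a minimal sketch, assuming the bytes are printed in little-endian host order and that each 16-byte value holds a 64-bit packet counter followed by a 64-bit byte counter (an assumption inferred from value_size:16 and the matching totals, not taken from the program source). The helper names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode 8 bytes in the order bpftool printed them (little-endian host assumed). */
static uint64_t le64_decode(const uint8_t *b)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | b[i];
    return v;
}

/* Sum the 64-bit field at byte `offset` across n per-CPU values of 16 bytes each. */
static uint64_t percpu_sum(const uint8_t (*vals)[16], size_t n, size_t offset)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += le64_decode(&vals[i][offset]);
    return total;
}

/* The non-zero per-CPU rows for key 0, copied from the dump of map 10. */
static const uint8_t key0_rows[7][16] = {
    {0x66, 0, 0, 0, 0, 0, 0, 0, 0xe5, 0x43, 0, 0, 0, 0, 0, 0}, /* CPU 01 */
    {0x02, 0, 0, 0, 0, 0, 0, 0, 0x7a, 0x00, 0, 0, 0, 0, 0, 0}, /* CPU 04 */
    {0x08, 0, 0, 0, 0, 0, 0, 0, 0x02, 0x02, 0, 0, 0, 0, 0, 0}, /* CPU 06 */
    {0x06, 0, 0, 0, 0, 0, 0, 0, 0x92, 0x01, 0, 0, 0, 0, 0, 0}, /* CPU 08 */
    {0x71, 0, 0, 0, 0, 0, 0, 0, 0x3f, 0x4d, 0, 0, 0, 0, 0, 0}, /* CPU 09 */
    {0x04, 0, 0, 0, 0, 0, 0, 0, 0xf0, 0x00, 0, 0, 0, 0, 0, 0}, /* CPU 12 */
    {0x0f, 0, 0, 0, 0, 0, 0, 0, 0xc8, 0x04, 0, 0, 0, 0, 0, 0}, /* CPU 13 */
};
```

Calling percpu_sum(key0_rows, 7, 0) gives 250 and percpu_sum(key0_rows, 7, 8) gives 39658, which lines up with the "XDP_ABORTED 250 pkts ... 39 Kbytes" line in the first xdp_stats output for ens5np0. So key 0 of this map appears to be the XDP_ABORTED counter.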

natesales avatar Nov 13 '20 16:11 natesales

Depends - what are the ifindexes of the two interfaces you're using? (The number before the : at the start of the link in 'ip link' output).

tohojo avatar Nov 13 '20 18:11 tohojo

Here are my interfaces (index 10 and 11)

10: ens5np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp/id:32 qdisc mq state UP group default qlen 1000
11: ens5np1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp/id:40 qdisc mq state UP group default qlen 1000

natesales avatar Nov 13 '20 18:11 natesales

Right, so those are in the map (0xa and 0xb). So that's not the issue, then. You can try adding that bpf_printk for fib_params.ifindex before bpf_redirect_map() to see which ifindex is actually returned from the fib lookup...
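
To make the "0xa and 0xb" check concrete: the devmap keys in the dump are 32-bit values printed byte by byte. A small sketch of that conversion, assuming a little-endian host (as on x86-64); format_key_le is a hypothetical helper name, not part of bpftool or libbpf.

```c
#include <stdint.h>
#include <stdio.h>

/* Render a 32-bit map key as four hex bytes, least significant byte first,
 * which is how the keys appear in the map dump on a little-endian host.
 * `out` must have room for 11 characters plus the terminating NUL. */
static void format_key_le(uint32_t key, char out[12])
{
    snprintf(out, 12, "%02x %02x %02x %02x",
             (unsigned)(key & 0xff),
             (unsigned)((key >> 8) & 0xff),
             (unsigned)((key >> 16) & 0xff),
             (unsigned)((key >> 24) & 0xff));
}
```

With this, ifindex 10 becomes "0a 00 00 00" and ifindex 11 becomes "0b 00 00 00", both of which are listed as populated entries in the dump of map 11.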

tohojo avatar Nov 13 '20 19:11 tohojo

Adding a bpf_printk indicates ifindex 11 the vast majority of the time, although occasionally there's an ifindex 5 (a different interface on the system not XDP enabled, used for management)

<idle>-0 [001] ..s. 56297.932388: 0: iface index: 11

natesales avatar Nov 13 '20 19:11 natesales

Right, you're looking for which ifindex results in XDP_ABORTED. So maybe (after the call to bpf_redirect_map()):

if (action == XDP_ABORTED) bpf_printk(...)

tohojo avatar Nov 13 '20 19:11 tohojo

Ah, got it. I've added that check for XDP_ABORTED which resulted in no output so I've added action to the trace output which revealed that all messages have an action of 2. (I believe this is XDP_PASS https://github.com/torvalds/linux/blob/master/include/uapi/linux/bpf.h#L4266)
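
For reading the numeric action values in the trace output, here is a small lookup sketch. The enum is a local copy of the XDP action values from the kernel's uapi bpf.h header (0 through 4), kept self-contained on purpose; xdp_action_name is a hypothetical helper for illustration.

```c
#include <stddef.h>

/* Local copy of the XDP return codes, in the order the kernel defines them. */
enum xdp_action_code {
    XDP_ABORTED = 0,
    XDP_DROP,
    XDP_PASS,
    XDP_TX,
    XDP_REDIRECT,
};

/* Map a numeric action (as printed by bpf_printk) to its name. */
static const char *xdp_action_name(int code)
{
    switch (code) {
    case XDP_ABORTED:  return "XDP_ABORTED";
    case XDP_DROP:     return "XDP_DROP";
    case XDP_PASS:     return "XDP_PASS";
    case XDP_TX:       return "XDP_TX";
    case XDP_REDIRECT: return "XDP_REDIRECT";
    default:           return "UNKNOWN";
    }
}
```

So an action of 2 in the trace is XDP_PASS, 0 is XDP_ABORTED, and 4 is XDP_REDIRECT.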

natesales avatar Nov 13 '20 19:11 natesales

Right so that makes it a bit puzzling where those XDP_ABORTED actions in your stats output are coming from; do you still get those?

tohojo avatar Nov 13 '20 20:11 tohojo

Yes, I'm still getting the XDP_ABORTED actions. Just to make sure I'm logging at the right place, here's the block where I've added the bpf_printk call.

memcpy(eth->h_source, fib_params.smac, ETH_ALEN);
bpf_printk("iface index: %d, action %d\n", fib_params.ifindex, action);
action = bpf_redirect_map(&tx_port, fib_params.ifindex, 0);
# ./xdp_stats -d ens5np0
Collecting stats from BPF map
 - BPF map (bpf_map_type:6) id:42 name:xdp_stats_map key_size:4 value_size:16 max_entries:5
XDP-action  
XDP_ABORTED       114625 pkts (        72 pps)       15618 Kbytes (     0 Mbits/s) period:0.250312
XDP_DROP               0 pkts (         0 pps)           0 Kbytes (     0 Mbits/s) period:0.250307
XDP_PASS          344026 pkts (        88 pps)       26021 Kbytes (     0 Mbits/s) period:0.250309
XDP_TX                 0 pkts (         0 pps)           0 Kbytes (     0 Mbits/s) period:0.250310
XDP_REDIRECT           0 pkts (         0 pps)           0 Kbytes (     0 Mbits/s) period:0.250310
^C
# ./xdp_stats -d ens5np1
Collecting stats from BPF map
 - BPF map (bpf_map_type:6) id:46 name:xdp_stats_map key_size:4 value_size:16 max_entries:5
XDP-action  
XDP_ABORTED          608 pkts (         0 pps)          91 Kbytes (     0 Mbits/s) period:0.250286
XDP_DROP               0 pkts (         0 pps)           0 Kbytes (     0 Mbits/s) period:0.250283
XDP_PASS          303963 pkts (        68 pps)       18662 Kbytes (     0 Mbits/s) period:0.250285
XDP_TX                 0 pkts (         0 pps)           0 Kbytes (     0 Mbits/s) period:0.250285
XDP_REDIRECT           0 pkts (         0 pps)           0 Kbytes (     0 Mbits/s) period:0.250286
^C

natesales avatar Nov 13 '20 20:11 natesales

No, you need to swap the bpf_printk() and bpf_redirect_map() lines so you're printing 'action' after it has been assigned :)

tohojo avatar Nov 13 '20 21:11 tohojo

Oops, that makes total sense now :) I'm now getting action 4 (XDP_REDIRECT)

<idle>-0 [007] ..s. 63468.817247: 0: iface index: 11, action 4

natesales avatar Nov 13 '20 21:11 natesales

Exclusively? You want to try and find those XDP_ABORTED codes; maybe try adding back the if() ?

tohojo avatar Nov 13 '20 21:11 tohojo

Not exclusively - after adding the check for just XDP_ABORTED actions I'm seeing just action 0 lines (as expected).

natesales avatar Nov 13 '20 21:11 natesales

That's not expected at all! The stats are showing XDP_ABORTED return codes, so if they're not coming from the bpf_redirect_map() return, where are they coming from?

Could you try running the xdp_monitor example program (from the samples/bpf directory of the kernel sources) and see if that agrees that there are XDP_ABORTED actions happening? And see if it says anything about redirect errors as well...

tohojo avatar Nov 13 '20 22:11 tohojo

Sorry, I think I was unclear. After adding the if statement to only log packets that will be XDP_ABORTED following the bpf_redirect_map call, I now only see lines like this in the trace: <idle>-0 [007] ..s. 65757.724991: 0: iface index: 11, action 0. The absence of XDP_REDIRECT (action 4) trace lines is expected, because the if statement matches only XDP_ABORTED results.

natesales avatar Nov 13 '20 22:11 natesales

Ah, right. Okay, but that still makes no sense; the helper only returns XDP_ABORTED if the map lookup fails:

https://elixir.bootlin.com/linux/v5.4/source/net/core/filter.c#L3745

and we've clearly established that a lookup for ifindex 11 should succeed (and indeed it does at some other times).

Just to be sure that this is the return that's being hit, could you try changing the flags argument to bpf_redirect_map (the last '0') to another return code (such as XDP_DROP) and confirm that the stats counter starts counting that value instead of XDP_ABORTED?
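
The test suggested here relies on the behaviour described in this thread: when the map lookup fails, the helper returns the value passed in flags (0 is XDP_ABORTED), and otherwise it returns XDP_REDIRECT. A simplified userspace model of just that behaviour, for illustration only; model_devmap and model_redirect_map are hypothetical names, and this is not the kernel implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Local copy of the XDP return codes. */
enum { XDP_ABORTED = 0, XDP_DROP = 1, XDP_PASS = 2, XDP_TX = 3, XDP_REDIRECT = 4 };

/* Hypothetical stand-in for a devmap: slot[i] != 0 means entry i is populated. */
struct model_devmap {
    const uint32_t *slot;
    size_t max_entries;
};

/* Model of the return value only: a failed lookup hands back `flags`,
 * a successful one yields XDP_REDIRECT. */
static int model_redirect_map(const struct model_devmap *map,
                              uint32_t key, uint64_t flags)
{
    if (key >= map->max_entries || map->slot[key] == 0)
        return (int)flags;   /* lookup failed */
    return XDP_REDIRECT;     /* lookup succeeded */
}
```

Under this model, with entries 1 through 11 populated as in the dump of map 11, a redirect to ifindex 11 returns XDP_REDIRECT regardless of flags, while a redirect to a missing entry returns XDP_ABORTED with flags 0 and XDP_DROP with flags set to XDP_DROP. That is why seeing the counter move from XDP_ABORTED to XDP_DROP would point at the lookup-failure path.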

tohojo avatar Nov 13 '20 23:11 tohojo

I've changed the action to XDP_DROP, which does seem to be returning action 1 (XDP_DROP). I'm now using the following code:

action = bpf_redirect_map(&tx_port, fib_params.ifindex, XDP_DROP);

if (action == XDP_ABORTED) {
	bpf_printk("iface index: %d, action %d (ABORTED)\n", fib_params.ifindex, action);
} else {
	bpf_printk("not aborted, action %d\n", action);
}

And getting a bunch of this line in the trace <idle>-0 [009] ..s. 76401.171872: 0: not aborted, action 1

I will note that I'm using the BIRD BGP daemon for this, which is injecting routes into the kernel:

x.x.x.x/x via x.x.x.x dev ens5np0 proto bird metric 32

natesales avatar Nov 14 '20 00:11 natesales

And if you print the ifindex here as well, is it still the same (11)?

As for the routes being injected by BIRD: that shouldn't matter, since this is all after the routing lookup, which does seem to succeed...

tohojo avatar Nov 16 '20 13:11 tohojo

Yes, I've added the ifindex to the log and it's 11: not aborted, action 1 with ifindex 11

natesales avatar Nov 16 '20 17:11 natesales

Right, that implies that redirect to the same interface sometimes works and sometimes doesn't - at the map lookup stage! This seems really odd!

What kind of system are you running this on? Architecture (uname -a) and NIC driver name would be helpful...

tohojo avatar Nov 16 '20 18:11 tohojo

It really is odd. I'll see if I can catch the frame that does result in an ICMP reply, maybe something is different there.

As for the environment, it's an Ubuntu 20.04.1 box with kernel 5.4.0-53-generic and the NIC is a dual port Netronome Agilio with firmware version "0.0.3.5 0.22 bpf-2.0.6.124 ebpf".

natesales avatar Nov 16 '20 18:11 natesales

What driver does that use? What's the output of:

ls -l /sys/class/net/ens5np0/device/driver

It's not doing something funny with offloading programs here, is it (seeing as there's 'ebpf' in the firmware version)?

tohojo avatar Nov 16 '20 18:11 tohojo

It is a "SmartNIC", so it has the fancy xdpoffload functionality. Though ip link shows mode "xdp" and not "xdpoffload" so I think it should be just plain XDP on the host system.

# ls -l /sys/class/net/ens5np0/device/driver
lrwxrwxrwx 1 root root 0 Nov 13 03:35 /sys/class/net/ens5np0/device/driver -> ../../../../bus/pci/drivers/nfp

(Edit: markdown)

natesales avatar Nov 16 '20 19:11 natesales

Yeah, it should be. Which makes this behaviour really, really odd!

Could you try setting up a couple of virtual environments and running the same thing between those (as in the instructions for assignment 2 in the packet03 lesson)? And see if you get the same behaviour there?

tohojo avatar Nov 17 '20 22:11 tohojo

I've tried setting up a test environment, but I'm having some trouble.

# t setup -n left --legacy-ip
Setting up new environment 'left'
Setup environment 'left' with peer ip fc00:dead:cafe:2::2 and 10.11.2.2.
Waiting for interface configuration to settle...

<truncated>

Running ping from inside test environment:

PING fc00:dead:cafe:3::1(fc00:dead:cafe:3::1) 56 data bytes
64 bytes from fc00:dead:cafe:3::1: icmp_seq=1 ttl=64 time=0.039 ms

--- fc00:dead:cafe:3::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.039/0.039/0.039/0.000 ms


# t load -n left -- -F --progsec xdp_redirect_map
libbpf: Error loading BTF: Invalid argument(22)
libbpf: magic: 0xeb9f
version: 1
<truncated>
libbpf: Error loading .BTF into kernel: -22.
ERR: couldn't find a program in ELF section 'xdp_redirect_map'


# t load -n right -- -F --progsec xdp_redirect_map
libbpf: Error loading BTF: Invalid argument(22)
libbpf: magic: 0xeb9f
<truncated>
libbpf: Error loading .BTF into kernel: -22.
ERR: couldn't find a program in ELF section 'xdp_redirect_map'

# t exec -n left -- ./xdp_loader -d veth0 -F --progsec xdp_pass 
libbpf: Error loading BTF: Invalid argument(22)
libbpf: magic: 0xeb9f
<truncated>
libbpf: Error loading .BTF into kernel: -22.
ERR: couldn't find a program in ELF section 'xdp_pass'

# t exec -n right -- ./xdp_loader -d veth0 -F --progsec xdp_pass
libbpf: Error loading BTF: Invalid argument(22)
libbpf: magic: 0xeb9f
<truncated>
libbpf: Error loading .BTF into kernel: -22.
ERR: couldn't find a program in ELF section 'xdp_pass'

# t redirect right left
WARN: Failed to open bpf map file:/sys/fs/bpf/right/tx_port err(2):No such file or directory

natesales avatar Nov 21 '20 05:11 natesales

This sounds like you're just trying to load the wrong file or something?

tohojo avatar Nov 24 '20 14:11 tohojo

Ah yes, xdp_router is the one I'm interested in.

After creating the test environment and loading the programs like so, it does seem to be forwarding between the interfaces.

/home/eo-admin/xdp-tutorial/packet-solutions(master*) # t setup -n alpha --legacy-ip
Setting up new environment 'alpha'
Setup environment 'alpha' with peer ip fc00:dead:cafe:c::2 and 10.11.12.2.
Waiting for interface configuration to settle...

Running ping from inside test environment:

PING fc00:dead:cafe:c::1(fc00:dead:cafe:c::1) 56 data bytes
64 bytes from fc00:dead:cafe:c::1: icmp_seq=1 ttl=64 time=0.028 ms

--- fc00:dead:cafe:c::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.028/0.028/0.028/0.000 ms
/home/eo-admin/xdp-tutorial/packet-solutions(master*) # t setup -n bravo --legacy-ip
Setting up new environment 'bravo'
Setup environment 'bravo' with peer ip fc00:dead:cafe:d::2 and 10.11.13.2.
Waiting for interface configuration to settle...

Running ping from inside test environment:

PING fc00:dead:cafe:d::1(fc00:dead:cafe:d::1) 56 data bytes
64 bytes from fc00:dead:cafe:d::1: icmp_seq=1 ttl=64 time=0.033 ms

--- fc00:dead:cafe:d::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.033/0.033/0.033/0.000 ms
/home/eo-admin/xdp-tutorial/packet-solutions(master*) # t load -n alpha -- -F --progsec xdp_router
libbpf: Error loading BTF: Invalid argument(22)
libbpf: magic: 0xeb9f
version: 1
flags: 0x0
hdr_len: 24
type_off: 0
type_len: 424
str_off: 424
str_len: 2308
btf_total_size: 2756
[1] PTR (anon) type_id=2
[2] STRUCT xdp_md size=20 vlen=5
	data type_id=3 bits_offset=0
	data_end type_id=3 bits_offset=32
	data_meta type_id=3 bits_offset=64
	ingress_ifindex type_id=3 bits_offset=96
	rx_queue_index type_id=3 bits_offset=128
[3] TYPEDEF __u32 type_id=4
[4] INT unsigned int size=4 bits_offset=0 nr_bits=32 encoding=(none)
[5] FUNC_PROTO (anon) return=6 args=(1 ctx)
[6] INT int size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
[7] FUNC xdp_router_func type_id=5 vlen != 0

libbpf: Error loading .BTF into kernel: -22.
Success: Loaded BPF-object(xdp_prog_kern.o) and used section(xdp_router)
 - XDP prog attached on device:alpha(ifindex:24)
 - Pinning maps in /sys/fs/bpf/alpha/
/home/eo-admin/xdp-tutorial/packet-solutions(master*) # t load -n bravo -- -F --progsec xdp_router
libbpf: Error loading BTF: Invalid argument(22)
libbpf: magic: 0xeb9f
version: 1
flags: 0x0
hdr_len: 24
type_off: 0
type_len: 424
str_off: 424
str_len: 2308
btf_total_size: 2756
[1] PTR (anon) type_id=2
[2] STRUCT xdp_md size=20 vlen=5
	data type_id=3 bits_offset=0
	data_end type_id=3 bits_offset=32
	data_meta type_id=3 bits_offset=64
	ingress_ifindex type_id=3 bits_offset=96
	rx_queue_index type_id=3 bits_offset=128
[3] TYPEDEF __u32 type_id=4
[4] INT unsigned int size=4 bits_offset=0 nr_bits=32 encoding=(none)
[5] FUNC_PROTO (anon) return=6 args=(1 ctx)
[6] INT int size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
[7] FUNC xdp_router_func type_id=5 vlen != 0

libbpf: Error loading .BTF into kernel: -22.
Success: Loaded BPF-object(xdp_prog_kern.o) and used section(xdp_router)
 - XDP prog attached on device:bravo(ifindex:25)
 - Pinning maps in /sys/fs/bpf/bravo/
/home/eo-admin/xdp-tutorial/packet-solutions(master*) # t redirect alpha bravo
map dir: /sys/fs/bpf/alpha
redirect from ifnum=24 to ifnum=25
forward: 1e:01:a9:92:0f:82 -> b6:49:90:d4:7a:6e
map dir: /sys/fs/bpf/bravo
redirect from ifnum=25 to ifnum=24
forward: b6:49:90:d4:7a:6e -> 1e:01:a9:92:0f:82

I can ping between the alpha and bravo interfaces:

# t exec -n alpha -- ping fc00:dead:cafe:d::1
PING fc00:dead:cafe:d::1(fc00:dead:cafe:d::1) 56 data bytes
64 bytes from fc00:dead:cafe:d::1: icmp_seq=1 ttl=64 time=0.051 ms
64 bytes from fc00:dead:cafe:d::1: icmp_seq=2 ttl=64 time=0.054 ms
64 bytes from fc00:dead:cafe:d::1: icmp_seq=3 ttl=64 time=0.048 ms
64 bytes from fc00:dead:cafe:d::1: icmp_seq=4 ttl=64 time=0.048 ms

# t exec -n bravo -- ping fc00:dead:cafe:c::1
PING fc00:dead:cafe:c::1(fc00:dead:cafe:c::1) 56 data bytes
64 bytes from fc00:dead:cafe:c::1: icmp_seq=1 ttl=64 time=0.036 ms
64 bytes from fc00:dead:cafe:c::1: icmp_seq=2 ttl=64 time=0.047 ms
64 bytes from fc00:dead:cafe:c::1: icmp_seq=3 ttl=64 time=0.049 ms
64 bytes from fc00:dead:cafe:c::1: icmp_seq=4 ttl=64 time=0.048 ms

When watching the stats on alpha, I can see the XDP_PASS pps go up to 1 when the ICMP packets are being sent.

XDP-action  
XDP_ABORTED            0 pkts (         0 pps)           0 Kbytes (     0 Mbits/s) period:2.000368
XDP_DROP               0 pkts (         0 pps)           0 Kbytes (     0 Mbits/s) period:2.000376
XDP_PASS              29 pkts (         1 pps)           3 Kbytes (     0 Mbits/s) period:2.000375
XDP_TX                 0 pkts (         0 pps)           0 Kbytes (     0 Mbits/s) period:2.000375
XDP_REDIRECT           0 pkts (         0 pps)           0 Kbytes (     0 Mbits/s) period:2.000376

natesales avatar Nov 30 '20 07:11 natesales

Looks like you're still using the regular kernel forwarding. You need to load the xdp_router program on the two veth interfaces in the root namespace, not inside the namespaces (since it's in the root that packets are actually forwarded).

tohojo avatar Nov 30 '20 10:11 tohojo

Ah ok, yes that makes sense.

I've torn down the old test environments and created a new one. I now have fc00:dead:cafe:e::1/64 on alpha and fc00:dead:cafe:f::1/64 on bravo.

I then loaded the XDP program on to each interface in (what I hope to be) the root namespace:

# ./xdp_loader -d alpha --progsec xdp_router
<truncated>
Success: Loaded BPF-object(xdp_prog_kern.o) and used section(xdp_router)
 - XDP prog attached on device:alpha(ifindex:26)
 - Pinning maps in /sys/fs/bpf/alpha/

# ./xdp_loader -d bravo --progsec xdp_router
<truncated>
Success: Loaded BPF-object(xdp_prog_kern.o) and used section(xdp_router)
 - XDP prog attached on device:bravo(ifindex:27)
 - Pinning maps in /sys/fs/bpf/bravo/

I then sent a bunch of ICMP traffic from alpha to bravo:

# t exec -n alpha -- ping fc00:dead:cafe:f::1 -A
PING fc00:dead:cafe:f::1(fc00:dead:cafe:f::1) 56 data bytes
64 bytes from fc00:dead:cafe:f::1: icmp_seq=1 ttl=64 time=0.051 ms
64 bytes from fc00:dead:cafe:f::1: icmp_seq=2 ttl=64 time=0.018 ms
64 bytes from fc00:dead:cafe:f::1: icmp_seq=3 ttl=64 time=0.014 ms
<truncated>

but I'm still seeing XDP_PASS

# ./xdp_stats -d alpha
Collecting stats from BPF map
 - BPF map (bpf_map_type:6) id:226 name:xdp_stats_map key_size:4 value_size:16 max_entries:5
XDP-action  
XDP_ABORTED            0 pkts (         0 pps)           0 Kbytes (     0 Mbits/s) period:0.250306
XDP_DROP               0 pkts (         0 pps)           0 Kbytes (     0 Mbits/s) period:0.250305
XDP_PASS          113725 pkts (         0 pps)       13419 Kbytes (     0 Mbits/s) period:0.250307
XDP_TX                 0 pkts (         0 pps)           0 Kbytes (     0 Mbits/s) period:0.250308
XDP_REDIRECT           3 pkts (         0 pps)           0 Kbytes (     0 Mbits/s) period:0.250308

# ./xdp_stats -d bravo
Collecting stats from BPF map
 - BPF map (bpf_map_type:6) id:230 name:xdp_stats_map key_size:4 value_size:16 max_entries:5
XDP-action  
XDP_ABORTED            0 pkts (         0 pps)           0 Kbytes (     0 Mbits/s) period:0.250285
XDP_DROP               0 pkts (         0 pps)           0 Kbytes (     0 Mbits/s) period:0.250286
XDP_PASS               3 pkts (         0 pps)           0 Kbytes (     0 Mbits/s) period:0.250288
XDP_TX                 0 pkts (         0 pps)           0 Kbytes (     0 Mbits/s) period:0.250288
XDP_REDIRECT           8 pkts (         0 pps)           0 Kbytes (     0 Mbits/s) period:0.250288

I might be doing something wrong with loading the program, does this all look right to you?

natesales avatar Dec 03 '20 22:12 natesales