advanced03-AF_XDP: performance
Using the tutorial code [1] with the one-liner XDP program that passes all packets through, and an af_xdp_user.c that just loops over the received packets and frees each UMEM frame for XDP to reuse -- i.e. almost no business logic and the tightest possible user-land loop, with no packet inspection, rewriting, or sending -- plus the udpsender code from this article [2], I got the following performance results using veth:
| udpsenders | udpsender CPU | poll | mode | PPS | af_xdp_user CPU |
|------------|---------------|------|--------|------|-----------------|
| 1 | 100% | no | skb | 995k | 100% |
| 2 | 200% | no | skb | 1.8M | 100% |
| 3 | 300% | no | skb | 2.5M | 100% |
| 4 | 400% | no | skb | 3.2M | 100% |
| 5 | 500% | no | skb | 3.7M | 100% |
| 5 | 500% | yes | skb | 2.8M | 92% |
| 1 | 100% | no | native | 830k | 100% |
| 2 | 123% | no | native | 1.3M | 100% |
| 1 | 100% | no | auto | 815k | 100% |
| 2 | 123% | no | auto | 1.3M | 100% |
| 1 | 100% | yes | auto | 650k | 84% |
| 2 | 126% | yes | auto | 980k | 39% |
I monitored the CPU usage with the following command:
top -d 1 -b | egrep "udpsender|af_xdp_user"
I ran af_xdp_user with the following command variations:
$ make && sudo timeout --signal=SIGINT 9 taskset -c 7 ./af_xdp_user --dev veth-advanced03 --filename af_xdp_kern.o --skb-mode --force & sudo ip netns exec veth-advanced03 sh -c "(sleep 1 ; timeout --signal=SIGINT 8 taskset -c 1 ./udpsender 10.11.11.1:4321 )"
$ make && sudo timeout --signal=SIGINT 9 taskset -c 7 ./af_xdp_user --dev veth-advanced03 --filename af_xdp_kern.o --skb-mode --force & sudo ip netns exec veth-advanced03 sh -c "(sleep 1 ; timeout --signal=SIGINT 8 taskset -c 1,2 ./udpsender 10.11.11.1:4321 10.11.11.1:4321 )"
$ make && sudo timeout --signal=SIGINT 9 taskset -c 7 ./af_xdp_user --dev veth-advanced03 --filename af_xdp_kern.o --skb-mode --force & sudo ip netns exec veth-advanced03 sh -c "(sleep 1 ; timeout --signal=SIGINT 8 taskset -c 1,2,3 ./udpsender 10.11.11.1:4321 10.11.11.1:4321 10.11.11.1:4321 )"
$ make && sudo timeout --signal=SIGINT 9 taskset -c 7 ./af_xdp_user --dev veth-advanced03 --filename af_xdp_kern.o --skb-mode --force & sudo ip netns exec veth-advanced03 sh -c "(sleep 1 ; timeout --signal=SIGINT 8 taskset -c 1,2,3,4 ./udpsender 10.11.11.1:4321 10.11.11.1:4321 10.11.11.1:4321 10.11.11.1:4321 )"
$ make && sudo timeout --signal=SIGINT 9 taskset -c 7 ./af_xdp_user --dev veth-advanced03 --filename af_xdp_kern.o --skb-mode --force & sudo ip netns exec veth-advanced03 sh -c "(sleep 1 ; timeout --signal=SIGINT 8 taskset -c 1,2,3,4,5 ./udpsender 10.11.11.1:4321 10.11.11.1:4321 10.11.11.1:4321 10.11.11.1:4321 10.11.11.1:4321)"
$ make && sudo timeout --signal=SIGINT 9 taskset -c 7 ./af_xdp_user --dev veth-advanced03 --filename af_xdp_kern.o --skb-mode --poll-mode --force & sudo ip netns exec veth-advanced03 sh -c "(sleep 1 ; timeout --signal=SIGINT 8 taskset -c 1,2,3,4,5 ./udpsender 10.11.11.1:4321 10.11.11.1:4321 10.11.11.1:4321 10.11.11.1:4321 10.11.11.1:4321)"
$ make && sudo timeout --signal=SIGINT 9 taskset -c 7 ./af_xdp_user --dev veth-advanced03 --filename af_xdp_kern.o --native-mode --force & sudo ip netns exec veth-advanced03 sh -c "(sleep 1 ; timeout --signal=SIGINT 8 taskset -c 1 ./udpsender 10.11.11.1:4321 )"
$ make && sudo timeout --signal=SIGINT 9 taskset -c 7 ./af_xdp_user --dev veth-advanced03 --filename af_xdp_kern.o --native-mode --force & sudo ip netns exec veth-advanced03 sh -c "(sleep 1 ; timeout --signal=SIGINT 8 taskset -c 1,2 ./udpsender 10.11.11.1:4321 10.11.11.1:4321 )"
$ make && sudo timeout --signal=SIGINT 9 taskset -c 7 ./af_xdp_user --dev veth-advanced03 --filename af_xdp_kern.o --auto-mode --force & sudo ip netns exec veth-advanced03 sh -c "(sleep 1 ; timeout --signal=SIGINT 8 taskset -c 1 ./udpsender 10.11.11.1:4321 )"
$ make && sudo timeout --signal=SIGINT 9 taskset -c 7 ./af_xdp_user --dev veth-advanced03 --filename af_xdp_kern.o --auto-mode --force & sudo ip netns exec veth-advanced03 sh -c "(sleep 1 ; timeout --signal=SIGINT 8 taskset -c 1,2 ./udpsender 10.11.11.1:4321 10.11.11.1:4321 )"
$ make && sudo timeout --signal=SIGINT 9 taskset -c 7 ./af_xdp_user --dev veth-advanced03 --filename af_xdp_kern.o --auto-mode --poll-mode --force & sudo ip netns exec veth-advanced03 sh -c "(sleep 1 ; timeout --signal=SIGINT 8 taskset -c 1 ./udpsender 10.11.11.1:4321 )"
$ make && sudo timeout --signal=SIGINT 9 taskset -c 7 ./af_xdp_user --dev veth-advanced03 --filename af_xdp_kern.o --auto-mode --poll-mode --force & sudo ip netns exec veth-advanced03 sh -c "(sleep 1 ; timeout --signal=SIGINT 8 taskset -c 1,2 ./udpsender 10.11.11.1:4321 10.11.11.1:4321 )"
Note: udpsender causes 74 byte packets to be received.
Note: Increased threads for udpsender until no PPS improvement was seen.
Note: The veth is configured with the default single receive queue.
Note: The last column is the CPU usage of af_xdp_user.
Questions:
- The best PPS above is 3.7M. Is there anything I can tweak in af_xdp_user.c or the environment to get better PPS results?
- Should veth be considered faster than hardware NICs, or might a hardware NIC (with an equivalent single receive queue) be faster? And why?
- Why would using poll not result in 100% CPU usage for af_xdp_user regardless of how many udpsender threads are used?
- Why would using poll with two senders halve the CPU usage of af_xdp_user while increasing PPS? This seems counter-intuitive, no?
- According to the above results, when testing performance with veth, skb mode is much faster than native mode, yet auto mode chooses native mode. Why is skb mode faster? And if veth is the reason, couldn't auto mode somehow detect that skb mode should be chosen?
[1] https://github.com/xdp-project/xdp-tutorial/tree/master/advanced03-AF_XDP
[2] https://blog.cloudflare.com/how-to-receive-a-million-packets/
Simon Hardy-Francis [email protected] writes:
> Questions:
>
> - The best PPS above is 3.7M. Is there anything I can tweak in
>   af_xdp_user.c or the environment to get better PPS results?
>
> - Should veth be considered faster than hardware NICs, or might a
>   hardware NIC (with an equivalent single receive queue) be faster?
>   And why?
XDP gets its high performance by bypassing the networking stack. But when you're using veths, things go through the stack anyway, so you're not really getting the benefits. So I'm not sure how much sense it makes to benchmark this...
> - Why would using poll not result in 100% CPU usage for af_xdp_user
>   regardless of how many udpsender threads are used?
>
> - Why would using poll with two senders halve the CPU usage of
>   af_xdp_user while increasing PPS? This seems counter-intuitive, no?
IDK, scheduling issues?
> - According to the above results, it seems that when testing
>   performance with veth, skb mode is much faster than native mode,
>   and auto mode chooses native mode. Why is skb mode faster? And if
>   veth is the reason, couldn't auto mode somehow detect that skb mode
>   should be chosen?
'auto' just means 'try native mode, and if that fails, fall back to skb mode'. Since veth always builds skbs anyway, native mode just adds overhead (unless the packet was redirected from a physical NIC). But that's a specific 'feature' of veth, so not generally applicable...