xdp-tutorial

Hi, I have a question about performance on AMD processors

Open ov3rdr1v337 opened this issue 3 years ago • 6 comments

I have always run XDP applications on servers with Intel Xeon Gold CPUs, and performance was never a problem: up to 125Mpps with a 100 GbE MCX515A-CCAT network card and 2 CPUs in a 1U server.

Today I tried to make it work on an AMD EPYC 7371, and for some reason the results were very low: the maximum I got is 13Mpps with the same network card and 1 AMD EPYC 7371. All cores are pushed to their max (100%).

Testing was done on Ubuntu 18.04 (and afterwards on 20.04 as well). I installed the same Mellanox drivers (OFED) as I do for Intel and applied my other usual configuration.

I run XDP in driver mode with JIT enabled. Other tuning is as follows:

sudo mlnx_tune -p HIGH_THROUGHPUT
sudo ethtool -G enp65s0 rx 512 tx 512
sudo ethtool -L enp65s0 combined 8
sudo ethtool --show-priv-flags enp65s0
sudo ethtool --set-priv-flags enp65s0 rx_cqe_compress on

Is there anything else I should do to run it on an AMD CPU? Performance just can't be this low, so I guess I misconfigured something along the way.

We used 8 RX queues => 8 CPUs, and all of them are at 100%...
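To put the reported numbers in per-core terms, here is a minimal sketch of the arithmetic. It assumes the load is spread evenly over the 8 RX queues, and it takes the core clock as a parameter; the 3100 MHz used below is an assumption for illustration, not a measured frequency. The function name `per_core_budget` is hypothetical.

```shell
#!/bin/sh
# Per-core packet rate and cycle budget for an evenly spread load.
# Arguments: total Mpps, number of busy cores, assumed core clock in MHz.
per_core_budget() {
    awk -v mpps="$1" -v cores="$2" -v mhz="$3" 'BEGIN {
        per_core = mpps / cores      # Mpps handled by each core
        cycles   = mhz / per_core    # MHz divided by Mpps gives cycles per packet
        printf "%.3f Mpps/core, %.0f cycles/packet\n", per_core, cycles
    }'
}

# Figures from the report: 13 Mpps over 8 RX queues.
# 3100 MHz is an assumed clock, not a measured one.
per_core_budget 13 8 3100
```

With these inputs the sketch prints `1.625 Mpps/core, 1908 cycles/packet`. If the cores really are saturated at that rate, each packet is costing on the order of a couple of thousand cycles under the assumed clock.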

ov3rdr1v337, Nov 18 '20 19:11

A bit more info about our config: kernel 5.4 on 20.04 and 4.5 on 18.04. We generate traffic from a second machine (TRex), which also has an MCX515A-CCAT network card and is connected to the first machine (the AMD one) with a Mellanox 100 GbE cable.

roddylab, Nov 18 '20 20:11

Missing DDIO? @netoptimizer any thoughts?

tohojo, Nov 18 '20 20:11

> Missing DDIO? @netoptimizer any thoughts?

Yeah, we have thought about that, but is it possible for performance to drop by over 100Mpps just because we lack DDIO? Also, Cloudflare, who use XDP, mention that they replaced their Intel Xeon Platinum 8160 CPUs (2 per unit) with an AMD 7742 CPU (1 per unit) and got far better results.

Thanks for your fast reply btw

roddylab, Nov 18 '20 20:11

roddy writes:

> > Missing DDIO? @netoptimizer any thoughts?

> Yeah, we have thought about that, but is it possible for performance to drop by over 100Mpps just because we lack DDIO? Also, Cloudflare, who use XDP, mention that they replaced their Intel Xeon Platinum 8160 CPUs (2 per unit) with an AMD 7742 CPU (1 per unit) and got far better results.

No, it does seem a bit much. You don't mention CPU affinity tuning, did you do any? What does mpstat -P ALL 1 output while you're running your test? I.e., are you seeing activity on all CPUs?
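As a companion to the mpstat check, the following is a rough sketch of how one might list which CPUs each NIC queue interrupt is allowed to run on. It reads `/proc/interrupts` and `/proc/irq/<N>/smp_affinity_list`, which are standard Linux interfaces. The function names and the `mlx5_comp` pattern in the usage note are assumptions; the actual IRQ names depend on the driver and kernel version, so check `/proc/interrupts` first.

```shell
#!/bin/sh
# Print the IRQ numbers whose /proc/interrupts line matches a pattern.
# $1: pattern (for example part of the driver or interface name)
# $2: file to read; defaults to /proc/interrupts, and is a parameter
#     only so the function can be tried on a saved sample.
irqs_matching() {
    awk -v pat="$1" '$0 ~ pat { sub(":", "", $1); print $1 }' "${2:-/proc/interrupts}"
}

# For each matching IRQ, show the list of CPUs it may be delivered to.
show_irq_affinity() {
    for irq in $(irqs_matching "$1"); do
        printf 'IRQ %s -> CPUs %s\n' "$irq" "$(cat "/proc/irq/$irq/smp_affinity_list")"
    done
}
```

Usage would look like `show_irq_affinity mlx5_comp` (pattern assumed). If several queues map to the same CPU, or to CPUs that stay idle in mpstat, affinity is a likely suspect. It may also be worth comparing against the NIC's NUMA node, readable from `/sys/class/net/enp65s0/device/numa_node`.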

> Thanks for your fast reply btw

You're welcome :)

tohojo, Nov 18 '20 21:11

> roddy writes:

> > > Missing DDIO? @netoptimizer any thoughts?

> > Yeah, we have thought about that, but is it possible for performance to drop by over 100Mpps just because we lack DDIO? Also, Cloudflare, who use XDP, mention that they replaced their Intel Xeon Platinum 8160 CPUs (2 per unit) with an AMD 7742 CPU (1 per unit) and got far better results.

> No, it does seem a bit much. You don't mention CPU affinity tuning, did you do any? What does mpstat -P ALL 1 output while you're running your test? I.e., are you seeing activity on all CPUs?

I've wanted to play with the AMD EPYC series CPUs for a while now, so this seems like a good opportunity to look into this. I'm particularly interested in whether DDIO / DCA works on AMD. Note that if this tech is not available: back before DDIO-enabled CPUs, I had software patches for the mlx5 driver that removed the full cache miss via L2-cache prefetching, but we (the kernel) chose not to merge them.

The first step is to check whether any (full) cache misses occur, using the perf stat command below:

perf stat -C insert-cpu-number -e cycles -e instructions -e cache-references -e cache-misses -e branches:k -e branch-misses:k -r 4 sleep 1

Remember to change insert-cpu-number to one of the CPUs that is actively processing packets (doing XDP_DROP). I've also included some other stats that will tell me other things. (Please see if you can fit and format the results in a readable fashion via this GitHub interface.)
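When reading the perf stat output, two derived numbers are usually the quickest to compare between machines: instructions per cycle (IPC) and the cache-miss ratio. Below is a small sketch of that arithmetic. The function name `perf_ratios` is hypothetical and the counter values are made up purely to show the calculation; they are not results from this setup.

```shell
#!/bin/sh
# Derive IPC and cache-miss ratio from four raw perf counters.
# Arguments: cycles, instructions, cache-references, cache-misses.
# All counts are assumed to be non-zero.
perf_ratios() {
    awk -v cyc="$1" -v ins="$2" -v refs="$3" -v miss="$4" 'BEGIN {
        printf "IPC %.2f, cache-miss ratio %.1f%%\n", ins / cyc, 100 * miss / refs
    }'
}

# Made-up counter values, only to illustrate the arithmetic.
perf_ratios 3000000000 1500000000 40000000 10000000
```

For the made-up inputs this prints `IPC 0.50, cache-miss ratio 25.0%`. As a general rule of thumb, a low IPC combined with a high miss ratio on the packet-processing CPU suggests the core is waiting on memory rather than executing instructions.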

netoptimizer, Nov 19 '20 09:11

@roddylab / @tohojo writes:

> Missing DDIO? @netoptimizer any thoughts?

> First step to check if any (full) cache-misses occur via below perf stat command:

Hmm... silence from @ov3rdr1v337 and @roddylab

netoptimizer, Dec 15 '20 11:12