Onload AF_XDP degraded latency on AWS
I tested Onload AF_XDP latency on an AWS machine (m5zn.12xlarge). Initially I had some issues setting it up. After building and installing Onload, I registered the device with this command:
echo ens5 | sudo tee /sys/module/sfc_resource/afxdp/register
But whenever I ran any application under Onload, I encountered this error:
oo:nc.openbsd[1172]: netif_tcp_helper_alloc_u: ENODEV. This error can occur if:
- no Solarflare network interfaces are active/UP, or they are running packed stream firmware or are disabled, and
- there are no AF_XDP interfaces registered with sfc_resource
Please check your configuration.
Later on I was able to debug and fix this by running these two commands:
sudo ifconfig ens5 mtu 3000
sudo ethtool -L ens5 combined 1
Afterwards I measured latency with and without Onload AF_XDP using a simple UDP ping-pong application that takes a timestamp when the packet is returned (a sketch of the measurement loop is shown after the tables below). The measurements were taken on two machines in the same placement group, over 100,000 samples, with two runs each. Here are the results:
kernel | run 1 | run 2 |
---|---|---|
latency_mean | 43.626 us | 43.512 us |
latency_min | 38.579 us | 38.687 us |
latency_max | 136.695 us | 139.111 us |
latency_median | 42.735 us | 42.865 us |
latency_std_dev | 5.122 us | 4.433 us |
onload af_xdp | run 1 | run 2 |
---|---|---|
latency_mean | 50.358 us | 50.581 us |
latency_min | 42.985 us | 44.284 us |
latency_max | 240.949 us | 192.622 us |
latency_median | 47.883 us | 47.931 us |
latency_std_dev | 7.539 us | 7.025 us |
Degraded latency is observed with onload af_xdp.
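For illustration, a ping-pong client of this kind can be sketched roughly as follows. This is not the exact application used for the numbers above; SERVER_IP and SERVER_PORT are placeholders, and the peer is assumed to simply echo each datagram back. Running it under Onload is just a matter of prefixing the command, e.g. `onload ./pingpong`.

```c
/* Minimal UDP ping-pong latency sketch: send a datagram, wait for the
 * echo, and take the timestamp when the packet is returned. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

#define SERVER_IP   "10.0.0.2"   /* placeholder: peer in the same placement group */
#define SERVER_PORT 9000         /* placeholder port */
#define SAMPLES     100000
#define PAYLOAD     32

static double now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
}

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in peer = {0};
    peer.sin_family = AF_INET;
    peer.sin_port = htons(SERVER_PORT);
    inet_pton(AF_INET, SERVER_IP, &peer.sin_addr);
    if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) < 0) {
        perror("connect");
        return 1;
    }

    char buf[PAYLOAD];
    memset(buf, 'x', sizeof(buf));
    double sum = 0.0, min = 1e18, max = 0.0;
    int done = 0;

    for (int i = 0; i < SAMPLES; i++) {
        double t0 = now_us();
        if (send(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf)) { perror("send"); break; }
        if (recv(fd, buf, sizeof(buf), 0) < 0) { perror("recv"); break; }
        double rtt = now_us() - t0;   /* timestamp taken when the packet is returned */
        sum += rtt;
        if (rtt < min) min = rtt;
        if (rtt > max) max = rtt;
        done++;
    }

    if (done)
        printf("mean %.3f us  min %.3f us  max %.3f us  (%d samples)\n",
               sum / done, min, max, done);
    close(fd);
    return 0;
}
```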
Additionally, I measured UDP throughput using the Linux iperf tool. Throughput improved only marginally:
setup | throughput Gbits/sec |
---|---|
kernel | 6.96 |
onload | 7.24 |
These results are contrary to my expectations and I have a few questions:
- Does virtualization negatively affect the performance of Onload AF_XDP? Better results in terms of throughput are recorded here.
- Does Onload AF_XDP work better on some cards than on others?
- Are there any further settings required to get better results?
In a sense, the iperf tool is not compatible with Onload. iperf uses two separate threads for read and write, which for Onload means lock contention. The "better results" you mention are for memcached.
Latency is not expected to benefit from AF_XDP Onload at all, i.e. your results match my expectations.
Thank you for your response. Can you recommend a tool for throughput measurement instead of iperf with which I would be able to see the best results for Onload AF_XDP?
I can recommend something like memcached, or maybe nginx (caution: the Onload nginx profile needs "clustering"; I saw some recent commits here in the master branch of Onload which add "clustering" support to AF_XDP, but I have not tried it).
AF_XDP Onload provides some benefit for an application which uses epoll to manage a lot of sockets.
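For illustration, the kind of single-threaded, epoll-driven server that stands to gain looks roughly like the sketch below. This is a hypothetical example, not code from this thread; the port number is a placeholder, and it would be run under Onload as `onload ./epoll_echo`.

```c
/* Minimal single-threaded TCP echo server using epoll to manage many
 * sockets from one event loop. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0) { perror("socket"); return 1; }
    int one = 1;
    setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(9000);   /* placeholder port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 || listen(lfd, 128) < 0) {
        perror("bind/listen");
        return 1;
    }

    int ep = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = lfd };
    epoll_ctl(ep, EPOLL_CTL_ADD, lfd, &ev);

    struct epoll_event events[64];
    char buf[4096];
    for (;;) {
        int n = epoll_wait(ep, events, 64, -1);
        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            if (fd == lfd) {
                /* New connection: add it to the epoll set. */
                int cfd = accept(lfd, NULL, NULL);
                if (cfd >= 0) {
                    struct epoll_event cev = { .events = EPOLLIN, .data.fd = cfd };
                    epoll_ctl(ep, EPOLL_CTL_ADD, cfd, &cev);
                }
            } else {
                /* Echo back whatever the client sent; close on EOF/error. */
                ssize_t r = read(fd, buf, sizeof(buf));
                if (r <= 0) { close(fd); continue; }
                write(fd, buf, r);
            }
        }
    }
}
```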
Does anyone have any advice for tuning Onload on ENA for custom applications?