onload
ixgbe: __oof_socket_add_wild: 1:2047 ERROR: FILTER TCP 10.194.139.66:1 0.0.0.0:0 failed
I cannot start Onload on CentOS 8.3 or Debian 10 (kernel 5.8). Why does Onload need to find mtdchar? The errors follow:
[root@localhost onload]# "$(mmaketool --toppath)/build/$(mmaketool --driverbuild)/driver/linux/load.sh" onload
unload.sh: /sbin/rmmod onload
unload.sh: /sbin/rmmod sfc_char
unload.sh: /sbin/rmmod sfc_resource
unload.sh: /sbin/rmmod sfc
unload.sh: /sbin/rmmod virtual_bus
unload.sh: /sbin/rmmod sfc_driverlink
NET_OPT is
CHAR_OPT is
modprobe: FATAL: Module mtdchar not found in directory /lib/modules/4.18.0-240.10.1.el8_3.x86_64
ERROR: Did not find sfc_control in /proc/devices
sfc is a DEBUG driver
RESOURCE_OPT is
CHAR_OPT is
ONLOAD_OPT is
It doesn't really need mtdchar on most kernels. What do you mean when you say "I cannot start Onload"?
I agree that it would be better to fix load.sh so that it does not print unimportant errors, but it is a developer's tool. Do you see any real issue with Onload? Which application do you use? Does it work with Onload?
Thanks for your reply! So you mean the errors printed when loading the drivers do not matter? The situation is: when loading the drivers using "load.sh onload", I got the errors I mentioned last time:
modprobe: FATAL: Module mtdchar not found in directory /lib/modules/4.18.0-240.10.1.el8_3.x86_64
ERROR: Did not find sfc_control in /proc/devices
Then I tried to use the Onload library, by calling "scripts/onload" or via "LD_PRELOAD", to speed up my app, and it failed! Logs follow:
ssj@ssj-debian10:~/github/sutn$ LD_PRELOAD="$(mmaketool --toppath)/build/$(mmaketool --userbuild)/lib/transport/unix/libcitransport0.so" sockperf sr -i 192.168.56.102 -p 1233 --tcp
citp_oo_get_cpu_khz: Failed to open /dev/onload
oo:sockperf[18408]: __citp_netif_alloc: failed to open driver (1)
oo:sockperf[18408]: citp_netif_alloc_and_init: failed to create netif (1)
oo:sockperf[18408]: citp_tcp_socket: failed (errno:1) - PASSING TO OS
oo_onloadfs_dev_t: Failed to open /dev/onload
sockperf: == version #3.7-1.gitb741ab3c60b1 ==
sockperf: [SERVER] listen on:
[ 0] IP = 192.168.56.102 PORT = 1233 # TCP
OS version:
root@ssj-debian10:/home/ssj/github/onload# uname -a
Linux ssj-debian10 5.8.0-0.bpo.2-amd64 #1 SMP Debian 5.8.10-1~bpo10+1 (2020-09-26) x86_64 GNU/Linux
or
[root@bogon dev]# uname -a
Linux bogon 4.18.0-240.10.1.el8_3.x86_64 #1 SMP Mon Jan 18 17:05:51 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
[root@bogon dev]# cat /etc/centos-release
CentOS Linux release 8.3.2011
[root@bogon dev]#
I found Onload the day before yesterday. I want to test its effect with "sockperf", and then use it in distributed systems to lower network cost, lower latency and increase throughput. I need your help. Thank you very much!
Are you using Solarflare NICs? Or AF_XDP? Have you read https://github.com/Xilinx-CNS/onload#installation-and-quick-start-guide:
echo ens2f0 > /sys/module/sfc_resource/afxdp/register
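The quick-start step above can be wrapped in a small helper that fails loudly when the register file is missing, which typically means the `sfc_resource` module is not loaded. This is a minimal sketch. The function name and its optional second argument, which lets it target a scratch file for testing, are hypothetical. The sysfs path is the one quoted above.

```shell
#!/usr/bin/env bash
# Sketch: register a non-Solarflare interface for Onload's AF_XDP mode.
# The sysfs path is the one from the quick-start step; the optional
# second argument (hypothetical) lets the function target a scratch file.
register_afxdp() {
    local iface=$1
    local register=${2:-/sys/module/sfc_resource/afxdp/register}
    if [ -z "$iface" ]; then
        echo "usage: register_afxdp IFACE [REGISTER_FILE]" >&2
        return 2
    fi
    # The register file only exists once sfc_resource is loaded.
    if [ ! -e "$register" ]; then
        echo "$register not found; is sfc_resource loaded?" >&2
        return 1
    fi
    # Same effect as: echo IFACE > /sys/module/sfc_resource/afxdp/register
    echo "$iface" > "$register"
}
```

Writing to sysfs needs root, e.g. run `register_afxdp ens2f0` from a root shell after sourcing the file.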
Thanks for your reply. We do not use Solarflare NICs; we just want to test AF_XDP.
1. Yes, I have executed this command: echo ens2f0 > /sys/module/sfc_resource/afxdp/register
2. My environment is as follows. Can Onload run on this machine? If so, what steps should I follow?
(1) NIC:
[root@A03-R05-I139-66-FVP3HP2 ~]# ethtool -i eth0
driver: ixgbe
version: 5.1.0-k-rh8.2.0
firmware-version: 0x8000090c, 18.3.6
expansion-rom-version:
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
(2) OS:
[root@A03-R05-I139-66-FVP3HP2 ~]# uname -r
4.18.0-240.10.1.el8_3.x86_64
[root@A03-R05-I139-66-FVP3HP2 ~]# cat /etc/centos-release
CentOS Linux release 8.2.2004 (Core)
Any reply will be greatly appreciated. Thank you very much.
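As a rough first check for the question above, the driver name reported by `ethtool -i` shows which of the two setups discussed in this thread applies: a Solarflare NIC, or a non-Solarflare NIC driven through AF_XDP. This is a minimal sketch with hypothetical function names. Treating driver names that start with "sfc" as Solarflare is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: read the driver name from `ethtool -i IFACE` output on stdin.
nic_driver() {
    sed -n 's/^driver: //p'
}

# Map a driver name to the Onload setup discussed in this thread.
# Treating names starting with "sfc" as Solarflare is an assumption.
onload_path_for() {
    case $1 in
        '')   echo "unknown driver"; return 1 ;;
        sfc*) echo "$1: Solarflare NIC, native Onload path" ;;
        *)    echo "$1: non-Solarflare NIC, AF_XDP path (register the interface, enable ntuple filters)" ;;
    esac
}
```

Usage: `onload_path_for "$(ethtool -i eth0 | nic_driver)"`.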
1. Has the onload module been loaded? Let me repeat: the complaints from `load.sh` are non-fatal. Please share the output of `lsmod | grep sfc` and `lsmod | grep onload`.
2. What do you see in dmesg?
3. (If the onload module has been loaded) Is `/dev/onload` present? `load.sh` usually creates it.
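The checks requested above can be scripted as follows. This is a minimal sketch with hypothetical function names. `module_listed` reads `lsmod`-style text on stdin so the parsing can be tried without the modules present. The module list (`sfc_resource`, `sfc_char`, `onload`) is taken from the `lsmod` output posted in this thread.

```shell
#!/usr/bin/env bash
# Succeeds if module $1 is the first column of some lsmod-style line on stdin.
module_listed() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Live-system check: the Onload modules, then the /dev/onload device node.
check_onload() {
    local mods m rc=0
    mods=$(lsmod)
    for m in sfc_resource sfc_char onload; do
        if printf '%s\n' "$mods" | module_listed "$m"; then
            echo "ok: module $m loaded"
        else
            echo "missing: module $m"; rc=1
        fi
    done
    if [ -e /dev/onload ]; then
        echo "ok: /dev/onload present"
    else
        echo "missing: /dev/onload (check dmesg)"; rc=1
    fi
    return "$rc"
}
```

For example, after sourcing the file, run `check_onload; dmesg | tail -n 20`.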
Thank you for your reply. Yes, we no longer worry about load.sh.
- Output of `lsmod | grep sfc` and `lsmod | grep onload` is below. Is anything wrong?
[root@A03-R05-I139-66-FVP3HP2 onload]# lsmod | grep sfc
sfc_char 106496 1 onload
sfc_resource 180224 2 onload,sfc_char
sfc 524288 0
virtual_bus 16384 1 sfc
sfc_driverlink 16384 2 sfc,sfc_resource
vdpa 16384 1 sfc
mtd 69632 1 sfc
mdio 16384 2 sfc,ixgbe
[root@A03-R05-I139-66-FVP3HP2 onload]# lsmod | grep onload
onload 794624 4
sfc_char 106496 1 onload
sfc_resource 180224 2 onload,sfc_char
- What do you see in dmesg?
NDEBUG:
[ 1365.686192] [onload] [1]: WARNING: huge pages are incompatible with AF_XDP. Disabling hugepage support.
[ 1365.716153] [onload] __oof_socket_add_wild: 1:2047 ERROR: FILTER TCP 10.194.139.66:1 0.0.0.0:0 failed (-95)
[ 1365.718787] [sfc efhw] af_xdp_flush_rx_dma_channel: FIXME AF_XDP
[ 1365.719212] [sfc efhw] af_xdp_flush_tx_dma_channel: FIXME AF_XDP
DEBUG:
[ 9653.298932] [onload] [6]: WARNING: huge pages are incompatible with AF_XDP. Disabling hugepage support.
[ 9653.347207] [onload] __oof_socket_add_wild: 6:2047 ERROR: FILTER TCP 10.194.139.67:1500 0.0.0.0:0 failed (-95)
[ 9653.360091] [sfc efrm] efrm_pt_flush: [rs:0,00000000dbd0b04d] EVQ=2048 TXQ=512 RXQ=512
[ 9653.360094] [sfc efrm] __efrm_vi_resource_issue_flush: rx queue 0 flush requested for nic 0
[ 9653.360096] [sfc efhw] af_xdp_flush_rx_dma_channel: FIXME AF_XDP
[ 9653.365175] [sfc efrm] Flushed queue nic 0 type 1 0x0 rc -95
[ 9653.365197] [sfc efrm] __efrm_vi_resource_issue_flush: tx queue 0 flush requested for nic 0
[ 9653.365198] [sfc efhw] af_xdp_flush_tx_dma_channel: FIXME AF_XDP
[ 9653.370204] [sfc efrm] Flushed queue nic 0 type 0 0x0 rc -95
[ 9653.370252] [sfc efrm] efrm_vi_rm_delayed_free: 00000000f515d7eb
[ 9653.370253] [sfc efrm] efrm_vi_rm_delayed_free: flushed VI instance=0
[ 9653.370295] [sfc efrm] efrm_vi_rm_free_flushed_resource: [rs:0,00000000dbd0b04d]
[ 9653.370296] [sfc efrm] __efrm_vi_resource_free: Freeing 0
[ 9653.370320] [sfc efrm] Flushed queue nic 0 type 2 0x0 rc 0
- (If the onload module has been loaded) Is /dev/onload present? load.sh usually creates it.
Yes, it is present:
[root@A03-R05-I139-66-FVP3HP2 onload]# ls /dev/ | grep onload
onload
onload_epoll
Any reply will be greatly appreciated.
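To read the failing `__oof_socket_add_wild` lines above, note that the number in parentheses is a negated Linux errno, following the usual kernel convention. So -95 is EOPNOTSUPP, and -22 (reported later in this thread) is EINVAL. This is a minimal sketch with a hypothetical helper name. It maps only those two codes by name.

```shell
#!/usr/bin/env bash
# Sketch: extract the "failed (-N)" code from an Onload filter-failure
# log line and name it. Only the two codes seen in this thread are mapped.
oof_filter_errno() {
    local rc
    rc=$(sed -n 's/.*__oof_socket_add_wild.*failed (\(-[0-9][0-9]*\)).*/\1/p' <<< "$1")
    case $rc in
        -95) echo "EOPNOTSUPP (operation not supported)" ;;
        -22) echo "EINVAL (invalid argument)" ;;
        '')  echo "no filter failure found"; return 1 ;;
        *)   echo "errno ${rc#-}" ;;
    esac
}
```

Usage: `dmesg | grep __oof_socket_add_wild | while read -r l; do oof_filter_errno "$l"; done`.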
@ol-alexandra Hi, can you reproduce this issue in your local environment? Thank you.
[ 9653.298932] [onload] [6]: WARNING: huge pages are incompatible with AF_XDP. Disabling hugepage support.
[ 9653.347207] [onload] __oof_socket_add_wild: 6:2047 ERROR: FILTER TCP 10.194.139.67:1500 0.0.0.0:0 failed (-95)
[ 9653.360091] [sfc efrm] efrm_pt_flush: [rs:0,00000000dbd0b04d] EVQ=2048 TXQ=512 RXQ=512
[ 9653.360094] [sfc efrm] __efrm_vi_resource_issue_flush: rx queue 0 flush requested for nic 0
[ 9653.360096] [sfc efhw] af_xdp_flush_rx_dma_channel: FIXME AF_XDP
[ 9653.365175] [sfc efrm] Flushed queue nic 0 type 1 0x0 rc -95
[ 9653.365197] [sfc efrm] __efrm_vi_resource_issue_flush: tx queue 0 flush requested for nic 0
[ 9653.365198] [sfc efhw] af_xdp_flush_tx_dma_channel: FIXME AF_XDP
[ 9653.370204] [sfc efrm] Flushed queue nic 0 type 0 0x0 rc -95
No, I cannot reproduce it because I do not have ixgbe NICs.
OK, understood.
1. Could you give me some advice on solving this problem?
2. Do you know of anyone who has run Onload on ixgbe successfully?
Thanks.
Hi, Onload with non-Solarflare NICs is a community-supported capability. Since you are experiencing an error related to filtering I think you need to follow up this suggestion further: https://github.com/Xilinx-CNS/onload/issues/10#issuecomment-785929182
Yes, I had also noticed that issue, but I did not understand what @maciejj-xilinx meant by "ethtool --features enp4s0f0 ntuple".
When running on non-Solarflare NICs, Onload relies on the NIC's driver supporting ntuple filters. The Intel driver supports enabling this through the ethtool command that is commonly used to configure network interface properties. By running the ethtool command given, using the correct network interface name for your system in place of enp4s0f0, you can turn on Intel's ntuple filtering support in their driver.
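The advice above can be turned into a check-then-enable step. `ethtool -k` (the same as `--show-features`) prints a `ntuple-filters:` line, and ethtool appends `[fixed]` to features the driver does not allow you to change. This is a minimal sketch with hypothetical function names. Enabling the feature uses `ethtool -K IFACE ntuple on`, the command reported later in this thread.

```shell
#!/usr/bin/env bash
# Print the value of the ntuple-filters line from `ethtool -k` output on
# stdin, e.g. "on", "off" or "off [fixed]"; prints nothing if absent.
ntuple_state() {
    sed -n 's/^[[:space:]]*ntuple-filters: //p'
}

# Turn ntuple filtering on for IFACE unless it is already on or fixed.
enable_ntuple() {
    local iface=$1 state
    state=$(ethtool -k "$iface" | ntuple_state)
    case $state in
        on*) echo "$iface: ntuple-filters already on" ;;
        *"[fixed]"*) echo "$iface: ntuple-filters is '$state'; the driver will not change it" >&2
                     return 1 ;;
        off) ethtool -K "$iface" ntuple on ;;
        *) echo "$iface: no ntuple-filters feature reported" >&2; return 1 ;;
    esac
}
```

Run `enable_ntuple eth0` as root, using your own interface name.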
It's OK now, thank you very much.
Command: ethtool -K eth0 ntuple on
Hi,
I'm seeing the same or a similar error here using a card with the igb driver on kernel 5.11; igb got AF_XDP support in 5.10 (see here and here).
My interface is added to /sys/module/sfc_resource/afxdp/register and I've turned on the ntuple flag for the interface.
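A quick way to confirm that a kernel is new enough is to compare the `MAJOR.MINOR` prefix of `uname -r` against a threshold. This is a minimal sketch with a hypothetical function name. It takes the commenter's 5.10 figure for igb AF_XDP as the threshold and does not verify it.

```shell
#!/usr/bin/env bash
# Succeeds if a kernel release string such as "5.11.22-2-MANJARO" is at
# least MAJOR.MINOR; returns 2 if the string cannot be parsed.
kernel_at_least() {
    local release=$1 want_major=$2 want_minor=$3 major minor
    major=${release%%.*}
    minor=${release#*.}
    minor=${minor%%[!0-9]*}   # keep only the leading digits of the minor part
    case $major in ''|*[!0-9]*) return 2 ;; esac
    case $minor in ''|*[!0-9]*) return 2 ;; esac
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}
```

Usage: `kernel_at_least "$(uname -r)" 5 10 && echo "kernel is 5.10 or newer"`.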
If I try to run, e.g., sudo ./onload nc -l 9898, I see:
↳ sudo ./onload nc -l 9898
oo:nc[2289307]: Using Onload 20210611 [2]
oo:nc[2289307]: Copyright 2019-2021 Xilinx, 2006-2019 Solarflare Communications, 2002-2005 Level 5 Networks
nc: listen: Invalid argument
And syslog shows:
Jun 11 11:10:58 monster kernel: [onload] [2]: WARNING: huge pages are incompatible with AF_XDP. Disabling hugepage support.
Jun 11 11:10:58 monster kernel: [onload] __oof_socket_add_wild: 2:2047 ERROR: FILTER TCP 10.10.10.11:9898 0.0.0.0:0 failed (-22)
Jun 11 11:10:58 monster kernel: [sfc efhw] af_xdp_flush_rx_dma_channel: FIXME AF_XDP
Jun 11 11:10:58 monster kernel: [sfc efhw] af_xdp_flush_tx_dma_channel: FIXME AF_XDP
If I run iperf instead, it reports a similar error (it does not exit, but it is not possible to connect to it):
↳ sudo ./onload iperf -s
oo:iperf[2290537]: Using Onload 20210611 [3]
oo:iperf[2290537]: Copyright 2019-2021 Xilinx, 2006-2019 Solarflare Communications, 2002-2005 Level 5 Networks
listen failed: Invalid argument
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
The syslog is similar:
Jun 11 11:12:30 monster kernel: [onload] [3]: WARNING: huge pages are incompatible with AF_XDP. Disabling hugepage support.
Jun 11 11:12:30 monster kernel: [onload] __oof_socket_add_wild: 3:2047 ERROR: FILTER TCP 10.10.10.11:5001 0.0.0.0:0 failed (-22)
My kernel version:
Linux monster 5.11.22-2-MANJARO #1 SMP PREEMPT Fri May 21 17:45:54 UTC 2021 x86_64 GNU/Linux
I'm running from the git repo as of sha 4267b166ea37d4d780160003a65029422fbd476a.
Happy to assist in any further information gathering to help figure out what's going on!
Hello sundbp,
thanks for the detailed report.
We have not yet tested AF_XDP with 5.10 or 5.11, but we do not know of any reason AF_XDP with ixgbe would fail there.
There is one thing worth checking.
With ixgbe devices there are some restrictions: e.g. only one filter type can be installed on the NIC, and the presence of one type of filter will prevent filters of other types from being inserted.
I was wondering what the outcome would be of manually inserting a filter of the type that Onload uses. This should be the command line to achieve this:
sudo ethtool -U ethX flow-type tcp4 dst-ip 10.10.10.11 dst-port 9898 action 1
Also, are there any filters installed on the NIC? They can be listed with:
sudo ethtool -u ethX
What is the outcome of running these commands?
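The two commands above can be wrapped so that existing rules are reported before a new one is tried. Given the ixgbe restriction just described, pre-existing rules matter. This is a minimal sketch with hypothetical function names. It assumes the `Total N rules` line format shown in the `ethtool -u` output later in this thread. `filter_cmd` only prints the suggested rule, with `action 1` as in the example above, and does not run it.

```shell
#!/usr/bin/env bash
# Print N from the "Total N rules" line of `ethtool -u IFACE` output on stdin.
rule_count() {
    sed -n 's/^Total \([0-9][0-9]*\) rules.*/\1/p'
}

# Print (do not run) a tcp4 rule of the shape suggested above; the optional
# fourth argument is the RX queue used for "action" (default 1).
filter_cmd() {
    local iface=$1 ip=$2 port=$3 queue=${4:-1}
    printf 'ethtool -U %s flow-type tcp4 dst-ip %s dst-port %s action %s\n' \
        "$iface" "$ip" "$port" "$queue"
}

# Warn when IFACE already has rules, since on ixgbe one filter type can
# block others. May need root, as in the commands above.
warn_if_rules_present() {
    local n
    n=$(ethtool -u "$1" | rule_count)
    if [ "${n:-0}" -gt 0 ]; then
        echo "$1 already has $n rule(s); other filter types may be refused" >&2
        return 1
    fi
}
```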
We have not yet tested AF_XDP with 5.10 or 5.11,
We did, and it works. But we tested with SFC NICs only (which is completely useless from a normal user's point of view).
I'm seeing the same/similar error here using an igb driver card on kernel 5.11.
I just noticed that you actually mentioned the igb driver. With Intel NICs we have tested ixgbe, but not igb.
Support for ntuple filters on igb devices might be limited or non-existent. It is worth checking the feature list:
ethtool --show-features ethX | grep ntuple
and trying the filter insertion command suggested above, to establish whether the device's ntuple support is at the required level.
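The insertion test suggested above can be made self-cleaning: add one tcp4 rule, report whether the driver accepted it, then delete it so it does not interfere with Onload. This is a minimal sketch with hypothetical function names, and it needs root. It assumes that ethtool exits non-zero when insertion fails. It also assumes that ethtool prints `Added rule with ID N` when it chooses the rule location itself; `ethtool -U IFACE delete N` then removes that rule.

```shell
#!/usr/bin/env bash
# Print N from an "Added rule with ID N" line on stdin, if present.
added_rule_id() {
    sed -n 's/^Added rule with ID \([0-9][0-9]*\).*/\1/p'
}

# Try to insert one tcp4 ntuple rule on IFACE, report the result, then
# delete the probe rule again when its ID is known.
probe_ntuple() {
    local iface=$1 ip=$2 port=$3 out id
    if ! out=$(ethtool -U "$iface" flow-type tcp4 dst-ip "$ip" dst-port "$port" action 1 2>&1); then
        echo "$iface: filter insertion rejected: $out" >&2
        return 1
    fi
    id=$(printf '%s\n' "$out" | added_rule_id)
    echo "$iface: filter accepted${id:+ as rule $id}"
    if [ -n "$id" ]; then
        ethtool -U "$iface" delete "$id"
    fi
}
```

For example, run `probe_ntuple enp68s0 10.10.10.11 9898` as root.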
sudo ethtool -u enp68s0
2 RX rings available
Total 0 rules
And:
sudo ethtool --show-features enp68s0| grep ntuple
ntuple-filters: on
This is more interesting:
↳ sudo ethtool -U enp68s0 flow-type tcp4 dst-ip 10.10.10.11 dst-port 9898 action 1
rmgr: Cannot insert RX class rule: Invalid argument
I don't see anything in syslog.
I found this: https://software.intel.com/content/www/us/en/develop/articles/setting-up-intel-ethernet-flow-director.html
This suggests that perhaps ethtool's report that the feature is available and on is wrong?
Here is the relevant datasheet, which has a flow director section; I can't tell whether it is enough or not: https://cdrdv2.intel.com/v1/dl/getContent/333017