
Advanced 03 AF_XDP Invalid Argument

Open eloydegen opened this issue 6 years ago • 22 comments

I'm trying to get the Advanced03_AF_XDP running in Fedora, which runs kernel 5.3.0.rc6.

CONFIG_XDP_SOCKETS=y is correctly configured.

I'm running the following commands as root:

cd advanced03-AF_XDP
make
t setup --name veth-adv03

The last command results in 100% packet loss when running ping. The same then happens for t ping, of course.

./af_xdp_user -d veth-adv03

This prints the following error: ERROR: Can't create umem "Invalid argument"

Any clue how I can fix this? I have also tried compiling it on the released VM, but that version did not include Advanced 3 and I'm not able to compile the new code I pulled. I would appreciate any pointer! :)

eloydegen avatar Sep 19 '19 09:09 eloydegen

Eloy writes:

> I'm trying to get the Advanced03_AF_XDP running in Fedora, which runs kernel 5.3.0.rc6.
>
> CONFIG_XDP_SOCKETS=y is correctly configured.
>
> I'm running the following commands as root:
>
> cd advanced03-AF_XDP
> make
> t setup --name veth-adv03
>
> The last command results in 100% packet loss when running ping. Same happens then for t ping of course.
>
> ./af_xdp_user -d veth-adv03
>
> This prints the following error: ERROR: Can't create umem "Invalid argument"
>
> Any clue how I can fix this?

Hmm, this seems different from the other permission errors we've seen. @chaudron, any idea what's up with this? :)

tohojo avatar Sep 19 '19 12:09 tohojo

I've not seen this before. I assume you are using the libbpf from the tutorial; if so, can you try the one from your kernel/distribution?

Also, can you try to debug which step is failing? The libbpf API has several failure points: xsk_page_aligned(), mmap(), setsockopt(), and so on.

chaudron avatar Sep 19 '19 12:09 chaudron

Oh I should note that this is the beta version of Fedora, but I would argue this is better than the current stable release (kernel 5.0) combined with the mainline kernel.

I installed a new VM; the ping now works, but the second error persists.

I have installed libbpf-devel and pointed the LIBBPF_DIR variable in the /advanced03-AF_XDP/Makefile to /usr/include/bpf, but then it can't build. It does build fine in the default setup.

Creating a printf statement at the top of the main function in af_xdp_user.c doesn't show it, so I'm not sure how to debug this further.

eloydegen avatar Sep 19 '19 13:09 eloydegen

The libbpf-devel package in Fedora does not include all the files that are currently in the /libbpf/src folder; they come from the Linux kernel source. I pointed the Makefile variable to the folder in the Linux source, and compiling works. Running t ping again results in 100% packet loss.

eloydegen avatar Sep 20 '19 09:09 eloydegen

Eloy writes:

> libbpf-devel package in Fedora does not include all the files that are currently in the /libbpf/src folder, they come from the Linux kernel source.

The libbpf-devel package is supposed to contain everything. If it doesn't, please file a bug (although I think there may be a new version of the libbpf package coming soon, so it may fix itself at that point).

> I pointed to Makefile variable to the folder in the Linux source and compiling works. Running t ping again results in 100% packet loss.

Are you seeing any output from the af_xdp_user command? You're not actually supposed to get any ping replies while running the initial example...

tohojo avatar Sep 20 '19 09:09 tohojo

> You're not actually supposed to get any ping replies while running the initial example...

Oh. The first time I ran it on Ubuntu, the ping worked. Interesting.

The invalid argument is coming from an munmap syscall, but I'm still clueless about what the actual problem is. I have attached the strace log.

eloydegen avatar Sep 20 '19 10:09 eloydegen

Eloy writes:

> > You're not actually supposed to get any ping replies while running the initial example...
>
> Oh. The first time I ran it on Ubuntu, the ping worked. Interesting.
>
> The invalid argument is coming from an munmap syscall, but I'm still clueless what the actual problem is. I have attached the strace log

No, I think it's coming from the preceding mmap:

mmap(NULL, 8374384, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, 3, 0x180000000) = -1 EINVAL (Invalid argument)

The munmap is libbpf's attempt at cleaning up in the error path (which also fails for some reason).

Looking at the kernel code, I guess it's either failing this check:

if (size > (PAGE_SIZE << compound_order(qpg)))
	return -EINVAL;

or remap_pfn_range() is returning -EINVAL, which can also happen, I guess, but I'm not sure whether it does in this case.

Hmm, maybe try making the buffer smaller? Just decrease NUM_FRAMES at the top of af_xdp_user.c and recompile...

tohojo avatar Sep 20 '19 11:09 tohojo

> No, I think it's coming from the preceding mmap:

Missed that one, it's the earliest error indeed.

> Hmm, maybe try making the buffer smaller? Just decrease NUM_FRAMES at the top of af_xdp_user.c and recompile...

I decreased it to 64 from the original 4096, still the same error.

I compiled it with the libbpf from the Linux mainline source as well as with the library code in this repository; that doesn't make a difference.

eloydegen avatar Sep 20 '19 11:09 eloydegen

Eloy writes:

> > No, I think it's coming from the preceding mmap:
>
> Missed that one, it's the earliest error indeed.
>
> > Hmm, maybe try making the buffer smaller? Just decrease NUM_FRAMES at the top of af_xdp_user.c and recompile...
>
> I decreased it to 64 from the original 4096, still the same error.
>
> I compiled it with the Linux mainline source as well with the library code in this repository, that doesn't make a difference.

Hmm, right, that's odd. No idea what's failing now. I'll try to ping some of the upstream AF_XDP devs and point them here, let's see if they have any ideas...

tohojo avatar Sep 20 '19 11:09 tohojo

Thanks! I just subscribed to the xdp-newbies and bpf Linux kernel mailing lists, so I hope you're sending it there.

eloydegen avatar Sep 20 '19 13:09 eloydegen

You are likely running a libbpf that is too new for your older kernel. In 5.4-rcX, there is a new feature that changes the size of the offset struct. An old libbpf or app can run on any kernel, but a new libbpf cannot run on an old kernel. Is that something that should be supported? In the meantime, just use an older libbpf (from 5.3), or a newer kernel :-).

magnus-karlsson avatar Sep 20 '19 13:09 magnus-karlsson

Actually, this should be fixed in libbpf. Will submit a patch. Thanks for detecting this.

magnus-karlsson avatar Sep 20 '19 13:09 magnus-karlsson

Magnus Karlsson writes:

> You are likely running a too new libbpf on an older kernel. In 5.4-rcX, there is a new feature that changes the size of the offset struct. An old libbpf or app can run on any kernel, but a new libbpf cannot run on an old kernel. Something that should be supported? In the mean time, just use an older libbpf (from 5.3), or a newer kernel :-).

Wait, isn't libbpf supposed to be backwards-compatible with older kernels as well?

tohojo avatar Sep 20 '19 13:09 tohojo

Magnus Karlsson writes:

> Actually, this should be fixed in libbpf. Will submit a patch. Thanks for detecting this.

Great, thanks!

tohojo avatar Sep 20 '19 13:09 tohojo

> Magnus Karlsson writes:
>
> > You are likely running a too new libbpf on an older kernel. In 5.4-rcX, there is a new feature that changes the size of the offset struct. An old libbpf or app can run on any kernel, but a new libbpf cannot run on an old kernel. Something that should be supported? In the mean time, just use an older libbpf (from 5.3), or a newer kernel :-).
>
> Wait, isn't libbpf supposed to be backwards-compatible with older kernels as well?

Do not know. I just thought about all the support tickets I would get if I do not fix this right now :-).

magnus-karlsson avatar Sep 20 '19 13:09 magnus-karlsson

Magnus Karlsson writes:

> Do not know. I just thought about all the support tickets I would get if I do not fix this right now :-).

Hehe, right. Well, we're just going to keep reporting any compatibility issues to you so you also have to deal with those, then ;)

tohojo avatar Sep 20 '19 13:09 tohojo

Has the patch been submitted already, so I can try to build it again? Or does it need more time?

eloydegen avatar Sep 23 '19 08:09 eloydegen

On Mon, Sep 23, 2019 at 10:30 AM Eloy wrote:

> Has the patch been submitted already, so I can try to build it again? Or does it need more time?

It needs more time since I am travelling to Kernel Recipes this week. I will let you know as soon as it is finished.

/Magnus


magnus-karlsson avatar Sep 23 '19 08:09 magnus-karlsson

Thanks for the quick response, I will await it.

eloydegen avatar Sep 23 '19 08:09 eloydegen

Eloy,

Could you please provide me with your full name and mail address? I would like to give you credit on the patch with a Reported-by tag as you found this issue.

magnus-karlsson avatar Sep 30 '19 14:09 magnus-karlsson

Yes, that is Eloy Degen [email protected]

Thanks for the fix and attribution.

eloydegen avatar Sep 30 '19 14:09 eloydegen

I sent you a patch; it would be great if you could try it out. Note that samples/bpf does not build at the moment in bpf/master, so I applied the patch to an old need_wakeup development branch, then launched a standard Linux 5.3 kernel that does not have need_wakeup support. The sample/libbpf compiled with need_wakeup runs as expected on that kernel without the support.

magnus-karlsson avatar Oct 01 '19 07:10 magnus-karlsson