liburing icon indicating copy to clipboard operation
liburing copied to clipboard

Adding test suite for #310 multishot poll / update, tests basic functionality and some error boundary conditions.

Open hassila opened this issue 4 years ago • 14 comments
trafficstars

The small set of tests for #310 merged to one package for easier integration.

One test with an eventfd and multishot poll usage (poll-multiple-update) One test expecting a HUP after other side closes socket (poll-multiple-nohup) One test with ping-pong updates of an existing multishot poll (poll-multiple-update-ping-pong) One test which explores expected failures (ENOENT, EBADF and EINVAL).

These tests needs a new io_uring.h with IORING_POLL_ADD_MULTI, IORING_POLL_UPDATE_EVENTS and IORING_POLL_UPDATE_USER_DATA.

Signed-off-by: Joakim Hassila [email protected]

hassila avatar Mar 22 '21 08:03 hassila

Current output from running is:

Running test poll-multiple-update-ping-pong
Client recevied unexpected POLLIN with no data available
Test poll-multiple-update-ping-pong failed with ret 1
Running test poll-multiple-update
Running test poll-multiple-nohup
Read zero bytes, should have received HUP now too cqe->res [195].
We have received 10 bytes and should be done now, but we have received no POLLHUP, test fail.
Test poll-multiple-nohup failed with ret 1
Running test poll-update-failures

(two currently fails due to issues discussed in #310)

hassila avatar Mar 22 '21 11:03 hassila

This is great, thanks a lot! I'll take a look at this.

axboe avatar Mar 22 '21 19:03 axboe

Hey, we've slightly changed poll updates, now there are IORING_OP_POLL_REMOVE command. Makes much more sense, doesn't require file/fd and will even have less overhead. This won't work with latest 5.13 branch.

See how one test is remade https://github.com/axboe/liburing/commit/7778ffcf7592ca1e7b8abc91beabfe157747283d. There is also a helper added for convenience (see io_uring_prep_poll_update())

isilence avatar Apr 15 '21 12:04 isilence

Ok, so let me know when I can test them ("This won't work with latest 5.13 branch", so what should I use?) :-)

hassila avatar Apr 15 '21 12:04 hassila

Ok, so let me know when I can test them ("This won't work with latest 5.13 branch", so what should I use?) :-)

Apologies, I wasn't clear enough. The change is already in the 5.13 branch. I mean your tests will be broken without converting them as well

isilence avatar Apr 15 '21 12:04 isilence

Ok, will fix them (may be next week, busy schedule this week)...

hassila avatar Apr 15 '21 12:04 hassila

I can fix them up myself if you want, just wanted to make it explicit about this ABI change.

isilence avatar Apr 15 '21 13:04 isilence

I'd also like to clarify that this change is only possible because a kernel hasn't been released with the other ABI yet. Hence there's a chance to improve things now, before it lands in 5.13.

axboe avatar Apr 15 '21 14:04 axboe

@isilence - if you want to fix them up, you are of course welcome, otherwise I'll sort it out next week. Will be happy to try out any further sqpoll fixes too.

hassila avatar Apr 15 '21 14:04 hassila

Ok, I started having a look at this - updated to kernel b2f284ce46c86e9fa309cd3565df8d8599ce5b6b and validated it was running: Linux ip-172-31-25-161 5.12.0-rc7-5.13-uring-210416 #1 SMP Fri Apr 16 19:10:19 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Updated liburing to 7778ffcf7592ca1e7b8abc91beabfe157747283d and modified the first test similar to your example. Rebuilt liburing and did a make install.

Did a minimalistic change if the ping-pong test, but it now fails with errno 22 (invalid arg) - @isilence, would you mind having a quick look at the diff for https://github.com/axboe/liburing/pull/317/commits/0d6d3beb573c23d62fde568584f8d2602545ac6f and see if it is obviously wrong?

Client POLLOUT on [6] [4] [28]
Client failed to change poll mask for fd [6] [-22] [25]
Server POLLIN on [Server POLLIN on [5] [195] [25]

hassila avatar Apr 17 '21 05:04 hassila

Ok, I started having a look at this - updated to kernel b2f284ce46c86e9fa309cd3565df8d8599ce5b6b and validated it was running: Linux ip-172-31-25-161 5.12.0-rc7-5.13-uring-210416 #1 SMP Fri Apr 16 19:10:19 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Updated liburing to 7778ffcf7592ca1e7b8abc91beabfe157747283d and modified the first test similar to your example. Rebuilt liburing and did a make install.

Did a minimalistic change if the ping-pong test, but it now fails with errno 22 (invalid arg) - @isilence https://github.com/isilence, would you mind having a quick look at the diff for 0d6d3be https://github.com/axboe/liburing/commit/0d6d3beb573c23d62fde568584f8d2602545ac6f and see if it is obviously wrong?

The last arg, which is 0 in the diff, is what previously was sqe-len, i.e. combination of IORING_POLL_UPDATE_USER_DATA, IORING_POLL_UPDATE_EVENTS, UPDATE_POLL_MULTI_ADD. And have to contain at least one of *UPDATE... flags.

So in this case, pass all 3 as last arg and remove sqe->Len modification at the end

Client POLLOUT on [6] [4] [28]

Client failed to change poll mask for fd [6] [-22] [25] Server POLLIN on [Server POLLIN on [5] [195] [25]

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/axboe/liburing/pull/317#issuecomment-821771645, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABU4X2HJK55OAZZ7GFP53I3TJEN2JANCNFSM4ZSUUDHQ .

isilence avatar Apr 17 '21 08:04 isilence

Oh. Thanks, I expected it to be SQE flags as it was called "flags" (ie. like for linking). Will fix then, super thank you!

hassila avatar Apr 17 '21 08:04 hassila

Ok, I've been trying to fix this up, but I encounter one funny behaviour. This is a simple ping-pong test with the client thread modifying the poll mask between POLLIN and POLLOUT and alternating between reading/writing from a socket - the 'server thread' is just echoing the data back to the client and closes after N times.

As seen (and mentioned) previously, we get the POLLIN notifications with the result 195 (0xC3) instead of 1, which is an issue in its own.

So the ping pong fails, with some extra debug logging:

...
server write - done [9921]
Client cqe->res [195] cqe->flags [2] cqe->user_data [107374182406]
Client recevied POLLIN with ret [1] [0]
Client cqe->res [0] cqe->flags [0] cqe->user_data [120259084294]
Client cqe->res [4] cqe->flags [2] cqe->user_data [120259084294]
Client POLLOUT on [6] [4] [28]
Client cqe->res [0] cqe->flags [0] cqe->user_data [107374182406]
server write - done [9922]
Client cqe->res [195] cqe->flags [2] cqe->user_data [107374182406]
Client recevied POLLIN with ret [1] [0]
Client cqe->res [0] cqe->flags [0] cqe->user_data [120259084294]
Client cqe->res [4] cqe->flags [2] cqe->user_data [120259084294]
Client POLLOUT on [6] [4] [28]
Client cqe->res [0] cqe->flags [0] cqe->user_data [107374182406]
server write - done [9923]
Client cqe->res [195] cqe->flags [2] cqe->user_data [107374182406]
Client recevied POLLIN with ret [1] [0]
Client cqe->res [0] cqe->flags [0] cqe->user_data [120259084294]
Client cqe->res [4] cqe->flags [2] cqe->user_data [120259084294]
Client POLLOUT on [6] [4] [28]
server write - done [9924]
Client cqe->res [0] cqe->flags [0] cqe->user_data [107374182406]
Client cqe->res [1] cqe->flags [2] cqe->user_data [107374182406] <---------------- ****
Client recevied POLLIN with ret [1] [0]
Client cqe->res [195] cqe->flags [2] cqe->user_data [107374182406]
Client recevied unexpected POLLIN without expecting data, poll_mask [195]

So basically, we get two CQE:s sometimes for one given socket readiness event - one with poll mask "1" (a single correct notification!) and one with poll mask "195" as for all others.

So it seems there is some race condition where we will get two notifications, one of the paths will have the issue with notifying with 0xC3, while the other path notifies with 0x01.

hassila avatar Apr 21 '21 10:04 hassila

Ok, I have updated the tests for the new helper and now have the following output:

...
Running test poll-many
Running test poll-multiple-update-ping-pong
Client received unexpected poll event that we did not ask for [195], expected only poll mask matching [25]
Client recevied unexpected POLLIN with no data available
Test poll-multiple-update-ping-pong failed with ret 1
Running test poll-multiple-update
Running test poll-multiple-nohup
Read zero bytes, should have received HUP now too cqe->res [195].
We have received 10 bytes and should be done now, but we have received no POLLHUP, test fail.
Test poll-multiple-nohup failed with ret 1
Running test poll-update-failures
Running test poll-mshot-update
...

poll-multiple-nohup is possibly incorrect (as Jens tested with system poll and got the same behaviour) and could if viewed as correct behaviour be dropped.

poll-multiple-update-ping-pong also outputs a warning as we get an incorrect poll mask as discussed in #310 (195 instead of 25).

hassila avatar May 03 '21 09:05 hassila