liburing
Consider bringing back IORING_CQE_F_MSG
https://github.com/torvalds/linux/commit/7ef66d186eb95f987a97fb3329b65c840e2dc9bf removed IORING_CQE_F_MSG, and I think there is a reasonable justification for bringing it back.
I am prototyping a thin event loop library over io_uring which abstracts some implementation details and state tracking. The library is responsible for building SQEs and processing CQEs. Users have some state associated with each IO request they make, so the library keeps that state and then matches CQEs to it using cqe->user_data. Nothing unusual.
As an abstraction layer, the library assigns sqe->user_data using its own logic without exposing it to the user.
When it comes to IORING_OP_MSG_RING, I am really puzzled how to support it correctly without IORING_CQE_F_MSG. Sending is not a problem, but receiving a CQE out of the blue requires me to understand that it is NOT associated with any SQE.
Given that the sqe->user_data assignment is hidden from the user, the user can't know which values to avoid when making an IORING_OP_MSG_RING request.
If the IORING_CQE_F_MSG flag were present in the CQE, it wouldn't be a problem: the library would know for certain that this CQE has no matching previously submitted SQE and therefore should be processed differently.
It doesn't solve all problems. Let's say some library wants to send message requests but also allow the user to send them; then IORING_CQE_F_MSG doesn't tell much, and it'd still need to guess where the CQE originated from.
One way of doing it would be to wrap that CQE's user_data in another structure and only allow users to send message requests via a wrapper.
Multiple senders of IORING_OP_MSG_RING is a problem on a different level. To me it is not much different from multiple writers to the same file or unix socket: all readers and writers need to coordinate (even if by convention) how the reader can make sense of what was written.
The problem I am facing right now is much simpler: allowing any data to be "written" into a ring without the possibility of a clash with the internal state of the library managing it. Having a dedicated IORING_CQE_F_MSG flag in the CQE solves it.
The problem I am facing right now is much simpler: allowing any data to be "written" into a ring without the possibility of a clash with the internal state of the library managing it. Having a dedicated IORING_CQE_F_MSG flag in the CQE solves it.
The point is that it only solves one specific problem out of many related ones, and the flag has implications for how CQEs are treated, which was the reason it was removed.
Multiple senders of IORING_OP_MSG_RING is a problem on a different level. To me it is not much different from multiple writers to the same file or unix socket: all readers and writers need to coordinate (even if by convention) how the reader can make sense of what was written.
TL;DR: The second user in the example is the library, which quite makes sense for more complex frameworks, and so the app not only has to choose a format and maintain it but also coordinate it with the library. The suggestion was to use any approach that would solve it for this case.
The second user in the example is the library, which quite makes sense for more complex frameworks
I can't imagine an IO abstraction library or framework that writes its own data into the sockets it manages for users; they must not mess with the data path to be usable at all. Should there be a requirement for a framework to communicate via IORING_OP_MSG_RING with another framework instance, it can use its own private io_uring instance for that.
The second user in the example is the library, which quite makes sense for more complex frameworks
I can't imagine an IO abstraction library or framework that writes its own data into the sockets it manages for users; they must not mess with the data path to be usable at all.
Seems like a misunderstanding: message requests, aka IORING_OP_MSG_RING, have nothing to do with sockets, nor do they send/recv any data.
Should there be a requirement for a framework to communicate via IORING_OP_MSG_RING with another framework instance, it can use its own private io_uring instance for that.
Well, that's one of the main use cases: efficient communication between multiple rings in a multithreaded app. It would be a shame to force them to create a second ring or do weird workarounds.
Not implying that it's your case, but there should be userspace approaches that would work, which is a much better option than limiting the kernel API in the long run. So, what's the problem we're trying to solve? Performance? Do you have a solution for the current API? And if not, why doesn't the one described above work?
I can't imagine an IO abstraction library or framework that writes its own data into the sockets it manages for users; they must not mess with the data path to be usable at all.
Seems like a misunderstanding: message requests, aka IORING_OP_MSG_RING, have nothing to do with sockets, nor do they send/recv any data.
I brought up sockets to show how inconvenient it would be, to say the least, if IO libraries imposed limits on what is transferred on the data path. Sure, IORING_OP_MSG_RING is not a socket, but conceptually it is still a data path: the sender sends arbitrary bytes for a reader to consume. It is like a fixed-size file which can only be overwritten all at once. io_uring provides various ways for an app to communicate data to a destination, and IORING_OP_MSG_RING is just another one of them, quite exotic and limited in capabilities, but a data path nevertheless.
Well, that's one of the main use cases: efficient communication between multiple rings in a multithreaded app. It would be a shame to force them to create a second ring or do weird workarounds.
Frankly, I don't understand what the use case for IORING_OP_MSG_RING is; I thought it was more of a way to send small payloads to a forked process or something like that. Surely there are better ways to communicate 100 bytes or so between threads within a process. My interest in it is purely from the perspective of implementing an io_uring abstraction: because there is no way to filter or reject incoming IORING_OP_MSG_RING CQEs, I cannot choose not to support this op, at least on the receiving side. My library has to cater for those CQEs showing up out of nowhere with arbitrary user_data in them.
but there should be userspace approaches that would work, which is a much better option than limiting the kernel API in the long run. So, what's the problem we're trying to solve?
The main problem is that I need to guess what cqe->user_data is rather than just knowing it. Furthermore, with some implementations I can see no way to do it correctly. For instance, if an abstraction library stores in sqe->user_data an ever-incrementing counter for each SQE it submits, and that counter is then used as a key in a hashmap to locate the state associated with the operation (buffers, offset, etc.), how can the library prevent data sent by a user with IORING_OP_MSG_RING from clashing with such a key? Every possible value of the uint64_t type can be used to track an SQE op at some point.
In my prototype I am not storing an ever-incrementing counter, but I am not storing pointers either, so now I need to segment the uint64_t value space into what the library allows itself to use and what users are allowed to send, which in my eyes imposes an unnecessary limitation on the implementation details of user-space code, all because the kernel is missing a flag to mark data explicitly.
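To make the clash concrete, here is a minimal sketch in C. It assumes a hypothetical request table that uses an incrementing counter as `user_data`; `request_table`, `table_submit` and `table_lookup` are illustrative names, not part of liburing or of the prototype discussed in this thread. Once a message CQE carries a value equal to an in-flight ID, the lookup cannot tell it apart from a real completion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_INFLIGHT 64 /* hypothetical limit on outstanding requests */

struct request_state {
    bool     in_use;
    uint64_t id;     /* value stored in sqe->user_data */
    /* buffers, offsets, etc. would live here */
};

struct request_table {
    uint64_t             next_id;
    struct request_state slots[MAX_INFLIGHT];
};

/* Allocate state for a new SQE and return the value to put in user_data. */
static uint64_t table_submit(struct request_table *t)
{
    uint64_t id = t->next_id++;
    struct request_state *s = &t->slots[id % MAX_INFLIGHT];

    assert(!s->in_use); /* sketch only: at most MAX_INFLIGHT outstanding */
    s->in_use = true;
    s->id = id;
    return id;
}

/* Find the state for a CQE by its user_data, or NULL if nothing matches. */
static struct request_state *table_lookup(struct request_table *t,
                                          uint64_t user_data)
{
    struct request_state *s = &t->slots[user_data % MAX_INFLIGHT];

    return (s->in_use && s->id == user_data) ? s : NULL;
}
```

If another ring sends a message whose `user_data` happens to equal an in-flight ID, `table_lookup()` returns that request's state, and without a flag in the CQE the library has nothing else to go on.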
I can't imagine an IO abstraction library or framework that writes its own data into the sockets it manages for users; they must not mess with the data path to be usable at all.
Seems like a misunderstanding: message requests, aka IORING_OP_MSG_RING, have nothing to do with sockets, nor do they send/recv any data.
I brought up sockets to show how inconvenient it would be, to say the least, if IO libraries imposed limits on what is transferred on the data path. Sure, IORING_OP_MSG_RING is not a socket, but conceptually it is still a data path: the sender sends arbitrary bytes for a reader to consume. It is like a fixed-size file which can only be overwritten all at once. io_uring provides various ways for an app to communicate data to a destination, and IORING_OP_MSG_RING is just another one of them, quite exotic and limited in capabilities, but a data path nevertheless.
That's a bit of a stretch. A futex can also transmit data, either via mutexing around a pointer or even in the value it operates on, but just as futexes are not considered a data path, IORING_OP_MSG_RING is rather a notification / sync mechanism.
Well, that's one of the main use cases: efficient communication between multiple rings in a multithreaded app. It would be a shame to force them to create a second ring or do weird workarounds.
Frankly, I don't understand what the use case for IORING_OP_MSG_RING is; I thought it was more of a way to send small payloads to a forked process or something like that. Surely there are better ways to communicate 100 bytes or so between threads within a process.
In a multithreaded app it's typical (and preferred) to have per-thread rings. IORING_OP_MSG_RING is useful when the threads are running some kind of io_uring event loop and from time to time need to communicate with each other. There might be other use cases, but that was the main one.
My interest in it is purely from the perspective of implementing an io_uring abstraction: because there is no way to filter or reject incoming IORING_OP_MSG_RING CQEs, I cannot choose not to support this op, at least on the receiving side. My library has to cater for those CQEs showing up out of nowhere with arbitrary user_data in them.
but there should be userspace approaches that would work, which is a much better option than limiting the kernel API in the long run. So, what's the problem we're trying to solve?
The main problem is that I need to guess what cqe->user_data is rather than just knowing it. Furthermore, with some implementations I can see no way to do it correctly. For instance, if an abstraction library stores in sqe->user_data an ever-incrementing counter for each SQE it submits, and that counter is then used as a key in a hashmap to locate the state associated with the operation (buffers, offset, etc.), how can the library prevent data sent by a user with IORING_OP_MSG_RING from clashing with such a key? Every possible value of the uint64_t type can be used to track an SQE op at some point.
In my prototype I am not storing an ever-incrementing counter, but I am not storing pointers either, so now I need to segment the uint64_t value space into what the library allows itself to use and what users are allowed to send, which in my eyes imposes an unnecessary limitation on the implementation details of user-space code
Let's try again. Since your library doesn't work with user_data transparently, why not add a helper around IORING_OP_MSG_RING that would wrap that user_data you're sending in a structure in the same way as done with all other sqe->user_data? Then the library will generically handle it, extract the data and pass it on to the target user.
It should be reasonable to say that a random ring not using your library should not be sending message requests without catering to the format.
all because the kernel is missing a flag to mark data explicitly.
It's not missing; the kernel intentionally doesn't provide it, and for a good reason.
wrap that user_data you're sending in a structure in the same way as done with all other sqe->user_data
Because the use of sqe->user_data is private to the ring. It is used to store request IDs, which are then in turn used to locate the associated state. A request ID is just a plain integer, not a pointer. Regular CQE processing reads user_data, finds the state and finishes the request, returning the result to the user.
When an IORING_OP_MSG_RING operation is invoked on a sender ring, the future cqe->user_data is passed from the user, all 64 bits of it. I can't wrap it into anything that is guaranteed not to clash with the request IDs used by the library on the remote ring.
Maybe my explanation wasn't clear enough. I'll try to explain concisely why resolving it in user space can be problematic with the following design:
- My io_uring library keeps request state using a request ID (small integer) as a key
- When submitting SQEs it assigns the corresponding request ID to sqe->user_data and uses cqe->user_data to look up the associated state upon completion
- I'd like users to be able to utilize the full 64 bits of user data in IORING_OP_MSG_RING to allow them to send opaque (to my lib) pointers.
Because users are allowed to send a full u64, when my library processes cqe->user_data it can't know whether it is a request ID or an IORING_OP_MSG_RING message from another ring.
Your suggestion was to wrap the IORING_OP_MSG_RING user data somehow in userspace, but to me it seems not possible, because:
- a pointer can be any 64-bit value, and although it is unlikely to clash with the small integer used for a request ID by the library, there is no guarantee that it won't.
- the only sure way to send IORING_OP_MSG_RING safely is to make the sender ring allocate request state and a request ID in the destination ring, but that requires adding multithreading capability to the library, a change I am reluctant to make just to support one op.
I'd be happy to resolve this problem entirely in the user space by some kind of wrapping as you suggested, but I can't see how.
The cqe->flags format can vary and depends on what kind of request produced it, so in the general case the user should look at ->user_data first to understand how to parse flags. That's needed for API extensibility and I don't see it changing.
For the problem at hand, I'd say it should be possible to wrap the target user_data in a struct as mentioned and then solve the remote allocation problem. E.g. set user_data to the pointer instead of an index (and keep the index inside the structure if needed). Then you can do malloc() for msg_ring and whatever you like for others. Another version would be to reserve bit 0 of the user_data to indicate whether it's a pointer (i.e. malloc()'ed) or uses another scheme.
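The bit 0 variant could be sketched roughly like this. The helper names are hypothetical and not part of liburing. The sketch assumes that request indices fit in 63 bits and that message payloads come from malloc(), which returns memory aligned for any object type, so bit 0 of the pointer is always free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical tagging scheme for user_data:
 *   bit 0 == 0: the upper 63 bits hold a library request index
 *   bit 0 == 1: the remaining bits hold a pointer to a message payload
 */

static inline uint64_t ud_from_index(uint64_t index)
{
    assert(index <= (UINT64_MAX >> 1)); /* must fit in 63 bits */
    return index << 1;
}

static inline uint64_t ud_from_msg_ptr(void *msg)
{
    uintptr_t p = (uintptr_t)msg;

    assert((p & 1) == 0); /* bit 0 must be free for the tag */
    return (uint64_t)p | 1;
}

static inline bool ud_is_msg(uint64_t ud)
{
    return (ud & 1) != 0;
}

static inline uint64_t ud_to_index(uint64_t ud)
{
    return ud >> 1;
}

static inline void *ud_to_msg_ptr(uint64_t ud)
{
    return (void *)(uintptr_t)(ud & ~(uint64_t)1);
}
```

On the receiving side the library would check `ud_is_msg(cqe->user_data)` first and only then decode the rest. This only holds if every sender goes through the library's wrapper, so that the tag bit is always set on messages.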
Does https://github.com/torvalds/linux/commit/cbeb47a7b5f003429ded32b1fb3a7108ce5c1b54 help you?
You can have a magic user_data and then use the result and flags fields to send your 64 bits.
Also, I think there's scope to extend msg_ring to handle CQE32, which would probably also help.
This solves the problem short term, but it can cause problems down the line if user-specified flags start to conflict with flags newly introduced by io_uring. Can io_uring reserve some bits to always be user-specified to avoid conflicts in the future?
Since you own the user_data and it is unique for this case, you should not need to worry about extra bits. io_uring will not magically set bits without being asked to.
With IORING_OP_MSG_RING I don't own the user data; it is passed as-is from the user. There were suggestions to somehow wrap it, but they work only if user_data is normally a pointer even in the lib itself, which it is not in my case. To be able to pass 64 bits from the user I have to malloc them, pass a pointer in user_data, and then somehow distinguish between a pointer and an index on the receiving side, e.g. by always shifting the index 1 bit to the left and using bit 0 as the discriminator between index and pointer.
With IORING_MSG_RING_FLAGS_PASS I can set my own flag, which makes it less awkward to use, but then I need to be sure that the flag I set will never be used by io_uring, because on the receiving side I'd be checking the flag before user_data.
Ah yes.
I'm just guessing here, but from your earlier explanation, could you have a magic user_data (say 0) for this? You wouldn't need to malloc, and it seems like it would have the same performance.
The user will control res and flags in this case, which you can combine into the 64 bits you need.
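The packing side of the magic user_data idea might look roughly like this. It is only a sketch: `MSG_MAGIC_USER_DATA`, `msg_split`, `msg_join` and `msg_from_cqe_fields` are hypothetical names, and it assumes that the library never uses 0 as a request ID and that both values are passed through unchanged with IORING_MSG_RING_FLAGS_PASS. It relies on cqe->res being a signed 32-bit field and cqe->flags an unsigned 32-bit field.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reserved value: the library must never use it as a request ID. */
#define MSG_MAGIC_USER_DATA 0ULL

struct msg_parts {
    int32_t  res;   /* low 32 bits of the payload, carried in cqe->res */
    uint32_t flags; /* high 32 bits of the payload, carried in cqe->flags */
};

/* Sender side: split a 64-bit payload into the two 32-bit CQE fields. */
static inline struct msg_parts msg_split(uint64_t value)
{
    struct msg_parts p;
    uint32_t low = (uint32_t)(value & 0xffffffffu);

    /* Copy the bit pattern instead of converting, to stay portable. */
    memcpy(&p.res, &low, sizeof(p.res));
    p.flags = (uint32_t)(value >> 32);
    return p;
}

/* Receiver side: rebuild the 64-bit payload from res and flags. */
static inline uint64_t msg_join(int32_t res, uint32_t flags)
{
    return ((uint64_t)flags << 32) | (uint64_t)(uint32_t)res;
}

/* Receiver side: decode only CQEs already identified by the magic value. */
static inline uint64_t msg_from_cqe_fields(uint64_t user_data,
                                           int32_t res, uint32_t flags)
{
    assert(user_data == MSG_MAGIC_USER_DATA);
    return msg_join(res, flags);
}
```

Because the receiver checks user_data before anything else, a negative `res` or an arbitrary `flags` value in such a CQE is read as payload, not as an error code or kernel-defined flag bits.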