liburing Virtual (non-committed) memory for sparse fixed files

Hello,

If you are writing a web server library — as is the case with me — then you don't really know how many fixed files you need upfront as the application using the library could have only a few connections or possibly hundreds of thousands to millions.

The approach I'm taking is to register a large number of sparse fixed file entries and then update them as I need more.

I tested this approach by registering 1,000,000 sparse fixed file entries and saw memory usage grow by about 24MB (roughly 24 bytes per entry). This is not a lot of memory, but it is still something, and I try to minimise my library's memory usage. Do you think virtual (non-committed) memory could be used for the sparse entries, so that io_uring commits physical pages only when the sparse entries are updated? I'm not sure whether this is feasible, so I'm just throwing it out there.

Thanks

Aug 29 '22 11:08 bnbajwa

Hi bnbajwa,

Why not let the applications themselves register fixed files, and have the library do nothing? Is that because you want to hide the io_uring details? I think it would be easy to just set a flag and not allocate the entries at the beginning, then do the allocation when the first update comes. But since fixed files are implemented as an array, allocating entries as needed on each update is hard to achieve. How about maintaining an allocation algorithm in your library, e.g. first allocate 100 entries and, if the table becomes full, expand it to 200 entries?

Aug 30 '22 14:08 HowHsu

Thanks for the response HowHsu!

> Why not let the applications themselves register fixed files, and the library just do nothing? Is that because you want to hide the io_uring details?

Yes, I want to hide io_uring details from the application.

> I think it's easy to just mark a flag and don't allocate the entries at the beginning, then the first update comes, do the allocation. ... How about maintain an allocation algorithm in your library, e.g. first alloc 100 entries, if it becomes full, expand it to 200 entries.

From checking the liburing code, tests and kernel code, my understanding is that to use multishot accept (direct variant) or other requests using direct descriptors, I first need to register a fixed file table with io_uring using io_uring_register_files or io_uring_register_files_sparse. Both of these functions take an unsigned count parameter (nr_files), which is how many entries the fixed file table will have once registered. There is no memory allocation on my side; io_uring does the allocation internally.

If I call io_uring_register_files or io_uring_register_files_sparse with nr_files = 100, I have registered a fixed file table with 100 entries. It is then not possible to expand the fixed file table to 200 entries (I think?) unless I unregister the fixed file table and re-register it with a bigger size. But unregistering the fixed file table involves a quiesce.

The only option is to use io_uring_register_files_sparse with a large number of entries, e.g. 1,000,000. I then use an allocation algorithm to hand out free entries in the table to the application. This works fine; 1 million descriptors are enough for almost every application. The only slight issue is that if I register a fixed file table with 1,000,000 entries, io_uring makes a large allocation of about 24MB. Most applications will probably not use that many direct descriptors, so most of that 24MB will be excess memory usage.
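The "allocation algorithm" mentioned above is not shown in the thread. A minimal sketch of one possible approach follows, assuming a table that was registered once with a fixed capacity. SlotAllocator and its methods are hypothetical names used for illustration and are not part of liburing; the sketch only tracks indices and does not call into io_uring.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of liburing): hands out indices into a
// fixed file table of `capacity` slots that was registered once up front.
class SlotAllocator {
public:
    explicit SlotAllocator(int capacity) : capacity_(capacity) {}

    // Returns a free slot index, or -1 when every slot is in use.
    int acquire() {
        if (!free_.empty()) {            // reuse a released slot first
            int idx = free_.back();
            free_.pop_back();
            return idx;
        }
        if (next_ < capacity_) {         // otherwise take the next never-used slot
            return next_++;
        }
        return -1;
    }

    // Returns a slot to the pool. The caller must have acquired it earlier
    // and must not release the same index twice.
    void release(int idx) {
        assert(idx >= 0 && idx < next_);
        free_.push_back(idx);
    }

    // One past the highest index ever handed out.
    int high_water_mark() const { return next_; }

private:
    int capacity_;
    int next_ = 0;                       // lowest index never handed out
    std::vector<int> free_;              // released indices, reused LIFO
};
```

Handing out low indices first keeps the used entries clustered at the start of the table, which is the access pattern that would benefit most if the table's backing memory were committed lazily.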

> But since the implementation of fixed files is an array, so allocate entries as needed at each time is hard to achieve.

I wasn't suggesting allocating memory every time there is an update. What I meant was: if I call io_uring_register_files_sparse with 1,000,000 entries, could io_uring allocate the memory for the array (about 24MB) as virtual memory only? (That is, it would still be a single memory allocation, but with no physical pages backing it.)

For example, in user space I can do this:

mmap(
    nullptr,
    alloc_size, // e.g. 24MiB
    PROT_READ | PROT_WRITE,
    MAP_ANONYMOUS | MAP_PRIVATE | MAP_NORESERVE,
    -1,
    0);

mmap allocates virtual memory, but the allocated memory won't have any physical pages backing it. So if I run top after this allocation, I won't see 24MB being used. The kernel only backs the region with physical pages when I actually use that memory by reading from or writing to it. This way, if the application only uses 100,000 entries, then only about 2.4MB will be used; if 10,000 entries are used, then about 0.24MB will be used, and so on.

I don't know whether this is possible or feasible for io_uring to do, but it would reduce memory usage.

On a side note, in my experience io_uring is mostly quite straightforward and easy to use. Dynamic adjustment is the only difficult part. For example, I have a similar problem with SQE/CQE ring sizes: if the application sets the ring size too small, there are overflow problems. The way I plan to deal with that is to start a brand new server process with bigger ring sizes and gracefully shut down the current one. I may have to use the same approach for this fixed file table size issue, but I'd prefer not to as it's complicated.
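The lazy-commit behaviour described above can be observed from user space. The following is a minimal sketch, assuming Linux; reserve_region and resident_pages are hypothetical helper names, and mincore(2) is used here only as one way to see which pages are resident. It illustrates the userspace behaviour, not anything io_uring does internally.

```cpp
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>
#include <vector>

// Reserve `bytes` of address space without committing physical pages.
// Returns nullptr on failure.
void* reserve_region(std::size_t bytes) {
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                   MAP_ANONYMOUS | MAP_PRIVATE | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}

// Count how many pages of [base, base + bytes) the kernel reports as
// resident via mincore(2). `base` must be page-aligned, which holds for
// addresses returned by mmap. Returns -1 on error.
long resident_pages(void* base, std::size_t bytes) {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    std::vector<unsigned char> vec((bytes + page - 1) / page);
    if (mincore(base, bytes, vec.data()) != 0) {
        return -1;
    }
    long n = 0;
    for (unsigned char v : vec) {
        n += (v & 1);                    // bit 0 set means the page is resident
    }
    return n;
}
```

A caller could reserve 24 MiB, call resident_pages, write one byte into the region, and call resident_pages again to see the count change. The exact numbers depend on the page size and on whether transparent huge pages are enabled, so none are quoted here.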

Aug 30 '22 22:08 bnbajwa

> If I call io_uring_register_files or io_uring_register_files_sparse with unsigned nr_files = 100, I have registered a fixed file table with 100 entries. Then its not possible to expand the fixed file table to 200 entries (i think?) — unless I unregister the fixed file table and re-register with a bigger size. But unregistering the fixed file table involves quiesce.

Exactly.

> I don't know this is possible/feasible for io_uring to do? It would save memory usage

I think it is possible. The file registration code currently uses kvmalloc, which tries kmalloc first, and that allocates physical pages. It shouldn't be hard to add a flag indicating that we just need virtual address space, and then use an allocator other than kmalloc. I personally think this is appropriate, since this is a common issue when using io_uring in a library.

Aug 31 '22 06:08 HowHsu

> I don't know this is possible/feasible for io_uring to do? It would save memory usage

> I think it is possible, the files register code currently use kvmalloc which does kmalloc first, that allocates physical pages. And shouldn't be hard to add a flag to indicate that we just need virtual address and then use other allocator rather then kmalloc. I personally think this may be feasible since this is common issue when leveraging io_uring in a library.

Wouldn't it be possible to use an xarray instead of a fixed-size array?

metze

Aug 31 '22 08:08 metze-samba

Memory allocated by the kernel is always mapped. It's not like the application heap, which can be faulted in; no faults can occur on memory allocated on the kernel side. An xarray could be used, but it would be a slowdown for the more normal case of having only a small number of registered files. We could have something where you use a normal array for the first N entries, and beyond that you use an xarray. This is what the io_uring kbuf code does to avoid this issue.
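The "normal array for N, xarray beyond that" idea can be illustrated with a userspace analogy. This is only a sketch under stated assumptions: HybridFdTable is a hypothetical class, std::unordered_map stands in for the kernel's xarray, and -1 is used to mark an empty slot. It does not reflect how the kernel code is actually written.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Userspace analogy only: a flat array serves the first `dense_slots`
// indices, and a sparse map serves everything beyond that.
class HybridFdTable {
public:
    explicit HybridFdTable(std::uint32_t dense_slots) : dense_(dense_slots, -1) {}

    // Store `fd` at `idx`; a negative fd clears the slot.
    void set(std::uint32_t idx, int fd) {
        if (idx < dense_.size()) {       // fast path: direct array index
            dense_[idx] = fd < 0 ? -1 : fd;
            return;
        }
        if (fd < 0) {
            sparse_.erase(idx);          // drop the entry so it costs no memory
        } else {
            sparse_[idx] = fd;           // slow path: allocated only when used
        }
    }

    // Returns the fd stored at `idx`, or -1 if the slot is empty.
    int get(std::uint32_t idx) const {
        if (idx < dense_.size()) {
            return dense_[idx];
        }
        auto it = sparse_.find(idx);
        return it == sparse_.end() ? -1 : it->second;
    }

    // Number of entries currently held outside the flat array.
    std::size_t sparse_entries() const { return sparse_.size(); }

private:
    std::vector<int> dense_;                          // always fully allocated
    std::unordered_map<std::uint32_t, int> sparse_;   // grows with actual use
};
```

The trade-off is the one described above: lookups below the threshold stay a plain array index, while memory for high indices is paid for only once those indices are used, at the cost of a slower lookup there.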

Aug 31 '22 14:08 axboe
