serenity icon indicating copy to clipboard operation
serenity copied to clipboard

Kernel/Ext2FS: Reduce allocations in compute_block_list_impl()

Open brody-qq opened this issue 1 year ago • 5 comments

This PR has commits with the following changes:

  • Reduce the number of kmalloc() an kfree() calls in Ext2FSInode::compute_block_list_impl() by reducing the number of times ByteBuffers are created and destroyed.
  • Add a new test case for writes and reads to Ext2FS files.

brody-qq avatar Jul 15 '24 02:07 brody-qq

More tests is cool.

Saving allocations if it's in service of some other goal (measurably better perf, as part of a project for better memory handling in the kernel, or whatnot) is cool too, but it's not a goal in itself. There's oodles of allocations, and removing 3 one-time allocations doesn't matter, especially if it makes the code longer. In other words: Is this allocating code path hit frequently?

nico avatar Jul 15 '24 12:07 nico

Thanks for looking over my PR.

This isn't part of a bigger project, it's just something I spotted while getting myself up to speed on the filesystem code.

It saves a lot more than 3 allocations. The bigger the file is, the more allocations it will save. The worst case scenario for this code is when the file is the biggest size possible in ext2.

If you unrolll the calls to process_block_array() in the current code, the worst case scenario would look like this: (to keep this short, this is in pseudocode and only focuses on the kmalloc() and kfree() calls made by ByteBuffer)

singly_indirect = kmalloc()
for (i = 0; i < 1024; ++i)
    set_block()
kfree(singly_indirect)

doubly_indirect = kmalloc()
for (i = 0; i < 1024; ++i) {
    singly_indirect = kmalloc()
    for (i = 0; i < 1024; ++i)
        set_block()
    kfree(singly_indirect)
}
kfree(doubly_indirect)

triply_indirect = kmalloc()
for (i = 0; i < 1024; ++i) {
    doubly_indirect = kmalloc()
    for (i = 0; i < 1024; ++i) {
        singly_indirect = kmalloc()
        for (i = 0; i < 1024; ++i)
            set_block()
        kfree(singly_indirect)
    }
    kfree(doubly_indirect)
}
kfree(triply_indirect)

This PR makes the same unrolled worst case scenario look like this:

singly_indirect = kmalloc()
doubly_indirect = kmalloc()
triply_indirect = kmalloc()

for (i = 0; i < 1024; ++i)
    set_block()

for (i = 0; i < 1024; ++i)
    for (i = 0; i < 1024; ++i)
        set_block()

for (i = 0; i < 1024; ++i)
    for (i = 0; i < 1024; ++i)
        for (i = 0; i < 1024; ++i)
            set_block()

kfree(singly_indirect)
kfree(doubly_indirect)
kfree(triply_indirect)

But on the other hand:

  • For common file sizes (a few dozen MBs or less) this PR doesn't reduce allocations much.
  • The worst case scenario I showed above only happens for files that are over 4TB large.
  • This code doesn't run very frequently (some experimentation shows that it runs when you create an ext2 file, on some writes to ext2 files, and on some resizes of ext2 files).

If you don't want the Ext2FSInode changes but want the test case, let me know and I'll make a new PR with just the test case.

brody-qq avatar Jul 16 '24 18:07 brody-qq

I'd happily merge the test 🙂

I'm not sure if the fewer allocations are worth the extra code. Even if it's many allocations for very large files, it's probably still a miniscule amount of total work needed? Again, if there are perf numbers that show that this is beneficial for some work load, happy to merge that part too.

nico avatar Jul 27 '24 14:07 nico

The part about merging everything into a common buffer seems like a no-brainer (assuming everything still works).

Adding some (comparatively) complicated logic just to eliminate up to the last three allocations is another story, and I feel like that might be what thakis is referring to. If it's really just about those code lines we could also move the try_resize call into process_block_array, if I remember correctly that should be sufficiently no-op in case the storage size is already at the specified size.

timschumi avatar Aug 03 '24 18:08 timschumi

Sorry for taking a while, I've been pretty busy the last few weeks.

I put the test case into a separate PR: #24912

Also I made the changes @timschumi suggested, so the logic is cleaner now.

If you still don't want these changes let me know and I'll close the PR (or you can close the PR? I'm still new to contributing to open source so I'm not sure how this is supposed to work lol)

brody-qq avatar Aug 08 '24 16:08 brody-qq

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions!

stale[bot] avatar Aug 30 '24 03:08 stale[bot]