zfs icon indicating copy to clipboard operation
zfs copied to clipboard

Linux: O_TMPFILE and insert inode hash rework

Open snajpa opened this issue 1 year ago • 10 comments

Motivation and Context

Byproduct of investigation done for https://github.com/openzfs/zfs/issues/16608

Description

This change:

  • reworks inode hash insert so that its in topmost layer and with inode unlock in the same order as other Linux FS implementations do it

  • reworks O_TMPFILE creation to use new linux specific zp->z_is_tmpfile, this fixes a potential unchecked error return value of zpl_xattr_security_init

  • exchanges truncate_setsize for truncate_inode_pages_final in zpl_evict_inode to follow vfs.rst's "the method has to use truncate_inode_pages_final()"

How Has This Been Tested?

Localy during development and via the bots also, prod use already

Types of changes

  • [x] Bug fix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)
  • [ ] Performance enhancement (non-breaking change which improves efficiency)
  • [x] Code cleanup (non-breaking change which makes code smaller or more readable)
  • [ ] Breaking change (fix or feature that would cause existing functionality to change)
  • [ ] Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • [ ] Documentation (a change to man pages or other documentation)

Checklist:

  • [x] My code follows the OpenZFS code style requirements.
  • [ ] I have updated the documentation accordingly.
  • [x] I have read the contributing document.
  • [ ] I have added tests to cover my changes.
  • [ ] I have run the ZFS Test Suite with this change applied.
  • [x] All commit messages are properly formatted and contain Signed-off-by.

snajpa avatar Nov 16 '24 21:11 snajpa

@snajpa is this still to be split into separate PRs? If they're done, can you link them please? ty!

robn avatar Apr 23 '25 00:04 robn

@snajpa is this still to be split into separate PRs? If they're done, can you link them please? ty!

I'd rather not if I don't have to, I would still like to get all of those changes merged, if there are any cosmetics, I could sort that right here - but I would like to keep the zhold change Brian pointed out, if I may, if my reasoning makes sense there (don't insist on it, just asking for it to be considered, helps me to quickly identify the outlier case where the return value is NULL)

Don't insist on anything in particular, can put in more work, but is there anyone to review 4 new patches, I was waiting before #16788 gets merged (or some action) before tending to this one as that one is more pressing IMO

(this one is deployed in prod @ our humble nonprofit infra where it gets bombed by thousands of mount namespaces and fun evictions triggered from various sources sometimes in mad loops too :D)

snajpa avatar Apr 23 '25 00:04 snajpa

Have to budge on that igrab/zhold thing :D

After playing around with multiple variants to make it easier to search for znode/inode reference increases/decreases - the most promising was when I tried to define zhold_inode which would take ip, also zhold_noverify and zhold_inode_noverify - but the result wasn't as nice as I would hope for, it would require modifying more comments and those comments would then lose the clarity they currently have. It'd be nice to be able to search the codebase for zhold and zrele only, but it would add another level of indirection when trying to understand the code from scratch, which I think has priority over my being to navigate it easier when I already know I have to look for both igrab and zhold (and iput and zrele respectively.

snajpa avatar Apr 30 '25 14:04 snajpa

@robn what do you think, should I split this into more PRs or do you think you can review it wholesale as is? If you're up for reviewing these that is... would it make it easier for you? You're currently the only one actively around who can, it seems :D

snajpa avatar Apr 30 '25 14:04 snajpa

The actual problematic part is doing insert_inode_locked and then calling remove_inode_hash, that's what has motivated me to try and do something - but this something doesn't seem right either! This patch in its original form is why we were seeing inodes not being synced like here - https://github.com/openzfs/zfs/pull/16817

The problem was that one can't do acl init before insert_inode_locked...

My original approach in this patch was to try to not doing the insert at all in case there's an error - because remove_inode_hash after insert_inode_locked is problematic - the right solution seems to be to run drop_nlink and then iput the reference on newly mknode'd inode, in case the progress needs to be reverted.

So I'll remodel this, insert_inode_locked doesn't need to be ripped out of mknode, it can stay where it was - just the error paths of zfs_mknode call sites can't do remove_inode_hash but should do drop_nlink.

snajpa avatar Jun 11 '25 11:06 snajpa

That looks like it should work out well. It'll also be nice to entirely remove remove_inode_hash() from the code base and leave that to evict().

behlendorf avatar Jun 11 '25 17:06 behlendorf

Meh its a shame those failed builds dont upload artifacts, was kinda relying on that when I wake up they'll be there :D

snajpa avatar Jun 15 '25 10:06 snajpa

OK... apparently if I leave insert_inode_locked in zfs_znode_alloc, I run into this BUG() in iput() in zfs_create and other paths when they iput/zrele...

For reference, this is the version of this patch we're running on all our machines now on top of 6.12.33; it has insert_inode_locked still in ZPL (and also has d_instantiate_new, imo serves as a validation that it works on top of 6.12 at least :D)

I've broken up that patch in multiple commits here and tried to retain the insert in alloc, but I think the best move is to do the insert in topmost functions possible.

snajpa avatar Jun 15 '25 21:06 snajpa

I'll fold the RFC commit in the inode unlock one and redo the commit message if this approach is ok (also forgot to modify zfs_zget so I'll push that now)

snajpa avatar Jun 15 '25 22:06 snajpa

Umm will have to take a proper look at zfs_rezget it seems, may even be that in this particular instance, remove_inode_hash is legit, have to check what other fs do

snajpa avatar Jun 16 '25 00:06 snajpa

Nah, nope, I'm done. Migrating away from OpenZFS.

snajpa avatar Aug 19 '25 09:08 snajpa