zfs icon indicating copy to clipboard operation
zfs copied to clipboard

O_DIRECT regression on TIP post 2.2.6 release

Open Qubitium opened this issue 1 year ago • 10 comments

System information

Type Version/Name
Distribution Name Ubuntu
Distribution Version 24.04
Kernel Version 6.8/6.10.9/6.10.12
Architecture Linux x86_64
OpenZFS Version d34d4f97a81f6895de3da67ffbad6f986b2cdae6

Describe the problem you're observing

percona-mysql, mariadb using tokudb/innodb with O_DIRECT (innodb) enabled or DIRECTIO (toku) enabled will crash due to tokudb (22) init error and innodb will crash about writing to ibtmp1 and expecting n byte written but got 0 bytes.

Describe how to reproduce the problem

Install latest tip and enable O_DIRECT for innodb and/or DIRECTIO for tokudb.

ZFS 2.2.6 does not have this issue. The error is caused by regression between changes made after 2.2.6 and <= commit: d34d4f97a81f6895de3da67ffbad6f986b2cdae6.

At first I thought it was the 6.10.9/6.10.12 kernel I was using. It is not kernel related. I rolled back to Ubuntu 24.04 6.8 kernel and issue persisted.

Only when I disabled O_DIRECT options for innodb/toku did the errors go away with latest compiled zfs from commit d34d4f97a81f6895de3da67ffbad6f986b2cdae6. The errors also went away with O_DIRECT enabled with stable 2.2.6 release.

This appears to be a really bad regression. Please double check the Linux O_DIRECT unit tests if any.

Extra notes on my zfs pool:

  • stripped (no raidz protection)
  • nvme
  • encryption enabled
  • blocksize=16KB
  • no compression
  • no dedup

EDIT: looks to be directly related to #16594 Which also details other directio bugs on tip.

Qubitium avatar Oct 02 '24 20:10 Qubitium

That one, the person thinks should have been fixed by a specific commit already in git tip.

Also, keep in mind, 2.2.6 is not on the same series of commits as HEAD of master, 2.2 branched from master a while ago and gets cherrypicks, basically, so that's a lot larger delta than just "since 2.2.6".

That said, I agree it's probably the O_DIRECT changes if it's anything, but if you wanted to bisect, just be aware that's not a line you can bisect on effectively.

rincebrain avatar Oct 02 '24 20:10 rincebrain

@Qubitium thanks for testing out O_DIRECT and reporting this. There are some known issues we're working to wrap up. We'll be disabling this in the upcoming 2.3 release branch until fully resolved, https://github.com/openzfs/zfs/pull/16597. The intent is to leave it enabled in the master branch to continue to surface any potential issues.

behlendorf avatar Oct 02 '24 21:10 behlendorf

@behlendorf Thanks. I will apply the zfs level direct io disable right-away and stay away from using tip. I did not even realize tip is not 2.2.next but a larger dev branch.

@rincebrain

Also, keep in mind, 2.2.6 is not on the same series of commits as HEAD of master, 2.2 branched from master a while ago and gets cherrypicks, basically, so that's a lot larger delta than just "since 2.2.6".

This is seriously confusing to the end-user (non-contributor). The version tagged in the master is 2.2.99 so to anyone compiling and installing, it is telling the user that is a direct lineage of 2.2.next branch.

Qubitium avatar Oct 03 '24 02:10 Qubitium

@behlendorf @rincebrain I have isolated the regression range to be > 1713aa7b4 and <= d34d4f97a81f6895de3da67ffbad6f986b2cdae6

Qubitium avatar Oct 03 '24 02:10 Qubitium

I don't think the expectation is for end users to run git tip, generally.

That said, I don't know that there's any option that makes more semantic sense - it's more or less [current].99-r[number of commits since branch]-g[tag] or [next].0pre[something][same suffix], if people want something that also keeps some kind of versioning ordering for packages.

So I can't think of any way to get a nice stable ordering that's not "infinitely after [current] but smaller than [next].0" other than something like what's used now.

rincebrain avatar Oct 03 '24 02:10 rincebrain

That said, I don't know that there's any option that makes more semantic sense - it's more or less [current].99-r[number of commits since branch]-g[tag] or [next].0pre[something][same suffix], if people want something that also keeps some kind of versioning ordering for packages.

I don't know the reasons but having master as non-release tree is bound to confuse a lot of people. Most people expect master to be the release/dev branch where releases are tagged and non-release dev code are in their branches. Not the other way where master becomes a long 2.3 major release and in-between are sprinkled 2.2.x branch offshoots cuts/cherrypicks. If anything, 2.3 should be on 2.3 branch, and the cherry picks for interim 2.2.x to release are from 2.3 to master + tags.

Qubitium avatar Oct 03 '24 02:10 Qubitium

This is how e.g. Linux and FreeBSD work, 6.0.1 isn't on the main branch, it's patches on 6.0, and FreeBSD's release process works similarly.

rincebrain avatar Oct 03 '24 03:10 rincebrain

Most people expect master to be the release/dev branch where releases are tagged and non-release dev code are in their branches.

This is basically the model being used. All development work is done on branches and only merged in to master once its complete. For all of the supported releases (2.0.x, 2.1.x, 2.2.x) there is a release branch which get backports of needed changes from the master branch.

https://github.com/openzfs/zfs/tree/zfs-2.0-release https://github.com/openzfs/zfs/tree/zfs-2.1-release https://github.com/openzfs/zfs/tree/zfs-2.2-release

One thing we could potentially do is make the default branch the latest release branch. That would perhaps be a bit more annoying for the developers, but it would ensure anyone casually cloning the git tree would get the latest release version,

behlendorf avatar Oct 03 '24 03:10 behlendorf

or call master version 2.99

clhedrick avatar Oct 03 '24 17:10 clhedrick

or call master version 2.99

@clhedrick No. Then the version number would be bigger than later released 2.3, which is not right. Good side of 2.2.99 is that it is bigger than any 2.2.x, but smaller than 2.3.

amotin avatar Oct 03 '24 18:10 amotin