
Data corruption after TRIM

Open notramo opened this issue 2 years ago • 20 comments

System information

Type                  Version/Name
Distribution Name     Void Linux
Distribution Version  musl, up to date (rolling)
Kernel Version        6.1.8_1
Architecture          x86_64
OpenZFS Version       2.1.7-1

Describe the problem you're observing

Data corruption in some files and a lot of checksum errors.

Describe how to reproduce the problem

I ran zpool scrub on a simple zpool consisting of a single partition on a 120 GiB SSD. It finished without any errors (neither data nor checksum). Then I ran zpool trim on the pool. A few minutes later the report showed 2 checksum errors, then a few more. (I'm not sure if the first error appeared before or after the TRIM.) I ran zpool trim again, because the first time it finished quite fast despite there being plenty of space to trim. A few more errors appeared. (Probably not related to running TRIM again, because the errors have been increasing constantly since then.) Then I thought maybe the first scrub, before the trim, had somehow missed the errors, so I ran zpool scrub again. The error count grew to 100-300 in the first few seconds, so I stopped it. At this point I couldn't use zfs send to save a snapshot, so I used rsync to back up the data. It reported I/O errors for a few files.
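
In other words, the sequence was roughly the following (a sketch, not the exact shell history; the pool is the rpool shown below, and the trims ran in the background since -w was not used):

    zpool scrub rpool        # first scrub: finished with no errors
    zpool trim rpool         # returned almost immediately
    zpool status -v rpool    # CKSUM errors started appearing shortly after
    zpool trim rpool         # second trim; errors kept increasing
    zpool scrub rpool        # second scrub: 100-300 errors within seconds
    zpool scrub -s rpool     # so the scrub was stopped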

It's an SSD that reports a 512-byte block size, so I used ashift=9. Later I learned that most SSDs actually use larger blocks internally but still report 512. I don't know if that affected the trim.
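
For reference, here is one way to see what the drive reports and what ashift the pool actually uses (a sketch; the commands assume the device and pool names from this report, and the smartctl/zdb calls need root):

    lsblk -o NAME,LOG-SEC,PHY-SEC /dev/sda        # logical vs. physical sector size as reported
    smartctl -i /dev/sda | grep -i 'sector size'
    zdb -C rpool | grep ashift                    # ashift actually in use on the vdev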

The sda1 BTRFS partition contains ~2 GiB of data (kernels, initramfs, and other boot-related stuff). I ran a TRIM and a btrfs scrub on it. No errors were found.

A SMART extended self-test was running when I first ran the zpool scrub (and possibly when I ran zpool trim the first time, but it may have finished by then).
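
For completeness, the self-test was driven with smartmontools, roughly like this (device node assumed to be /dev/sda):

    smartctl -t long /dev/sda        # start the extended self-test
    smartctl -l selftest /dev/sda    # check its progress and result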

Native encryption and compression are active.

Partition scheme:

NAME   FSTYPE     FSVER LABEL
sda
├─sda1 btrfs            VoidBoot
├─sda2 zfs_member 5000  rpool
└─sda3 swap       1     VoidSwap

Output of zpool status before submitting issue:

  pool: rpool
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub canceled on Sun Feb 19 16:29:42 2023
config:

	NAME        STATE     READ WRITE CKSUM
	rpool       DEGRADED     0     0     0
	  sda2      DEGRADED     0     0 4.28K  too many errors

errors: 481 data errors, use '-v' for a list

Include any warning/errors/backtraces from the system logs

No errors in dmesg.

notramo avatar Feb 19 '23 17:02 notramo

n.b. zpool trim isn't going to block until done unless you use -w; it does its work in the background. A lot of drives, in my experience, will just internally queue up the TRIM commands and report them done almost instantly while actually releasing the blocks in the background, which on some (NVMe) drives you can watch via the "space allocated" counter. You can see how far ZFS has progressed on it from its perspective with zpool status -t.
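
Roughly (pool name taken from the report above):

    zpool trim -w rpool      # -w waits until ZFS has finished issuing the TRIMs
    zpool status -t rpool    # shows per-vdev trim state/progress from ZFS's point of view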

It'd be useful to know the model and make of the SSD, and firmware version, as some SSDs have had...questionable TRIM implementations. (Not saying ZFS can't have a bug here, just useful as a data point if it turns out that it only breaks on WD SSDs or Samsung or something.)

e: It'd also be interesting if you have any of the zpool events -v output for any of the checksum errors, because that can include how much the checksum differed from expected.
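
i.e. something along these lines (the exact payload fields vary by ZFS version):

    zpool events -v rpool | less
    # look for "ereport.fs.zfs.checksum" entries; depending on the version they
    # carry fields describing how the on-disk data differed from what the
    # checksum said it should be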

It could, of course, also be native encryption being on fire, because why not.

rincebrain avatar Feb 19 '23 18:02 rincebrain

I have used it with this file system for more than a year and never got an error. It went untrimmed, because I only noticed today that autotrim was disabled. The first scrub (before the trim), however, returned 0 errors; they only started appearing after zpool trim. Also, there are zero read or write errors, only a lot of checksum errors.

The model of the drive is WDC WDS120G2G0A-00JH30.

Worth noting that there is a swap partition on the drive (sda3), which has been used extensively (swappiness 60), but I never got a kernel panic or any other memory error that would indicate the drive is failing under the swap partition. Nor did I get an error on the BTRFS boot partition.

notramo avatar Feb 19 '23 19:02 notramo

Of course, I'm not trying to suggest you did something wrong or that it's necessarily anything else at fault, just trying to figure out what makes your setup different from many other people who haven't been bitten by this.

This suggests TRIM support on that drive might be in The Bad Place(tm) sometimes, and I don't see an existing erratum in Linux's libata for it? Unclear why btrfs might not be upset about that, though, maybe not using enough queued IO, or not doing writes at the same time?

You could test this, I think, by forcibly telling Linux to apply that erratum to that drive. Adding something like libata.force=noncqtrim to your kernel parameters would disable queued TRIM for all drives in the machine, and there's also syntax for being more granular (sketched below).
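
Very roughly, on a GRUB-based setup (the 2.00 port/device ID is just an example; check dmesg for which ataN port the SSD actually sits on):

    # /etc/default/grub
    GRUB_CMDLINE_LINUX_DEFAULT="... libata.force=noncqtrim"         # all drives
    GRUB_CMDLINE_LINUX_DEFAULT="... libata.force=2.00:noncqtrim"    # only port 2, device 00
    # then regenerate the config and reboot:
    #   update-grub                               (Debian/Ubuntu)
    #   grub-mkconfig -o /boot/grub/grub.cfg      (most other distros)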

rincebrain avatar Feb 19 '23 21:02 rincebrain

I've been hit by this issue as well.

arch: x86_64
os: ubuntu 20.04 (kernel 5.6)
zfs: 2.2.2
libata.force=noncqtrim: applied to kernel options
drives: SanDisk SSD Plus 2TB (fw UP4504RL)
vdev topology: 2-way mirror

I also tried setting the queue depth to 1, without improvement. I can repro it every time by trimming one device, then scrubbing. That device will show 10 to 30 checksum errors, which are fixed by the scrub.
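
For reference, the repro is roughly this (pool and device names are placeholders for my mirror):

    zpool trim -w tank sdb     # trim only one of the two mirror members
    zpool scrub tank
    zpool status -v tank       # the trimmed device shows 10-30 CKSUM errors, repaired by the scrub
    # the queue-depth-1 attempt mentioned above was done via sysfs:
    echo 1 > /sys/block/sdb/device/queue_depth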

djkazic avatar Dec 09 '23 14:12 djkazic

You're absolutely sure you're actually running 2.2.2?

Because #15395 would be the obvious reason to suspect something going foul here, and that's in 2.2.0+.

rincebrain avatar Dec 09 '23 15:12 rincebrain

Yes, I'm sure.

> $ zfs --version
zfs-2.2.2-1
zfs-kmod-2.2.2-1

> $ zpool --version
zfs-2.2.2-1
zfs-kmod-2.2.2-1

djkazic avatar Dec 09 '23 16:12 djkazic

#15588 becomes my default guess for a thing to try, then, though I'd also test that you can't reproduce this with a non-ZFS configuration to be sure it's not just the drive doing something ridiculous with TRIM.

rincebrain avatar Dec 09 '23 16:12 rincebrain

Hmm, okay. I'll wait for the next minor release that includes it. This is a production system, so I can't test it with non-ZFS filesystems at the moment, though I could buy a replacement drive from a different manufacturer and swap out one of the two drives in the mirror vdev.

Is there any unofficial list of the best SSDs to use with ZFS? I know Samsung SATA drives have NCQ TRIM issues; should I get a Micron one? I'm a bit lost.

djkazic avatar Dec 09 '23 16:12 djkazic

I wouldn't expect that fix to go into a minor release, though I could be wrong.

I'm not really aware of any SSDs that should have visible issues with ZFS where the underlying issue is an interaction with the SSD. #14793 has a number of people getting upset about certain models of SSD, but a number of them also reported that getting a better PSU made the issues disappear, so who knows.

There's also #15588 complicating matters, but that seems more controller- and memory-pressure-specific than drive-specific.

rincebrain avatar Dec 09 '23 16:12 rincebrain

I highly doubt #15588 will have anything to do with it - it doesn't change TRIM.

robn avatar Dec 09 '23 21:12 robn

I suspect this may be down to this model of drive's firmware, so I've ordered two commonly used drives to replace them. We'll see if this fixes the issue in the next few days.

djkazic avatar Dec 09 '23 21:12 djkazic

Update: after switching to 870 Evos the issue is no longer reproducible. No corruption or checksum errors on TRIM.

djkazic avatar Dec 11 '23 16:12 djkazic

I've got the same error on nearly the same SSDs. In my case it's a Western Digital WDS240G2G0A-00JH30 and a SanDisk SDSSDA-240G (they are both the same drive, one is just rebranded).

While debugging, I tested this error with a lot of different drives, and it only seems to affect these two. One notable difference between these two drives and all the others I've tested is that only these drives use DRAT TRIM (deterministic read after TRIM). All the others either claim non-deterministic TRIM or RZAT TRIM (return zero after TRIM).
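
The advertised TRIM behaviour can be read from the identify data, e.g. (device node is a placeholder, and the indented lines are illustrative hdparm output, not captured from these exact drives):

    hdparm -I /dev/sdX | grep -i trim
        *    Data Set Management TRIM supported (limit 8 blocks)
        *    Deterministic read data after TRIM        <- DRAT
    # an RZAT drive reports instead:
        *    Deterministic read ZEROs after TRIM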

I'm wondering if maybe ZFS has a problem with DRAT TRIM SSDs, or if it's just the SanDisk / WD SSDs behaving badly.

I also verified through testing that it's not the PSU / SATA controller / SATA cables / RAM / CPU causing these issues. I've tried three different machines with different hardware, and the error only ever manifests with those two SSDs. The S.M.A.R.T. values are all OK, and the error only occurs after deleting some files, then running a trim, and then a scrub. Without the trim, no error occurs at all.

System info:

Info     Value
ZFS Ver  2.1.11-1
OS       Debian 12
Kernel   6.1.0-16-amd64

LTek-online avatar Jan 20 '24 03:01 LTek-online

cf. #15395 in 2.1.14.

rincebrain avatar Jan 20 '24 07:01 rincebrain

I am now using TRIM without any issues, on the same drive that I originally opened this issue about. However, there are some differences since then:

  • I recreated the pool with ashift=12. Previously it was probably ashift=9. The SSD might have TRIM issues with a smaller ashift.
  • The SSD is in a different, newer laptop.

I don't know how these could affect the bug, but I thought I'd throw this info in here in case someone finds it useful.
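
The recreated pool was made with an explicit ashift, roughly like this (a sketch only; encryption/compression options are omitted and the device name is the one from the original report):

    zpool create -o ashift=12 rpool /dev/sda2
    zdb -C rpool | grep ashift     # confirm the vdev really got ashift 12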

notramo avatar Jan 20 '24 19:01 notramo

I have now retested with ashift=12 like @notramo and also had success. In my case there was no hardware change, so only the ashift=12 value changed. No checksum errors after running trim. Also, since I doubted the TRIM capabilities of the drives, I retested them according to this doc https://wiki.ubuntuusers.de/SSD/TRIM/Testen/ and found that the drive actually uses RZAT TRIM.
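
That test boils down to roughly the following, run on a scratch non-ZFS filesystem (paths, the device node, and the LBA are placeholders):

    dd if=/dev/urandom of=/mnt/testfile bs=1M count=1 conv=fsync
    hdparm --fibmap /mnt/testfile              # note the begin_LBA of the file's extent
    rm /mnt/testfile && sync
    fstrim -v /mnt
    sleep 60                                   # give the firmware time to act on the TRIM
    hdparm --read-sector <begin_LBA> /dev/sdX  # all zeroes afterwards => RZAT-style behaviour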

LTek-online avatar Jan 21 '24 04:01 LTek-online

If you've updated to 2.1.14 or newer and are no longer seeing this, it was probably caused by https://github.com/openzfs/zfs/pull/15395, which has been resolved.

behlendorf avatar Jan 23 '24 00:01 behlendorf

I retested it on 2.2.2-4 in Debian 13 and the issue is still present.

LTek-online avatar Jan 23 '24 23:01 LTek-online

That's unfortunate; version 2.2.2 does include the fix for https://github.com/openzfs/zfs/pull/15395.

behlendorf avatar Jan 25 '24 16:01 behlendorf

It's unlikely that #15395 caused it, because I did not put the system under any significant write workload while trimming.

notramo avatar Feb 02 '24 17:02 notramo