Spacemap Problem with Data Error
Scrub seems to have found an error in last night's snapshot:
zpool status -v rpool
pool: rpool
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://zfsonlinux.org/msg/ZFS-8000-8A
scan: scrub repaired 0 in 1h18m with 1 errors on Wed Feb 11 11:18:42 2015
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
root_crypt ONLINE 0 0 0
errors: Permanent errors have been detected in the following files:
rpool/steam@20150210-1528:<0x0>
ZDB also says:
MOS Configuration:
version: 5000
name: 'rpool'
state: 0
txg: 17002896
pool_guid: 8092466394753084738
errata: 0
hostid: 1714892848
hostname: 'unknown'
vdev_children: 1
vdev_tree:
type: 'root'
id: 0
guid: 8092466394753084738
children[0]:
type: 'disk'
id: 0
guid: 15923963185428311133
path: '/dev/mapper/root_crypt'
whole_disk: 0
metaslab_array: 34
metaslab_shift: 33
ashift: 12
asize: 999403552768
is_log: 0
create_txg: 4
features_for_read:
com.delphix:hole_birth
com.delphix:embedded_data
space map refcount mismatch: expected 125 != actual 0
... that last "space map refcount mismatch" piece is a bit concerning. Does this have anything to do with recent patches to spacemap accounting, and would those changes be the problem, or are they detecting an existing issue from bad prior logic?
@sempervictus if a spacemap were somehow wrong it could result in permanent damage to a file. For example, if the spacemap indicated a block was free when it wasn't, then a new write could overwrite existing data in a snapshot. For a pool configured without redundancy that damage wouldn't be correctable, and the scrub would detect it like you're seeing.
It would be useful to get a closer look at the space maps; could you run zdb -m and zdb -M on the pool? The zdb error you're seeing says that it expects there to be 125 space maps but none were actually found, which is hard to believe.
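For reference, those are just the following invocations against the pool itself (rpool here); judging from the outputs later in this thread, -m lists each metaslab with its space map and free space, while -M prints the per-vdev and pool fragmentation summary:
zdb -m rpool
zdb -M rpool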
Just noticed #3108 appears to have the same problem. Did we introduce a regression which is causing corruption in active datasets?
Outputs for zdb commands on the dataset:
zdb -m rpool/steam
Dataset rpool/steam [ZPL], ID 17529, cr_txg 14065881, 98.6G, 78176 objects
zdb -M rpool/steam
Dataset rpool/steam [ZPL], ID 17529, cr_txg 14065881, 98.6G, 78176 objects
and on the pool:
zdb -m rpool
Metaslabs:
vdev 0
metaslabs 116 offset spacemap free
--------------- ------------------- --------------- -------------
metaslab 0 offset 0 spacemap 0 free 8G
metaslab 1 offset 200000000 spacemap 0 free 8G
metaslab 2 offset 400000000 spacemap 0 free 8G
metaslab 3 offset 600000000 spacemap 0 free 8G
metaslab 4 offset 800000000 spacemap 0 free 8G
metaslab 5 offset a00000000 spacemap 0 free 8G
metaslab 6 offset c00000000 spacemap 0 free 8G
metaslab 7 offset e00000000 spacemap 0 free 8G
metaslab 8 offset 1000000000 spacemap 0 free 8G
metaslab 9 offset 1200000000 spacemap 0 free 8G
metaslab 10 offset 1400000000 spacemap 0 free 8G
metaslab 11 offset 1600000000 spacemap 0 free 8G
metaslab 12 offset 1800000000 spacemap 0 free 8G
metaslab 13 offset 1a00000000 spacemap 0 free 8G
metaslab 14 offset 1c00000000 spacemap 0 free 8G
metaslab 15 offset 1e00000000 spacemap 0 free 8G
metaslab 16 offset 2000000000 spacemap 0 free 8G
metaslab 17 offset 2200000000 spacemap 0 free 8G
metaslab 18 offset 2400000000 spacemap 0 free 8G
metaslab 19 offset 2600000000 spacemap 0 free 8G
metaslab 20 offset 2800000000 spacemap 0 free 8G
metaslab 21 offset 2a00000000 spacemap 0 free 8G
metaslab 22 offset 2c00000000 spacemap 0 free 8G
metaslab 23 offset 2e00000000 spacemap 0 free 8G
metaslab 24 offset 3000000000 spacemap 0 free 8G
metaslab 25 offset 3200000000 spacemap 0 free 8G
metaslab 26 offset 3400000000 spacemap 0 free 8G
metaslab 27 offset 3600000000 spacemap 0 free 8G
metaslab 28 offset 3800000000 spacemap 0 free 8G
metaslab 29 offset 3a00000000 spacemap 0 free 8G
metaslab 30 offset 3c00000000 spacemap 0 free 8G
metaslab 31 offset 3e00000000 spacemap 0 free 8G
metaslab 32 offset 4000000000 spacemap 0 free 8G
metaslab 33 offset 4200000000 spacemap 0 free 8G
metaslab 34 offset 4400000000 spacemap 0 free 8G
metaslab 35 offset 4600000000 spacemap 0 free 8G
metaslab 36 offset 4800000000 spacemap 0 free 8G
metaslab 37 offset 4a00000000 spacemap 0 free 8G
metaslab 38 offset 4c00000000 spacemap 0 free 8G
metaslab 39 offset 4e00000000 spacemap 0 free 8G
metaslab 40 offset 5000000000 spacemap 0 free 8G
metaslab 41 offset 5200000000 spacemap 0 free 8G
metaslab 42 offset 5400000000 spacemap 0 free 8G
metaslab 43 offset 5600000000 spacemap 0 free 8G
metaslab 44 offset 5800000000 spacemap 0 free 8G
metaslab 45 offset 5a00000000 spacemap 0 free 8G
metaslab 46 offset 5c00000000 spacemap 0 free 8G
metaslab 47 offset 5e00000000 spacemap 0 free 8G
metaslab 48 offset 6000000000 spacemap 0 free 8G
metaslab 49 offset 6200000000 spacemap 0 free 8G
metaslab 50 offset 6400000000 spacemap 0 free 8G
metaslab 51 offset 6600000000 spacemap 0 free 8G
metaslab 52 offset 6800000000 spacemap 0 free 8G
metaslab 53 offset 6a00000000 spacemap 0 free 8G
metaslab 54 offset 6c00000000 spacemap 0 free 8G
metaslab 55 offset 6e00000000 spacemap 0 free 8G
metaslab 56 offset 7000000000 spacemap 0 free 8G
metaslab 57 offset 7200000000 spacemap 0 free 8G
metaslab 58 offset 7400000000 spacemap 0 free 8G
metaslab 59 offset 7600000000 spacemap 0 free 8G
metaslab 60 offset 7800000000 spacemap 0 free 8G
metaslab 61 offset 7a00000000 spacemap 0 free 8G
metaslab 62 offset 7c00000000 spacemap 0 free 8G
metaslab 63 offset 7e00000000 spacemap 0 free 8G
metaslab 64 offset 8000000000 spacemap 0 free 8G
metaslab 65 offset 8200000000 spacemap 0 free 8G
metaslab 66 offset 8400000000 spacemap 0 free 8G
metaslab 67 offset 8600000000 spacemap 0 free 8G
metaslab 68 offset 8800000000 spacemap 0 free 8G
metaslab 69 offset 8a00000000 spacemap 0 free 8G
metaslab 70 offset 8c00000000 spacemap 0 free 8G
metaslab 71 offset 8e00000000 spacemap 0 free 8G
metaslab 72 offset 9000000000 spacemap 0 free 8G
metaslab 73 offset 9200000000 spacemap 0 free 8G
metaslab 74 offset 9400000000 spacemap 0 free 8G
metaslab 75 offset 9600000000 spacemap 0 free 8G
metaslab 76 offset 9800000000 spacemap 0 free 8G
metaslab 77 offset 9a00000000 spacemap 0 free 8G
metaslab 78 offset 9c00000000 spacemap 0 free 8G
metaslab 79 offset 9e00000000 spacemap 0 free 8G
metaslab 80 offset a000000000 spacemap 0 free 8G
metaslab 81 offset a200000000 spacemap 0 free 8G
metaslab 82 offset a400000000 spacemap 0 free 8G
metaslab 83 offset a600000000 spacemap 0 free 8G
metaslab 84 offset a800000000 spacemap 0 free 8G
metaslab 85 offset aa00000000 spacemap 0 free 8G
metaslab 86 offset ac00000000 spacemap 0 free 8G
metaslab 87 offset ae00000000 spacemap 0 free 8G
metaslab 88 offset b000000000 spacemap 0 free 8G
metaslab 89 offset b200000000 spacemap 0 free 8G
metaslab 90 offset b400000000 spacemap 0 free 8G
metaslab 91 offset b600000000 spacemap 0 free 8G
metaslab 92 offset b800000000 spacemap 0 free 8G
metaslab 93 offset ba00000000 spacemap 0 free 8G
metaslab 94 offset bc00000000 spacemap 0 free 8G
metaslab 95 offset be00000000 spacemap 0 free 8G
metaslab 96 offset c000000000 spacemap 0 free 8G
metaslab 97 offset c200000000 spacemap 0 free 8G
metaslab 98 offset c400000000 spacemap 0 free 8G
metaslab 99 offset c600000000 spacemap 0 free 8G
metaslab 100 offset c800000000 spacemap 0 free 8G
metaslab 101 offset ca00000000 spacemap 0 free 8G
metaslab 102 offset cc00000000 spacemap 0 free 8G
metaslab 103 offset ce00000000 spacemap 0 free 8G
metaslab 104 offset d000000000 spacemap 0 free 8G
metaslab 105 offset d200000000 spacemap 0 free 8G
metaslab 106 offset d400000000 spacemap 0 free 8G
metaslab 107 offset d600000000 spacemap 0 free 8G
metaslab 108 offset d800000000 spacemap 0 free 8G
metaslab 109 offset da00000000 spacemap 0 free 8G
metaslab 110 offset dc00000000 spacemap 0 free 8G
metaslab 111 offset de00000000 spacemap 0 free 8G
metaslab 112 offset e000000000 spacemap 0 free 8G
metaslab 113 offset e200000000 spacemap 0 free 8G
metaslab 114 offset e400000000 spacemap 0 free 8G
metaslab 115 offset e600000000 spacemap 0 free 8G
space map refcount mismatch: expected 125 != actual 0
zdb -M rpool
vdev 0 metaslabs 116 fragmentation 0%
pool rpool fragmentation 0%
space map refcount mismatch: expected 125 != actual 0
The pool, however, reports 41% frag, even though it's on a single (dm-crypt) vdev:
zpool list -v rpool
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
rpool 928G 631G 297G - 41% 67% 1.00x ONLINE -
root_crypt 928G 631G 297G - 41% 67%
How does ZFS decide how many metaslabs to expect? Looks like I've got 116 here, not 0 or 125. Secondly, am I seeing a pool-level problem or a dataset-level problem? I can wipe the DS with no issue - nothing important there - but the pool has other datasets which are rather vital (backed up to multi-vdev pools, which are not showing errors).
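(As a rough sanity check on that count, assuming zdb derives it from the MOS config above as asize shifted right by metaslab_shift, the arithmetic does land on the 116 reported:)
echo $(( 999403552768 >> 33 ))   # asize >> metaslab_shift from the zdb MOS output above; prints 116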
Scrub now reports an error in both the snapshot and the live DS, but older snapshots appear unaffected. I can try a rollback, but I'm concerned that, if a snap was made with a working version of ZFS, I would start writing bad data to it with a potentially damaged ZFS runtime.
ugh :-1:
zdb -M WD30EFRX
vdev 0 metaslabs 170 fragmentation 0%
pool WD30EFRX fragmentation 0%
space map refcount mismatch: expected 167 != actual 0
just migrated to 3.19 and latest master
hopefully this isn't something to worry about? :8ball:
currently a scrub is running - tomorrow (CET) I'll know more. I'm not using any snapshots or other fancy features besides mirroring right now, just 2 swap zvols on other partitions on the 2 mirrored hard drives
that's all - I actually wanted to start doing some more intense usage and testing over the next few days
luckily I did a grand backup (several backup volumes) beforehand, since the data is vital [academic writing/dissertation, that's why I started to use ZFS in the first place]. What I noticed was that a subfolder had a changed timestamp (from today) which, if I remember correctly, I didn't touch at all
@sempervictus what hardware is this pool running on top of? dm-crypt/luks? what kernel version?
I should have stayed at 3.17.7 + the state from December/November - anyway: fingers crossed
edit:
currently used ZFS state is https://github.com/kernelOfTruth/zfs/commits/zfs_kOT_14.02.2015 https://github.com/kernelOfTruth/spl/commits/spl_kOT_14.02.2015
so pretty recent
state used before was: https://github.com/kernelOfTruth/zfs/commits/zfs_kOT_13.01.2015 https://github.com/kernelOfTruth/spl/commits/spl_kOT_13.01.2015
to be honest I didn't use the zdb command before so I don't know if the error message from zdb was there before
zpool scrub has not shown any errors so far, and all files accessed or used were working properly
related:
https://bugs.freenas.org/issues/6352 https://forums.freenas.org/index.php?threads/problems-with-freenas-9-2-1-5.20879/
so it's not limited to ZFSonLinux, but also occurs on FreeBSD / m0n0wall / NanoBSD / FreeNAS-based systems
This is going from bad to worse. Attempting to copy data out of the dataset into another pool resulted in IO errors on read, which hard-locked the host. I just had the backup host lock up on scrub (the one with multiple VDEVs hosting a pool receiving my bad snaps). That's not even the "worse" part, that's just the bad. The worse part is that "zfs rollback -r rpool/steam@20150209-0445" resulted in the following output from zpool status -v rpool:
errors: Permanent errors have been detected in the following files:
<0x134a>:<0x0>
rpool/steam:<0x0>
<0x12ca>:<0x0>
<0xffffffffffffffff>:<0x0>
I'm getting the sense that I either wipe the dataset, or start copying data out of the datasets which are still alive (hopefully without causing IO errors) and attempt to restore the pool from days ago on the backup host, presuming I didn't damage that one beyond repair by sending the snaps to it.
@kernelOfTruth: the second freenas bug suggests this to be a ZoL issue, but the first does not tell us if ZoL has touched the pool. For all we know we might be borking everyone's data on disk and causing other distributions to suffer the spacemap problems we introduce. This setup is on 3.17.6 since we can't for the life of us get Ubuntu 14 to boot a dm-crypt volume with our later kernels (it never finds the volume to decrypt and never presents the askpass prompt; the initrd seems unable to find the physical disk). The rpool is on an encrypted block device, and we all know ZFS prefers real disks, but since Oracle's legal department is too busy building their palaces in the depths of Tartarus to respond to inquiries about their labeling the crypto code as CDDL, we have to encrypt our data somehow.
I ran a scrub, which changed the output of zpool status -v to something even less pleasant:
zpool status -v rpool
pool: rpool
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://zfsonlinux.org/msg/ZFS-8000-8A
scan: scrub repaired 0 in 0h53m with 0 errors on Sun Feb 15 19:18:23 2015
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 1.32K
root_crypt ONLINE 0 0 5.26K
errors: Permanent errors have been detected in the following files:
<0xffffffffffffffff>:<0x0>
Apparently rolling back the dataset was not the way to go here, since ZFS can't delete its own corrupt data, and said corruption just moves from the dataset into the pool. How do I know that ZFS can't delete the corrupt data? The freeing counter on rpool is stuck at 3.47MB (about the size of the deltas rolled back), and I can't really imagine it being so computationally intensive to delete nodes from a tree as to take an hour to free 3MB worth of them.
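(For anyone wanting to watch that counter themselves, it is exposed as a read-only pool property; something like the following should show it:)
zpool get freeing rpool   # space still pending deletion/async destroy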
Given that there's critical data in the pool, and the error has propagated into tier-1 of the backups (meaning that to recover the pool from tier-2 I lose two weeks of data), I'm a bit more concerned about saving the pool than finding the bug. Is there a manual way to drop the bad records after the last known good snapshot? Or to manually edit the vdev, pruning the garbage records into "empty space"?
Here's the part which should probably put the brakes on 0.6.4 until we know what is happening here: after testing my own pools, I discovered that we have multiple pools in our datacenter which are NOT showing any scrub errors but are reporting mismatches in the space map with zdb -m. These are hosts with different builds and patch stacks, but all of them are post kmem rework - the thinnest ones only run 1099 and the pool GUID patch. Upstream packages for CentOS 6 do NOT contain whatever is causing this to occur; those pools are clean.
The damaged pools are pushing off to a FreeNAS box (8.1 on x86) as the final tier of their backup, but it appears that this problem propagates right through snapshots and across platforms once introduced, according to the posts @kernelOfTruth linked. This FreeNAS instance has no spacemap histogram feature; it's literally been running in our DC for several years untouched on FreeNAS 8.1 in a deep pit of network isolation, and the ZFS on it is old (given that it suffers along on x86 with 2G of RAM). Despite the source pools reporting the mismatch, the FreeNAS host does NOT. So this problem may be cleaned up by "filtering" it through an older ZFS send/recv cycle, presuming you're not in my position yet, where you have what look like dangling pointers in the pool itself.
@everyone: to test if you're affected, the following will output every pool name, followed by a line underneath either indicating you have a mismatch, or blank if you're fine:
zpool list |grep -v NAME | cut -d' ' -f1 | while read POOL; do echo $POOL; echo $(zdb -m $POOL | grep mismatch); done
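(A slightly tidier variant of the same check - assuming your zpool list supports the standard -H and -o name flags, which recent ZoL builds do - avoids parsing the header line:)
zpool list -H -o name | while read POOL; do
  echo "$POOL"
  zdb -m "$POOL" | grep mismatch   # prints nothing if the space map refcounts agree
done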
Finally some good news, or a lack of it getting worse anyway. After the last scrub I rebooted the host in the hope that an import would actually free the "freeing" amount. No dice there - still 3.47M showing as freeing. However, the pool is importable and I appear to have access to my data. ZDB has taken a slightly better view of rpool; it still shows a mismatch, but at least it's not claiming there are no spacemaps:
zdb -m rpool
Metaslabs:
vdev 0
metaslabs 116 offset spacemap free
--------------- ------------------- --------------- -------------
metaslab 0 offset 0 spacemap 24805 free 1.44G
metaslab 1 offset 200000000 spacemap 79 free 2.04G
metaslab 2 offset 400000000 spacemap 78 free 338M
metaslab 3 offset 600000000 spacemap 77 free 4.02G
metaslab 4 offset 800000000 spacemap 24807 free 4.34G
metaslab 5 offset a00000000 spacemap 24834 free 2.27G
metaslab 6 offset c00000000 spacemap 24883 free 1.36G
metaslab 7 offset e00000000 spacemap 24818 free 884M
metaslab 8 offset 1000000000 spacemap 24819 free 3.11G
metaslab 9 offset 1200000000 spacemap 58 free 249M
metaslab 10 offset 1400000000 spacemap 24865 free 3.28G
metaslab 11 offset 1600000000 spacemap 24832 free 2.69G
metaslab 12 offset 1800000000 spacemap 24828 free 3.92G
metaslab 13 offset 1a00000000 spacemap 24881 free 4.81G
metaslab 14 offset 1c00000000 spacemap 24861 free 1.44G
metaslab 15 offset 1e00000000 spacemap 24851 free 437M
metaslab 16 offset 2000000000 spacemap 24894 free 2.48G
metaslab 17 offset 2200000000 spacemap 45 free 1.49G
metaslab 18 offset 2400000000 spacemap 24891 free 4.51G
metaslab 19 offset 2600000000 spacemap 24892 free 686M
metaslab 20 offset 2800000000 spacemap 24871 free 1.33G
metaslab 21 offset 2a00000000 spacemap 24843 free 1.67G
metaslab 22 offset 2c00000000 spacemap 24797 free 3.75G
metaslab 23 offset 2e00000000 spacemap 24893 free 2.94G
metaslab 24 offset 3000000000 spacemap 24833 free 2.74G
metaslab 25 offset 3200000000 spacemap 24884 free 2.06G
metaslab 26 offset 3400000000 spacemap 24887 free 908M
metaslab 27 offset 3600000000 spacemap 24792 free 3.38G
metaslab 28 offset 3800000000 spacemap 24794 free 4.17G
metaslab 29 offset 3a00000000 spacemap 24810 free 1.07G
metaslab 30 offset 3c00000000 spacemap 24803 free 2.90G
metaslab 31 offset 3e00000000 spacemap 24829 free 578M
metaslab 32 offset 4000000000 spacemap 24856 free 1.45G
metaslab 33 offset 4200000000 spacemap 24813 free 2.16G
metaslab 34 offset 4400000000 spacemap 24839 free 1.06G
metaslab 35 offset 4600000000 spacemap 24800 free 573M
metaslab 36 offset 4800000000 spacemap 24868 free 2.91G
metaslab 37 offset 4a00000000 spacemap 24838 free 4.62G
metaslab 38 offset 4c00000000 spacemap 24882 free 483M
metaslab 39 offset 4e00000000 spacemap 44 free 2.77G
metaslab 40 offset 5000000000 spacemap 24860 free 803M
metaslab 41 offset 5200000000 spacemap 24842 free 1.45G
metaslab 42 offset 5400000000 spacemap 43 free 3.31G
metaslab 43 offset 5600000000 spacemap 24812 free 321M
metaslab 44 offset 5800000000 spacemap 24796 free 5.79G
metaslab 45 offset 5a00000000 spacemap 24809 free 1.20G
metaslab 46 offset 5c00000000 spacemap 24880 free 1.18G
metaslab 47 offset 5e00000000 spacemap 24848 free 863M
metaslab 48 offset 6000000000 spacemap 24804 free 668M
metaslab 49 offset 6200000000 spacemap 24816 free 536M
metaslab 50 offset 6400000000 spacemap 24821 free 1.84G
metaslab 51 offset 6600000000 spacemap 24845 free 747M
metaslab 52 offset 6800000000 spacemap 24811 free 125M
metaslab 53 offset 6a00000000 spacemap 24837 free 5.50G
metaslab 54 offset 6c00000000 spacemap 24870 free 4.21G
metaslab 55 offset 6e00000000 spacemap 24852 free 52.9M
metaslab 56 offset 7000000000 spacemap 24855 free 1.19G
metaslab 57 offset 7200000000 spacemap 24872 free 1006M
metaslab 58 offset 7400000000 spacemap 24841 free 505M
metaslab 59 offset 7600000000 spacemap 40 free 63.8M
metaslab 60 offset 7800000000 spacemap 38 free 2.12G
metaslab 61 offset 7a00000000 spacemap 24840 free 513M
metaslab 62 offset 7c00000000 spacemap 24853 free 3.29G
metaslab 63 offset 7e00000000 spacemap 24815 free 1007M
metaslab 64 offset 8000000000 spacemap 24846 free 707M
metaslab 65 offset 8200000000 spacemap 37 free 4.00G
metaslab 66 offset 8400000000 spacemap 24808 free 762M
metaslab 67 offset 8600000000 spacemap 24835 free 321M
metaslab 68 offset 8800000000 spacemap 24857 free 3.97G
metaslab 69 offset 8a00000000 spacemap 24836 free 969M
metaslab 70 offset 8c00000000 spacemap 24878 free 41.7M
metaslab 71 offset 8e00000000 spacemap 24847 free 5.50G
metaslab 72 offset 9000000000 spacemap 24795 free 3.26G
metaslab 73 offset 9200000000 spacemap 24867 free 501M
metaslab 74 offset 9400000000 spacemap 24849 free 1.37G
metaslab 75 offset 9600000000 spacemap 24879 free 1.17G
metaslab 76 offset 9800000000 spacemap 24822 free 646M
metaslab 77 offset 9a00000000 spacemap 24844 free 47.9M
metaslab 78 offset 9c00000000 spacemap 24859 free 247M
metaslab 79 offset 9e00000000 spacemap 24831 free 3.33G
metaslab 80 offset a000000000 spacemap 24854 free 5.13G
metaslab 81 offset a200000000 spacemap 36 free 241M
metaslab 82 offset a400000000 spacemap 24806 free 62.2M
metaslab 83 offset a600000000 spacemap 24799 free 1.71G
metaslab 84 offset a800000000 spacemap 24801 free 2.54G
metaslab 85 offset aa00000000 spacemap 24825 free 2.09G
metaslab 86 offset ac00000000 spacemap 24817 free 832M
metaslab 87 offset ae00000000 spacemap 24826 free 2.22G
metaslab 88 offset b000000000 spacemap 24823 free 1.49G
metaslab 89 offset b200000000 spacemap 24866 free 8.21M
metaslab 90 offset b400000000 spacemap 24873 free 5.07G
metaslab 91 offset b600000000 spacemap 24858 free 5.05G
metaslab 92 offset b800000000 spacemap 24877 free 5.03G
metaslab 93 offset ba00000000 spacemap 24850 free 5.95G
metaslab 94 offset bc00000000 spacemap 24885 free 4.96G
metaslab 95 offset be00000000 spacemap 24886 free 3.79G
metaslab 96 offset c000000000 spacemap 24798 free 4.63G
metaslab 97 offset c200000000 spacemap 24830 free 6.43G
metaslab 98 offset c400000000 spacemap 24820 free 1.50G
metaslab 99 offset c600000000 spacemap 24888 free 3.60G
metaslab 100 offset c800000000 spacemap 35 free 3.08G
metaslab 101 offset ca00000000 spacemap 24874 free 3.35G
metaslab 102 offset cc00000000 spacemap 24875 free 7.26G
metaslab 103 offset ce00000000 spacemap 24876 free 1.38G
metaslab 104 offset d000000000 spacemap 24889 free 6.79G
metaslab 105 offset d200000000 spacemap 24890 free 4.05G
metaslab 106 offset d400000000 spacemap 24869 free 6.82G
metaslab 107 offset d600000000 spacemap 24824 free 7.78G
metaslab 108 offset d800000000 spacemap 24862 free 700M
metaslab 109 offset da00000000 spacemap 24863 free 7.74G
metaslab 110 offset dc00000000 spacemap 24864 free 7.00G
metaslab 111 offset de00000000 spacemap 24814 free 7.65G
metaslab 112 offset e000000000 spacemap 24793 free 7.56G
metaslab 113 offset e200000000 spacemap 24791 free 4.26G
metaslab 114 offset e400000000 spacemap 24827 free 2.34G
metaslab 115 offset e600000000 spacemap 24802 free 4.64G
space map refcount mismatch: expected 127 != actual 117
On the other hand, it seems every IO is causing cksum errors, probably because the spacemaps don't align. The pool is sha256, single-disk VDEV, and is somehow showing different numbers of cksum errors for the vdev and the pool:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 467
root_crypt ONLINE 0 0 1.82K
Smartctl reports no errors, and I'm not seeing any data-level problems, nor am I seeing these errors on the other pool (also going through a scrub presently).
I tried checksumming the metadata with zdb, and got an assertion failure which may actually help track down what's happening:
zdb -mc rpool
Metaslabs:
vdev 0
metaslabs 116 offset spacemap free
--------------- ------------------- --------------- -------------
metaslab 0 offset 0 spacemap 24805 free 1.44G
metaslab 1 offset 200000000 spacemap 79 free 2.04G
metaslab 2 offset 400000000 spacemap 78 free 338M
metaslab 3 offset 600000000 spacemap 77 free 4.02G
metaslab 4 offset 800000000 spacemap 24807 free 4.34G
metaslab 5 offset a00000000 spacemap 24834 free 2.27G
metaslab 6 offset c00000000 spacemap 24883 free 1.36G
metaslab 7 offset e00000000 spacemap 24818 free 884M
metaslab 8 offset 1000000000 spacemap 24819 free 3.11G
metaslab 9 offset 1200000000 spacemap 58 free 249M
metaslab 10 offset 1400000000 spacemap 24865 free 3.28G
metaslab 11 offset 1600000000 spacemap 24832 free 2.69G
metaslab 12 offset 1800000000 spacemap 24828 free 3.92G
metaslab 13 offset 1a00000000 spacemap 24881 free 4.81G
metaslab 14 offset 1c00000000 spacemap 24861 free 1.44G
metaslab 15 offset 1e00000000 spacemap 24851 free 437M
metaslab 16 offset 2000000000 spacemap 24894 free 2.48G
metaslab 17 offset 2200000000 spacemap 45 free 1.49G
metaslab 18 offset 2400000000 spacemap 24891 free 4.51G
metaslab 19 offset 2600000000 spacemap 24892 free 686M
metaslab 20 offset 2800000000 spacemap 24871 free 1.33G
metaslab 21 offset 2a00000000 spacemap 24843 free 1.67G
metaslab 22 offset 2c00000000 spacemap 24797 free 3.75G
metaslab 23 offset 2e00000000 spacemap 24893 free 2.94G
metaslab 24 offset 3000000000 spacemap 24833 free 2.74G
metaslab 25 offset 3200000000 spacemap 24884 free 2.06G
metaslab 26 offset 3400000000 spacemap 24887 free 908M
metaslab 27 offset 3600000000 spacemap 24792 free 3.38G
metaslab 28 offset 3800000000 spacemap 24794 free 4.17G
metaslab 29 offset 3a00000000 spacemap 24810 free 1.07G
metaslab 30 offset 3c00000000 spacemap 24803 free 2.90G
metaslab 31 offset 3e00000000 spacemap 24829 free 578M
metaslab 32 offset 4000000000 spacemap 24856 free 1.45G
metaslab 33 offset 4200000000 spacemap 24813 free 2.16G
metaslab 34 offset 4400000000 spacemap 24839 free 1.06G
metaslab 35 offset 4600000000 spacemap 24800 free 573M
metaslab 36 offset 4800000000 spacemap 24868 free 2.91G
metaslab 37 offset 4a00000000 spacemap 24838 free 4.62G
metaslab 38 offset 4c00000000 spacemap 24882 free 483M
metaslab 39 offset 4e00000000 spacemap 44 free 2.77G
metaslab 40 offset 5000000000 spacemap 24860 free 803M
metaslab 41 offset 5200000000 spacemap 24842 free 1.45G
metaslab 42 offset 5400000000 spacemap 43 free 3.31G
metaslab 43 offset 5600000000 spacemap 24812 free 321M
metaslab 44 offset 5800000000 spacemap 24796 free 5.79G
metaslab 45 offset 5a00000000 spacemap 24809 free 1.20G
metaslab 46 offset 5c00000000 spacemap 24880 free 1.18G
metaslab 47 offset 5e00000000 spacemap 24848 free 863M
metaslab 48 offset 6000000000 spacemap 24804 free 668M
metaslab 49 offset 6200000000 spacemap 24816 free 536M
metaslab 50 offset 6400000000 spacemap 24821 free 1.84G
metaslab 51 offset 6600000000 spacemap 24845 free 747M
metaslab 52 offset 6800000000 spacemap 24811 free 125M
metaslab 53 offset 6a00000000 spacemap 24837 free 5.50G
metaslab 54 offset 6c00000000 spacemap 24870 free 4.21G
metaslab 55 offset 6e00000000 spacemap 24852 free 52.9M
metaslab 56 offset 7000000000 spacemap 24855 free 1.19G
metaslab 57 offset 7200000000 spacemap 24872 free 1006M
metaslab 58 offset 7400000000 spacemap 24841 free 505M
metaslab 59 offset 7600000000 spacemap 40 free 63.8M
metaslab 60 offset 7800000000 spacemap 38 free 2.12G
metaslab 61 offset 7a00000000 spacemap 24840 free 513M
metaslab 62 offset 7c00000000 spacemap 24853 free 3.29G
metaslab 63 offset 7e00000000 spacemap 24815 free 1007M
metaslab 64 offset 8000000000 spacemap 24846 free 707M
metaslab 65 offset 8200000000 spacemap 37 free 4.00G
metaslab 66 offset 8400000000 spacemap 24808 free 762M
metaslab 67 offset 8600000000 spacemap 24835 free 321M
metaslab 68 offset 8800000000 spacemap 24857 free 3.97G
metaslab 69 offset 8a00000000 spacemap 24836 free 969M
metaslab 70 offset 8c00000000 spacemap 24878 free 41.7M
metaslab 71 offset 8e00000000 spacemap 24847 free 5.50G
metaslab 72 offset 9000000000 spacemap 24795 free 3.26G
metaslab 73 offset 9200000000 spacemap 24867 free 501M
metaslab 74 offset 9400000000 spacemap 24849 free 1.37G
metaslab 75 offset 9600000000 spacemap 24879 free 1.17G
metaslab 76 offset 9800000000 spacemap 24822 free 646M
metaslab 77 offset 9a00000000 spacemap 24844 free 47.9M
metaslab 78 offset 9c00000000 spacemap 24859 free 247M
metaslab 79 offset 9e00000000 spacemap 24831 free 3.33G
metaslab 80 offset a000000000 spacemap 24854 free 5.13G
metaslab 81 offset a200000000 spacemap 36 free 241M
metaslab 82 offset a400000000 spacemap 24806 free 62.2M
metaslab 83 offset a600000000 spacemap 24799 free 1.71G
metaslab 84 offset a800000000 spacemap 24801 free 2.54G
metaslab 85 offset aa00000000 spacemap 24825 free 2.09G
metaslab 86 offset ac00000000 spacemap 24817 free 832M
metaslab 87 offset ae00000000 spacemap 24826 free 2.22G
metaslab 88 offset b000000000 spacemap 24823 free 1.49G
metaslab 89 offset b200000000 spacemap 24866 free 8.21M
metaslab 90 offset b400000000 spacemap 24873 free 5.07G
metaslab 91 offset b600000000 spacemap 24858 free 5.05G
metaslab 92 offset b800000000 spacemap 24877 free 5.03G
metaslab 93 offset ba00000000 spacemap 24850 free 5.95G
metaslab 94 offset bc00000000 spacemap 24885 free 4.96G
metaslab 95 offset be00000000 spacemap 24886 free 3.79G
metaslab 96 offset c000000000 spacemap 24798 free 4.63G
metaslab 97 offset c200000000 spacemap 24830 free 6.43G
metaslab 98 offset c400000000 spacemap 24820 free 1.50G
metaslab 99 offset c600000000 spacemap 24888 free 3.60G
metaslab 100 offset c800000000 spacemap 35 free 3.08G
metaslab 101 offset ca00000000 spacemap 24874 free 3.35G
metaslab 102 offset cc00000000 spacemap 24875 free 7.26G
metaslab 103 offset ce00000000 spacemap 24876 free 1.38G
metaslab 104 offset d000000000 spacemap 24889 free 6.79G
metaslab 105 offset d200000000 spacemap 24890 free 4.05G
metaslab 106 offset d400000000 spacemap 24869 free 6.82G
metaslab 107 offset d600000000 spacemap 24824 free 7.78G
metaslab 108 offset d800000000 spacemap 24862 free 700M
metaslab 109 offset da00000000 spacemap 24863 free 7.74G
metaslab 110 offset dc00000000 spacemap 24864 free 7.00G
metaslab 111 offset de00000000 spacemap 24814 free 7.65G
metaslab 112 offset e000000000 spacemap 24793 free 7.56G
metaslab 113 offset e200000000 spacemap 24791 free 4.26G
metaslab 114 offset e400000000 spacemap 24827 free 2.34G
metaslab 115 offset e600000000 spacemap 24802 free 4.64G
Traversing all blocks to verify metadata checksums and verify nothing leaked ...
loading space map for vdev 0 of 1, metaslab 115 of 116 ...
zdb: ../../cmd/zdb/zdb.c:2618: Assertion `0 == bptree_iterate(spa->spa_meta_objset, spa->spa_dsl_pool->dp_bptree_obj, B_FALSE, count_block_cb, &zcb, ((void *)0)) (0x0 == 0x34)' failed
EDIT: that assertion led me to issue 4390 in Illumos, which is merged into our tree, but it may be a starting point for those who have a better clue than I do as to the cause and fix here. Possibly another argument for an offline fsck implementation - fixing spacemap and metadata issues which online pools seem unable to resolve.
luckily it finished without errors here:
zpool status -v WD30EFRX
pool: WD30EFRX
state: ONLINE
scan: scrub repaired 0 in 6h14m with 0 errors on Mon Feb 16 03:40:04 2015
config:
NAME STATE READ WRITE CKSUM
WD30EFRX ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
wd30efrx_002 ONLINE 0 0 0
wd30efrx ONLINE 0 0 0
cache
intelSSD180 ONLINE 0 0 0
zpool list |grep -v NAME | cut -d' ' -f1 | while read POOL; do echo $POOL; echo $(zdb -m $POOL | grep mismatch); done
SWAP2_1
space map refcount mismatch: expected 3 != actual 0
SWAP2_2
space map refcount mismatch: expected 3 != actual 0
WD30EFRX
space map refcount mismatch: expected 169 != actual 0
note that SWAP2_1 & SWAP2_2 are only several hours old
so this issue might be introduced right from the start, or when calling zdb
all volumes are on cryptsetup/luks encrypted partitions or hard drives
I'm probably running a different version of cryptsetup than you (compared to Ubuntu 14.04):
cryptsetup --version
cryptsetup 1.6.6
so there might be factors involved from userland, too
zdb -m WD30EFRX
Metaslabs:
vdev 0
metaslabs 170 offset spacemap free
--------------- ------------------- --------------- -------------
metaslab 0 offset 0 spacemap 0 free 16G
metaslab 1 offset 400000000 spacemap 0 free 16G
metaslab 2 offset 800000000 spacemap 0 free 16G
metaslab 3 offset c00000000 spacemap 0 free 16G
metaslab 4 offset 1000000000 spacemap 0 free 16G
metaslab 5 offset 1400000000 spacemap 0 free 16G
metaslab 6 offset 1800000000 spacemap 0 free 16G
metaslab 7 offset 1c00000000 spacemap 0 free 16G
metaslab 8 offset 2000000000 spacemap 0 free 16G
metaslab 9 offset 2400000000 spacemap 0 free 16G
metaslab 10 offset 2800000000 spacemap 0 free 16G
metaslab 11 offset 2c00000000 spacemap 0 free 16G
metaslab 12 offset 3000000000 spacemap 0 free 16G
metaslab 13 offset 3400000000 spacemap 0 free 16G
metaslab 14 offset 3800000000 spacemap 0 free 16G
metaslab 15 offset 3c00000000 spacemap 0 free 16G
metaslab 16 offset 4000000000 spacemap 0 free 16G
metaslab 17 offset 4400000000 spacemap 0 free 16G
metaslab 18 offset 4800000000 spacemap 0 free 16G
metaslab 19 offset 4c00000000 spacemap 0 free 16G
metaslab 20 offset 5000000000 spacemap 0 free 16G
metaslab 21 offset 5400000000 spacemap 0 free 16G
metaslab 22 offset 5800000000 spacemap 0 free 16G
metaslab 23 offset 5c00000000 spacemap 0 free 16G
metaslab 24 offset 6000000000 spacemap 0 free 16G
metaslab 25 offset 6400000000 spacemap 0 free 16G
metaslab 26 offset 6800000000 spacemap 0 free 16G
metaslab 27 offset 6c00000000 spacemap 0 free 16G
metaslab 28 offset 7000000000 spacemap 0 free 16G
metaslab 29 offset 7400000000 spacemap 0 free 16G
metaslab 30 offset 7800000000 spacemap 0 free 16G
metaslab 31 offset 7c00000000 spacemap 0 free 16G
metaslab 32 offset 8000000000 spacemap 0 free 16G
metaslab 33 offset 8400000000 spacemap 0 free 16G
metaslab 34 offset 8800000000 spacemap 0 free 16G
metaslab 35 offset 8c00000000 spacemap 0 free 16G
metaslab 36 offset 9000000000 spacemap 0 free 16G
metaslab 37 offset 9400000000 spacemap 0 free 16G
metaslab 38 offset 9800000000 spacemap 0 free 16G
metaslab 39 offset 9c00000000 spacemap 0 free 16G
metaslab 40 offset a000000000 spacemap 0 free 16G
metaslab 41 offset a400000000 spacemap 0 free 16G
metaslab 42 offset a800000000 spacemap 0 free 16G
metaslab 43 offset ac00000000 spacemap 0 free 16G
metaslab 44 offset b000000000 spacemap 0 free 16G
metaslab 45 offset b400000000 spacemap 0 free 16G
metaslab 46 offset b800000000 spacemap 0 free 16G
metaslab 47 offset bc00000000 spacemap 0 free 16G
metaslab 48 offset c000000000 spacemap 0 free 16G
metaslab 49 offset c400000000 spacemap 0 free 16G
metaslab 50 offset c800000000 spacemap 0 free 16G
metaslab 51 offset cc00000000 spacemap 0 free 16G
metaslab 52 offset d000000000 spacemap 0 free 16G
metaslab 53 offset d400000000 spacemap 0 free 16G
metaslab 54 offset d800000000 spacemap 0 free 16G
metaslab 55 offset dc00000000 spacemap 0 free 16G
metaslab 56 offset e000000000 spacemap 0 free 16G
metaslab 57 offset e400000000 spacemap 0 free 16G
metaslab 58 offset e800000000 spacemap 0 free 16G
metaslab 59 offset ec00000000 spacemap 0 free 16G
metaslab 60 offset f000000000 spacemap 0 free 16G
metaslab 61 offset f400000000 spacemap 0 free 16G
metaslab 62 offset f800000000 spacemap 0 free 16G
metaslab 63 offset fc00000000 spacemap 0 free 16G
metaslab 64 offset 10000000000 spacemap 0 free 16G
metaslab 65 offset 10400000000 spacemap 0 free 16G
metaslab 66 offset 10800000000 spacemap 0 free 16G
metaslab 67 offset 10c00000000 spacemap 0 free 16G
metaslab 68 offset 11000000000 spacemap 0 free 16G
metaslab 69 offset 11400000000 spacemap 0 free 16G
metaslab 70 offset 11800000000 spacemap 0 free 16G
metaslab 71 offset 11c00000000 spacemap 0 free 16G
metaslab 72 offset 12000000000 spacemap 0 free 16G
metaslab 73 offset 12400000000 spacemap 0 free 16G
metaslab 74 offset 12800000000 spacemap 0 free 16G
metaslab 75 offset 12c00000000 spacemap 0 free 16G
metaslab 76 offset 13000000000 spacemap 0 free 16G
metaslab 77 offset 13400000000 spacemap 0 free 16G
metaslab 78 offset 13800000000 spacemap 0 free 16G
metaslab 79 offset 13c00000000 spacemap 0 free 16G
metaslab 80 offset 14000000000 spacemap 0 free 16G
metaslab 81 offset 14400000000 spacemap 0 free 16G
metaslab 82 offset 14800000000 spacemap 0 free 16G
metaslab 83 offset 14c00000000 spacemap 0 free 16G
metaslab 84 offset 15000000000 spacemap 0 free 16G
metaslab 85 offset 15400000000 spacemap 0 free 16G
metaslab 86 offset 15800000000 spacemap 0 free 16G
metaslab 87 offset 15c00000000 spacemap 0 free 16G
metaslab 88 offset 16000000000 spacemap 0 free 16G
metaslab 89 offset 16400000000 spacemap 0 free 16G
metaslab 90 offset 16800000000 spacemap 0 free 16G
metaslab 91 offset 16c00000000 spacemap 0 free 16G
metaslab 92 offset 17000000000 spacemap 0 free 16G
metaslab 93 offset 17400000000 spacemap 0 free 16G
metaslab 94 offset 17800000000 spacemap 0 free 16G
metaslab 95 offset 17c00000000 spacemap 0 free 16G
metaslab 96 offset 18000000000 spacemap 0 free 16G
metaslab 97 offset 18400000000 spacemap 0 free 16G
metaslab 98 offset 18800000000 spacemap 0 free 16G
metaslab 99 offset 18c00000000 spacemap 0 free 16G
metaslab 100 offset 19000000000 spacemap 0 free 16G
metaslab 101 offset 19400000000 spacemap 0 free 16G
metaslab 102 offset 19800000000 spacemap 0 free 16G
metaslab 103 offset 19c00000000 spacemap 0 free 16G
metaslab 104 offset 1a000000000 spacemap 0 free 16G
metaslab 105 offset 1a400000000 spacemap 0 free 16G
metaslab 106 offset 1a800000000 spacemap 0 free 16G
metaslab 107 offset 1ac00000000 spacemap 0 free 16G
metaslab 108 offset 1b000000000 spacemap 0 free 16G
metaslab 109 offset 1b400000000 spacemap 0 free 16G
metaslab 110 offset 1b800000000 spacemap 0 free 16G
metaslab 111 offset 1bc00000000 spacemap 0 free 16G
metaslab 112 offset 1c000000000 spacemap 0 free 16G
metaslab 113 offset 1c400000000 spacemap 0 free 16G
metaslab 114 offset 1c800000000 spacemap 0 free 16G
metaslab 115 offset 1cc00000000 spacemap 0 free 16G
metaslab 116 offset 1d000000000 spacemap 0 free 16G
metaslab 117 offset 1d400000000 spacemap 0 free 16G
metaslab 118 offset 1d800000000 spacemap 0 free 16G
metaslab 119 offset 1dc00000000 spacemap 0 free 16G
metaslab 120 offset 1e000000000 spacemap 0 free 16G
metaslab 121 offset 1e400000000 spacemap 0 free 16G
metaslab 122 offset 1e800000000 spacemap 0 free 16G
metaslab 123 offset 1ec00000000 spacemap 0 free 16G
metaslab 124 offset 1f000000000 spacemap 0 free 16G
metaslab 125 offset 1f400000000 spacemap 0 free 16G
metaslab 126 offset 1f800000000 spacemap 0 free 16G
metaslab 127 offset 1fc00000000 spacemap 0 free 16G
metaslab 128 offset 20000000000 spacemap 0 free 16G
metaslab 129 offset 20400000000 spacemap 0 free 16G
metaslab 130 offset 20800000000 spacemap 0 free 16G
metaslab 131 offset 20c00000000 spacemap 0 free 16G
metaslab 132 offset 21000000000 spacemap 0 free 16G
metaslab 133 offset 21400000000 spacemap 0 free 16G
metaslab 134 offset 21800000000 spacemap 0 free 16G
metaslab 135 offset 21c00000000 spacemap 0 free 16G
metaslab 136 offset 22000000000 spacemap 0 free 16G
metaslab 137 offset 22400000000 spacemap 0 free 16G
metaslab 138 offset 22800000000 spacemap 0 free 16G
metaslab 139 offset 22c00000000 spacemap 0 free 16G
metaslab 140 offset 23000000000 spacemap 0 free 16G
metaslab 141 offset 23400000000 spacemap 0 free 16G
metaslab 142 offset 23800000000 spacemap 0 free 16G
metaslab 143 offset 23c00000000 spacemap 0 free 16G
metaslab 144 offset 24000000000 spacemap 0 free 16G
metaslab 145 offset 24400000000 spacemap 0 free 16G
metaslab 146 offset 24800000000 spacemap 0 free 16G
metaslab 147 offset 24c00000000 spacemap 0 free 16G
metaslab 148 offset 25000000000 spacemap 0 free 16G
metaslab 149 offset 25400000000 spacemap 0 free 16G
metaslab 150 offset 25800000000 spacemap 0 free 16G
metaslab 151 offset 25c00000000 spacemap 0 free 16G
metaslab 152 offset 26000000000 spacemap 0 free 16G
metaslab 153 offset 26400000000 spacemap 0 free 16G
metaslab 154 offset 26800000000 spacemap 0 free 16G
metaslab 155 offset 26c00000000 spacemap 0 free 16G
metaslab 156 offset 27000000000 spacemap 0 free 16G
metaslab 157 offset 27400000000 spacemap 0 free 16G
metaslab 158 offset 27800000000 spacemap 0 free 16G
metaslab 159 offset 27c00000000 spacemap 0 free 16G
metaslab 160 offset 28000000000 spacemap 0 free 16G
metaslab 161 offset 28400000000 spacemap 0 free 16G
metaslab 162 offset 28800000000 spacemap 0 free 16G
metaslab 163 offset 28c00000000 spacemap 0 free 16G
metaslab 164 offset 29000000000 spacemap 0 free 16G
metaslab 165 offset 29400000000 spacemap 0 free 16G
metaslab 166 offset 29800000000 spacemap 0 free 16G
metaslab 167 offset 29c00000000 spacemap 0 free 16G
metaslab 168 offset 2a000000000 spacemap 0 free 16G
metaslab 169 offset 2a400000000 spacemap 0 free 16G
space map refcount mismatch: expected 169 != actual 0
this nevertheless leaves an uneasy feeling about what is actually happening here
strangely GitHub's comment system doesn't let me address @sempervictus directly (he isn't shown in the list)
the only additional patch and/or feature I'm running compared to your patch stack is #2351 - not sure if that could make a difference, either for the worse or better
the others have been merged in the meantime:
* origin/pr/3078 [merged]
* origin/pr/3074 [merged]
* origin/pr/2970 [merged]
* origin/pr/2909
* origin/pr/2890
* origin/pr/2820
* origin/pr/2784
* origin/pr/2012
* origin/pr/1099 (iSCSI support)
* origin/pr/2685 [merged]
* origin/pr/3063 [merged]
* origin/pr/3076 [merged]
* origin/pr/2129
* origin/pr/2351
I'm NOT using iSCSI here, or at all ...
I think the grievous error was introduced when I stupidly tried a rollback on a broken dataset. I wonder if even the gurus know how to fix this.
As far as the patch stack goes - @behlendorf merged most of mine, and I'm running a pretty thin one right now consisting of:
* origin/pr/2820
** Set zfs_autoimport_disable default value to 1
* origin/pr/2012
** Add option to zpool status to print guids
* origin/pr/1099
** Misc iSCSI fixes/cleanups/attempts to get 'zfs rename' to work.
* origin/pr/2129
** Enable ABD
* master @ 7d2868d5fc88a4c30769b44f56a3a88a4277a9ab
* Removed:
** origin/pr/2784
** origin/pr/2909
The two I omitted due to their potential for causing this are listed as well, and I think they did have an impact, as I now see spacemaps, just not the right number (before we were seeing everything in SM 0).
I haven't tried #2351 in a while - it caused some serious problems for me (hard lock on an AVX system) when I first tried it, and I've stayed away since. It might be useful since my rpool is sha256 and has been returning lots of cksum problems since this error bubbled up (I highly doubt it's a coincidence).
The fact that I'm seeing obnoxious amounts of cksum errors on my pool, and more on its only vdev (granted, I'm scrubbing half a T), without any application-layer data issues or dmesg complaints, is confusing. I'm still waiting on a long smartctl self-test to complete, but I see no issues so far (10% remaining, these take forever).
Scrubs are coming back with 0 errors, but the pool retains its one "epic fail" and the 3.74MB of data associated with it. Luckily the other pool on the same host is totally fine, so that's almost 2T I won't have to restore.
@kernelOfTruth I just noticed your spacemap was also zero, and you're still running the two PRs I removed. #2909 looks like the more likely culprit, and I've lodged a question asking if we're sure that spa_writeable(vd->vdev_spa) will return true, because my guess is that we're actually running live pools without real spacemaps when using the PR in some situations.
Here's what I think happened: #2909 allows us to import vdevs without a space map. We write, and screw up the map by updating the wrong data structure. You're still running the PR, so zdb -m returns SM 0 for everything. I've removed the PR, so I'm seeing a mismatch between the original and current number of space maps, but missing every entry and SM created while running the PR. This may also explain why I can't bloody destroy those few MB; the zdb assertion I'm seeing is probably caused by this mismatch as well, and my cksum flood seems likely to be related either to the space maps being loaded, or to the jacked-up data I've written since.
The PR looked safe at first review, but I'm not nearly l33t enough to picture the execution context/stack properly here, and I'm afraid that without changes to how scrubs are handled (recalculating the space maps), my pool may be FUBAR.
There are definitely hosts showing this problem which have not run #2909 themselves; that PR was specifically put on test hosts because they often deal with broken pools (my laptop unfortunately qualifies as such a mule since it's got server-grade resources and quick physical access). However, the waterfall replication topology of our datacenter ZFS boxes may account for the error appearing on those machines - they have datasets which back up the test mules.
Hi,
I just took a look at some servers we manage, and I get the same error on each of these:
zdb -mM rpool returns: space map refcount mismatch: expected 363 != actual 0
these servers all contain latest master with #2909
for now zpool status is happy; what is the correct way to handle this situation?
Regards, Eric
@edillmann: One of my pools is undamaged, so removing the patch from the stack appears "safe enough," provided you don't have additional errors coming up like the one which started all of this.
So far vpool, the undamaged pool, has passed a scrub and a zdb -mc subsequent to the newly built ZFS and SPL modules being loaded without #2909 (it still reports a refcount mismatch, but the checksums all align). rpool is not so lucky: although it reports 0 errors on scrub now, it has the persistent error at <0xffffffffffffffff>:<0x0> with the associated "unfreeable" empty space, and zdb -mc results in the same:
Traversing all blocks to verify metadata checksums and verify nothing leaked ...
loading space map for vdev 0 of 1, metaslab 115 of 116 ...
zdb: ../../cmd/zdb/zdb.c:2618: Assertion `0 == bptree_iterate(spa->spa_meta_objset, spa->spa_dsl_pool->dp_bptree_obj, B_FALSE, count_block_cb, &zcb, ((void *)0)) (0x0 == 0x34)' failed.
Aborted
If my hunch about the free space information going into the wrong metaslab's space map is correct, I'd suggest pulling it from the stack sooner rather than later, as all writes subsequent to the misplaced spacemap writes would exacerbate the issue and potentially lead to checksum errors and misplaced writes overwriting real blocks.
@sempervictus running zdb -mc on one of these pools gives me the following result:
......
leaked space: vdev 0, offset 0x17dff41600, size 62464
leaked space: vdev 0, offset 0x17dffffe00, size 512
leaked space: vdev 0, offset 0x17dff5fc00, size 512
leaked space: vdev 0, offset 0x17dfed5400, size 61440
leaked space: vdev 0, offset 0x17dfbf7e00, size 6144
leaked space: vdev 0, offset 0x17ded36e00, size 12800
leaked space: vdev 0, offset 0x17de155a00, size 90112
leaked space: vdev 0, offset 0x17dd5a1a00, size 6144
leaked space: vdev 0, offset 0x17db13a600, size 512
leaked space: vdev 0, offset 0x17d8d14400, size 48128
leaked space: vdev 0, offset 0x17cbd35600, size 84480
block traversal size 95683824128 != alloc 0 (leaked -95683824128)
bp count: 3201295
bp logical: 198605664768 avg: 62039
bp physical: 94573446144 avg: 29542 compression: 2.10
bp allocated: 95683824128 avg: 29889 compression: 2.08
bp deduped: 0 ref>1: 0 deduplication: 1.00
SPA allocated: 0 used: 0.00%
additional, non-pointer bps of type 0: 54539
That seems obviously wrong and scary; any advice is welcome.
As the pool runs, do those numbers get worse?
As far as what to do, I'm not the authority on this, but so far I've been tackling it with a slim zfs build and a subsequent send/recv to clean up the data (I have yet to see how effective that is).
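(For anyone trying the same cleanup route, the send/recv leg is roughly the following; the dataset and target pool names here are hypothetical, and -R/-u are the usual recursive-send and no-mount-on-receive flags:)
zfs snapshot -r rpool/data@cleanup                          # hypothetical dataset name
zfs send -R rpool/data@cleanup | zfs recv -u newpool/data   # replicate into a pool built without the suspect patches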
I've got a production host with a split workload, hosting VMs and their underlying ZFS storage (as well as a bit of NFS), which is currently running zdb -mc on the new patch stack. No crash yet; we'll see how this goes.
In terms of actually fixing the on-disk errors we've created, #3111 may be a starting point, since scrubs seem to miss/ignore these possibly serious errors. Seems like a gap in the design or implementation IMO.
This looks promising. zdb -mc on the production host which was given the new slimmed-down patch stack, then scrubbed, then verified using zdb, returned the following:
Traversing all blocks to verify metadata checksums and verify nothing leaked ...
loading space map for vdev 1 of 2, metaslab 96 of 148 ....
1.26T completed ( 413MB/s) estimated time remaining: 0hr 00min 00sec
No leaks (block sum matches space maps exactly)
bp count: 44876409
bp logical: 865979950592 avg: 19296
bp physical: 628182239232 avg: 13998 compression: 1.38
bp allocated: 1382804668416 avg: 30813 compression: 0.63
bp deduped: 0 ref>1: 0 deduplication: 1.00
SPA allocated: 1382804226048 used: 54.02%
additional, non-pointer bps of type 0: 1715474
space map refcount mismatch: expected 287 != actual 256
This thing's been running #2909 as part of its prior patch stack for a while, and isn't spitting out leaked space messages. Far from ideal, but there may be light at the end of the tunnel (often turns out to be an oncoming train when it comes to data loss).
zdb -mc
says:
Traversing all blocks to verify metadata checksums and verify nothing leaked ...
1.69T completed ( 597MB/s) estimated time remaining: 4294792253hr 4294967281min 4294967292sec
leaked space: vdev 0, offset 0x2000, size 4096
block traversal size 1865256910848 != alloc 0 (leaked -1865256910848)
bp count: 17576044
bp logical: 1957469844992 avg: 111371
bp physical: 1853280807936 avg: 105443 compression: 1.06
bp allocated: 1865256910848 avg: 106124 compression: 1.05
bp deduped: 0 ref>1: 0 deduplication: 1.00
SPA allocated: 0 used: 0.00%
additional, non-pointer bps of type 0: 16461
hm, seems like we definitely need a fix for these kinds of issues! :+1:
@sempervictus #2909 could be the culprit, but the reports from FreeBSD-based distros make me believe that it might just trigger it more easily - thus it might not be the (sole?) cause
@behlendorf lots of vdevs use ditto blocks on my pools; could data integrity for those files also be compromised in combination with this issue?
what features are heavy users of the spacemap histogram(s) and could be prone to failure and/or data loss or corruption?
@kernelOfTruth I don't think that #2909 is the only cause of this - I'm pretty sure other things can cause it, such as Linux trying to add ACLs after the dnode is written but before an emergency power outage. I can also imagine a situation in BSD where ZVOLs being allocated as GEOMs instead of simple DEVs may incur a non-atomic write condition, also causing inconsistent commits in an emergency outage.
If my assessment is correct, then #2909 simply allowed us to bring this to light (granted, in a rather scary manner) by allowing us to explicitly write to the wrong space map (I'm not even sure that 0 is a space map). I've seen reports of these problems long before ZoL even went pseudo-production, so it's very much possible under different conditions, but not the easiest of things to reproduce unless you forcibly screw yourself to the wall like this.
Here's where I need a helping hand from someone whose head is in the light of day: is my theory about writing to occupied blocks with valid pointers not tripping a scrub viable? If so, we may have pools out there with overwritten data which is checksumming properly because the BP to it is valid (written correctly when the spacemap reported the wrong block and we wrote the block, creating a new pointer in the tree). If that IS viable, it may create utterly unrecoverable conditions by overwriting snaps, and blocks on which a newly written CoW block depends (write 512KB across 4 records of 32 4k blocks each; the 4th record could overwrite the first due to a space map error creating dangling blocks, since the blocks comprising record 1 which point to record 2 are overwritten with record 4, and the EP is gone).
Please tell me someone like @dweeezil or @ryao has a back-pocket fsck util or patches to scrub allowing us to fix this mess. How do the vendor-backed implementations handle this? I can think of a dozen vendors pretending to have invented the successor to sliced bread when actually running ZFS under the hood (usually neutered as hell and making users license features we take for granted, like pay-per-snap licensing in a SAN you buy for 20K). Then again, if this isn't addressed in Illumos and BSD, it's probably not very likely that someone else has figured it out before those guys.
To everyone reading this and using #2909: my suggestion is to drop it from the stack now. Systems which lost the patch from the set last night in our environments are static in their misreporting - the numbers aren't growing (aside from my machine, where the cksums are through the roof now and a long smartctl self-test passed with flying colors after a 15-hour torture session; without divine intervention my pool is hosed, methinks).
an example of how this manifests on newly created zvols/zpools:
zpool create -o ashift=12 SWAP2_1 /dev/mapper/swap2_1 -f
zpool create -o ashift=12 SWAP2_2 /dev/mapper/swap2_2 -f
zfs set atime=off SWAP2_1
zfs set atime=off SWAP2_2
zfs create -V 8G -b $(getconf PAGESIZE) \
-o primarycache=metadata \
-o com.sun:auto-snapshot=false SWAP2_1/swap
zfs create -V 8G -b $(getconf PAGESIZE) \
-o primarycache=metadata \
-o com.sun:auto-snapshot=false SWAP2_2/swap
zfs set sync=always SWAP2_1/swap
zfs set sync=always SWAP2_2/swap
mkswap -f /dev/zvol/SWAP2_1/swap
mkswap -f /dev/zvol/SWAP2_2/swap
zdb -mc SWAP2_1
Metaslabs:
vdev 0
metaslabs 117 offset spacemap free
--------------- ------------------- --------------- -------------
metaslab 0 offset 0 spacemap 0 free 512M
metaslab 1 offset 20000000 spacemap 0 free 512M
metaslab 2 offset 40000000 spacemap 0 free 512M
metaslab 3 offset 60000000 spacemap 0 free 512M
metaslab 4 offset 80000000 spacemap 0 free 512M
metaslab 5 offset a0000000 spacemap 0 free 512M
metaslab 6 offset c0000000 spacemap 0 free 512M
metaslab 7 offset e0000000 spacemap 0 free 512M
metaslab 8 offset 100000000 spacemap 0 free 512M
metaslab 9 offset 120000000 spacemap 0 free 512M
metaslab 10 offset 140000000 spacemap 0 free 512M
metaslab 11 offset 160000000 spacemap 0 free 512M
metaslab 12 offset 180000000 spacemap 0 free 512M
metaslab 13 offset 1a0000000 spacemap 0 free 512M
metaslab 14 offset 1c0000000 spacemap 0 free 512M
metaslab 15 offset 1e0000000 spacemap 0 free 512M
metaslab 16 offset 200000000 spacemap 0 free 512M
metaslab 17 offset 220000000 spacemap 0 free 512M
metaslab 18 offset 240000000 spacemap 0 free 512M
metaslab 19 offset 260000000 spacemap 0 free 512M
metaslab 20 offset 280000000 spacemap 0 free 512M
metaslab 21 offset 2a0000000 spacemap 0 free 512M
metaslab 22 offset 2c0000000 spacemap 0 free 512M
metaslab 23 offset 2e0000000 spacemap 0 free 512M
metaslab 24 offset 300000000 spacemap 0 free 512M
metaslab 25 offset 320000000 spacemap 0 free 512M
metaslab 26 offset 340000000 spacemap 0 free 512M
metaslab 27 offset 360000000 spacemap 0 free 512M
metaslab 28 offset 380000000 spacemap 0 free 512M
metaslab 29 offset 3a0000000 spacemap 0 free 512M
metaslab 30 offset 3c0000000 spacemap 0 free 512M
metaslab 31 offset 3e0000000 spacemap 0 free 512M
metaslab 32 offset 400000000 spacemap 0 free 512M
metaslab 33 offset 420000000 spacemap 0 free 512M
metaslab 34 offset 440000000 spacemap 0 free 512M
metaslab 35 offset 460000000 spacemap 0 free 512M
metaslab 36 offset 480000000 spacemap 0 free 512M
metaslab 37 offset 4a0000000 spacemap 0 free 512M
metaslab 38 offset 4c0000000 spacemap 0 free 512M
metaslab 39 offset 4e0000000 spacemap 0 free 512M
metaslab 40 offset 500000000 spacemap 0 free 512M
metaslab 41 offset 520000000 spacemap 0 free 512M
metaslab 42 offset 540000000 spacemap 0 free 512M
metaslab 43 offset 560000000 spacemap 0 free 512M
metaslab 44 offset 580000000 spacemap 0 free 512M
metaslab 45 offset 5a0000000 spacemap 0 free 512M
metaslab 46 offset 5c0000000 spacemap 0 free 512M
metaslab 47 offset 5e0000000 spacemap 0 free 512M
metaslab 48 offset 600000000 spacemap 0 free 512M
metaslab 49 offset 620000000 spacemap 0 free 512M
metaslab 50 offset 640000000 spacemap 0 free 512M
metaslab 51 offset 660000000 spacemap 0 free 512M
metaslab 52 offset 680000000 spacemap 0 free 512M
metaslab 53 offset 6a0000000 spacemap 0 free 512M
metaslab 54 offset 6c0000000 spacemap 0 free 512M
metaslab 55 offset 6e0000000 spacemap 0 free 512M
metaslab 56 offset 700000000 spacemap 0 free 512M
metaslab 57 offset 720000000 spacemap 0 free 512M
metaslab 58 offset 740000000 spacemap 0 free 512M
metaslab 59 offset 760000000 spacemap 0 free 512M
metaslab 60 offset 780000000 spacemap 0 free 512M
metaslab 61 offset 7a0000000 spacemap 0 free 512M
metaslab 62 offset 7c0000000 spacemap 0 free 512M
metaslab 63 offset 7e0000000 spacemap 0 free 512M
metaslab 64 offset 800000000 spacemap 0 free 512M
metaslab 65 offset 820000000 spacemap 0 free 512M
metaslab 66 offset 840000000 spacemap 0 free 512M
metaslab 67 offset 860000000 spacemap 0 free 512M
metaslab 68 offset 880000000 spacemap 0 free 512M
metaslab 69 offset 8a0000000 spacemap 0 free 512M
metaslab 70 offset 8c0000000 spacemap 0 free 512M
metaslab 71 offset 8e0000000 spacemap 0 free 512M
metaslab 72 offset 900000000 spacemap 0 free 512M
metaslab 73 offset 920000000 spacemap 0 free 512M
metaslab 74 offset 940000000 spacemap 0 free 512M
metaslab 75 offset 960000000 spacemap 0 free 512M
metaslab 76 offset 980000000 spacemap 0 free 512M
metaslab 77 offset 9a0000000 spacemap 0 free 512M
metaslab 78 offset 9c0000000 spacemap 0 free 512M
metaslab 79 offset 9e0000000 spacemap 0 free 512M
metaslab 80 offset a00000000 spacemap 0 free 512M
metaslab 81 offset a20000000 spacemap 0 free 512M
metaslab 82 offset a40000000 spacemap 0 free 512M
metaslab 83 offset a60000000 spacemap 0 free 512M
metaslab 84 offset a80000000 spacemap 0 free 512M
metaslab 85 offset aa0000000 spacemap 0 free 512M
metaslab 86 offset ac0000000 spacemap 0 free 512M
metaslab 87 offset ae0000000 spacemap 0 free 512M
metaslab 88 offset b00000000 spacemap 0 free 512M
metaslab 89 offset b20000000 spacemap 0 free 512M
metaslab 90 offset b40000000 spacemap 0 free 512M
metaslab 91 offset b60000000 spacemap 0 free 512M
metaslab 92 offset b80000000 spacemap 0 free 512M
metaslab 93 offset ba0000000 spacemap 0 free 512M
metaslab 94 offset bc0000000 spacemap 0 free 512M
metaslab 95 offset be0000000 spacemap 0 free 512M
metaslab 96 offset c00000000 spacemap 0 free 512M
metaslab 97 offset c20000000 spacemap 0 free 512M
metaslab 98 offset c40000000 spacemap 0 free 512M
metaslab 99 offset c60000000 spacemap 0 free 512M
metaslab 100 offset c80000000 spacemap 0 free 512M
metaslab 101 offset ca0000000 spacemap 0 free 512M
metaslab 102 offset cc0000000 spacemap 0 free 512M
metaslab 103 offset ce0000000 spacemap 0 free 512M
metaslab 104 offset d00000000 spacemap 0 free 512M
metaslab 105 offset d20000000 spacemap 0 free 512M
metaslab 106 offset d40000000 spacemap 0 free 512M
metaslab 107 offset d60000000 spacemap 0 free 512M
metaslab 108 offset d80000000 spacemap 0 free 512M
metaslab 109 offset da0000000 spacemap 0 free 512M
metaslab 110 offset dc0000000 spacemap 0 free 512M
metaslab 111 offset de0000000 spacemap 0 free 512M
metaslab 112 offset e00000000 spacemap 0 free 512M
metaslab 113 offset e20000000 spacemap 0 free 512M
metaslab 114 offset e40000000 spacemap 0 free 512M
metaslab 115 offset e60000000 spacemap 0 free 512M
metaslab 116 offset e80000000 spacemap 0 free 512M
Traversing all blocks to verify metadata checksums and verify nothing leaked ...
leaked space: vdev 0, offset 0xb000, size 4096
leaked space: vdev 0, offset 0xf000, size 4096
leaked space: vdev 0, offset 0x15000, size 73728
leaked space: vdev 0, offset 0x13000, size 4096
leaked space: vdev 0, offset 0xd000, size 4096
leaked space: vdev 0, offset 0x5f000, size 536481792
leaked space: vdev 0, offset 0x4b000, size 8192
leaked space: vdev 0, offset 0x29000, size 114688
leaked space: vdev 0, offset 0x2c000b000, size 4096
leaked space: vdev 0, offset 0x2c000f000, size 4096
leaked space: vdev 0, offset 0x2c0015000, size 73728
leaked space: vdev 0, offset 0x2c0013000, size 4096
leaked space: vdev 0, offset 0x2c000d000, size 4096
leaked space: vdev 0, offset 0x2c005d000, size 536489984
leaked space: vdev 0, offset 0x2c004b000, size 8192
leaked space: vdev 0, offset 0x2c0029000, size 114688
leaked space: vdev 0, offset 0x580002000, size 4096
leaked space: vdev 0, offset 0x580000000, size 4096
leaked space: vdev 0, offset 0x580008000, size 40960
leaked space: vdev 0, offset 0x580006000, size 4096
leaked space: vdev 0, offset 0x580040000, size 536608768
leaked space: vdev 0, offset 0x580036000, size 8192
leaked space: vdev 0, offset 0x580014000, size 114688
block traversal size 430080 != alloc 0 (leaked -430080)
bp count: 69
bp logical: 712192 avg: 10321
bp physical: 45568 avg: 660 compression: 15.63
bp allocated: 430080 avg: 6233 compression: 1.66
bp deduped: 0 ref>1: 0 deduplication: 1.00
SPA allocated: 0 used: 0.00%
additional, non-pointer bps of type 0: 26
ugh :(
related: http://lists.freebsd.org/pipermail/freebsd-current/2011-October/028232.html
http://lists.open-zfs.org/pipermail/developer/2014-June/000691.html
not sure if related but referencing #2662 and #2702
@kernelOfTruth I can confirm the same behaviour with master; I also ran zdb -e -mc on a freshly created and exported pool, with the same results:
leaked space: vdev 0, offset 0xb000, size 4096
leaked space: vdev 0, offset 0x15000, size 16384
leaked space: vdev 0, offset 0x1b000, size 8192
leaked space: vdev 0, offset 0x12000, size 8192
leaked space: vdev 0, offset 0x2e000, size 8589746176
leaked space: vdev 0, offset 0x25000, size 4096
leaked space: vdev 0, offset 0x1e000, size 4096
leaked space: vdev 1, offset 0xb000, size 4096
leaked space: vdev 1, offset 0x15000, size 16384
leaked space: vdev 1, offset 0x1b000, size 8192
leaked space: vdev 1, offset 0x12000, size 8192
leaked space: vdev 1, offset 0x2e000, size 8589746176
leaked space: vdev 1, offset 0x25000, size 4096
leaked space: vdev 1, offset 0x1e000, size 4096
leaked space: vdev 1, offset 0x2c00005000, size 8192
leaked space: vdev 1, offset 0x2c0000e000, size 8192
leaked space: vdev 1, offset 0x2c00008000, size 16384
leaked space: vdev 1, offset 0x2c00021000, size 8589799424
leaked space: vdev 1, offset 0x2c00018000, size 4096
leaked space: vdev 1, offset 0x2c00011000, size 4096
block traversal size 380928 != alloc 0 (leaked -380928)
@sempervictus @kernelOfTruth did you use #2129 in your patch stack? I just removed it and it seems to clean things up.
We try to put #2129 through its paces, and a number of the affected hosts use it, but the iSCSI nodes were dying after weeks of running with it, so we had to strip it out. While trying to fix this ungodly mess I did run master without it; same deal.
Here's the good news: I managed to salvage my data by creating a new rpool and streaming only the unaffected datasets to the new pool. A few changes to crypt and fstabs, a copy of the /boot and /boot/efi data with updated UUIDs, and the new pool boots fine and passes all previously failing zdb checks.
It appears that in order to ensure that bad data does not leak through, zfs send/recv must be done on both sides with a build that does not itself corrupt the space maps, and without the -e or -D send options (a rough sketch of the per-dataset commands follows the stack list below). I did this with the following stack:
* origin/pr/2820
* origin/pr/2012
* origin/pr/1099
* origin/pr/2129
* master @ 7d2868d5fc88a4c30769b44f56a3a88a4277a9ab
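For clarity, roughly the per-dataset commands I mean (pool and dataset names below are placeholders, not my actual layout):
zfs snapshot rpool/mydata@migrate
# plain stream: no -e (embedded data) and no -D (dedup) flags passed to zfs send
zfs send rpool/mydata@migrate | zfs recv newpool/mydata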
Although this has resolved my own data corruption problem, I'm now faced with a bunch of live storage hosts at our datacenter potentially plotting to allocate a block which looks free in the space map but still holds data. RAIDZ2 should help there, I guess, but it's still kind of creepy to think about some top branch of the tree getting cut off by a misplaced write.
@sempervictus @kernelOfTruth @edillmann There have been a few space map related bugs which have crept up on all the ZFS platforms over the years. We take them seriously because, as you know, there are currently tools to detect the issue but none to fix it (aside from send/recv to a new pool). In fact, one of the key reasons we run ztest against every patch is to attempt to detect these issues. Arguably we should also be running the zdb leak check on the test pools used by xfstests and filebench.
I think it's definitely possible that the unmerged #2129 or #2909 pull requests could have introduced an issue like this. It's not totally clear to me from the comments above, but were you able to verify that this issue doesn't exist on master? We absolutely can't tag anything if we believe this issue exists in the master branch.
One other thing to keep in mind. The zdb leak check only works reliably on exported or 100% idle pools. If the pool is active it can generate false positives.
@behlendorf I did the test on the same idle pool: zdb@#2129 -mc rpool
Metaslabs:
vdev 0
metaslabs 174 offset spacemap free
--------------- ------------------- --------------- -------------
metaslab 0 offset 0 spacemap 0 free 512M
metaslab 1 offset 20000000 spacemap 0 free 512M
metaslab 2 offset 40000000 spacemap 0 free 512M
metaslab 3 offset 60000000 spacemap 0 free 512M
metaslab 4 offset 80000000 spacemap 0 free 512M
metaslab 5 offset a0000000 spacemap 0 free 512M
...
metaslab 169 offset 1520000000 spacemap 0 free 512M
metaslab 170 offset 1540000000 spacemap 0 free 512M
metaslab 171 offset 1560000000 spacemap 0 free 512M
metaslab 172 offset 1580000000 spacemap 0 free 512M
metaslab 173 offset 15a0000000 spacemap 0 free 512M
**!!! All space map usage is 0 !!!**
Traversing all blocks to verify metadata checksums and verify nothing leaked ...
leaked space: vdev 0, offset 0x68000, size 73728
leaked space: vdev 0, offset 0x20000, size 8192
leaked space: vdev 0, offset 0x9a000, size 8192
leaked space: vdev 0, offset 0xba000, size 8192
...
leaked space: vdev 0, offset 0x15bf51c000, size 10739712
leaked space: vdev 0, offset 0x15befc8000, size 5357568
leaked space: vdev 0, offset 0x15bb490000, size 24428544
leaked space: vdev 0, offset 0x15bb466000, size 8192
leaked space: vdev 0, offset 0x15b7266000, size 24576
block traversal size 35960070144 != alloc 0 (leaked -35960070144)
bp count: 797123
bp logical: 55863028736 avg: 70080
bp physical: 31524022784 avg: 39547 compression: 1.77
bp allocated: 35960070144 avg: 45112 compression: 1.55
bp deduped: 0 ref>1: 0 deduplication: 1.00
SPA allocated: 0 used: 0.00%
additional, non-pointer bps of type 0: 55465
zdb@master -mc rpool
Traversing all blocks to verify metadata checksums and verify nothing leaked ...
loading space map for vdev 0 of 1, metaslab 173 of 174 ...
33.2G completed (1079MB/s) estimated time remaining: 0hr 00min 00sec
No leaks (block sum matches space maps exactly)
bp count: 797102
bp logical: 55862745088 avg: 70082
bp physical: 31523987456 avg: 39548 compression: 1.77
bp allocated: 35959865344 avg: 45113 compression: 1.55
bp deduped: 0 ref>1: 0 deduplication: 1.00
SPA allocated: 35959865344 used: 38.49%
additional, non-pointer bps of type 0: 55459
space map refcount mismatch: expected 175 != actual 174
My conclusion is that zdb@#2129 has a little bug reading space maps; this problem doesn't occur with master. Or should I worry about the refcount mismatch?
@edillmann thanks for running this test. It does sound like something's wrong with the ABD branch you tested, but I'm relieved to see that master doesn't suffer from this. However, we should explain the refcount mismatch. If you have a trivial test case for it, could you open a new issue with it?
I'm removing this as a 0.6.4 blocker. The changes which caused this problem were not part of the master branch and therefore shouldn't hold up the release.
So it's confirmed that some aspect of ABD (#2129), or an older version of it, is at fault?
I have a planned upgrade looming and want to know.
@edillmann yes, I used #2129 since it improved desktop responsiveness quite noticeably.
Without it (now) there are some minor increased latencies here and there.
Did you try #2129 by simply applying the pull request to the latest master via
https://github.com/zfsonlinux/zfs/pull/2129.diff
and did that also lead to the spacemap issues?
@behlendorf Seems like a good idea to add zdb to the test battery
As I'm also using Btrfs, I noticed a feature that seems similar to ZFS' space maps: the space_cache. There is an option to clear and subsequently rebuild it, seemingly without data loss; that's at least the impression I got from reading through the mailing list.
It would be nice if that were also possible on ZFS.
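(For reference, the Btrfs knob I mean is the clear_cache mount option; the device and mount point below are just placeholders:)
# one-time mount that discards the existing space cache so Btrfs rebuilds it
mount -o clear_cache /dev/sdX /mnt/btrfs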
I still don't have a clear answer to my question, which is:
when "just" using zpools with those broken space maps, is there a possibility that existing data gets overwritten and thus corrupted (which really shouldn't happen)?
That's my main concern, since after going through my data I seemingly don't have any pool that is/was NOT affected (all affected), so I started destroying pools and recreating them from scratch so they don't exhibit this issue.
No apparent data loss yet.
@DeHackEd I'd also like to know that, since it makes quite a difference in usage. @sempervictus seems to be running newly created pools, or at least pools with zvols, on a patch stack which doesn't suffer from the space map problems mentioned here, if I understood correctly.
@sempervictus So that means running the mentioned patch stack on top of the latest master
* origin/pr/2820
* origin/pr/2012
* origin/pr/1099
* origin/pr/2129
* master @ 7d2868d5fc88a4c30769b44f56a3a88a4277a9ab
does NOT show the space map problems and would thus get a :+1: from you?
Thanks
Let me summarize where I think things are at.
- The #2909 debug patch could cause an issue like this. It was never part of master.
- The #2129 (ABD) patches might have caused something like this. The current version is believed to be safe and correct but older versions or alternate development branches may not be.
- Over the years there have been a small number of other related bugs which could cause this issue. However, at the moment, all known causes of this issue have been resolved in master.
> Seems like a good idea to add zdb to the test battery
Absolutely. ztest already does this, however it doesn't provide perfect coverage. Running the zdb leak checker after both xfstests and filebench would improve our test coverage. I'll see about adding it to the scripts.
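Roughly what such a post-test check might look like, keeping in mind the exported/idle caveat above (pool name is a placeholder):
# after the xfstests/filebench run finishes against the test pool
zpool export testpool
zdb -e -mc testpool    # -m dumps the metaslabs, -c traverses all blocks and reports any leaked space
zpool import testpool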
> Recreating the space maps.
Issue #3111 was opened as a feature request to explore doing this. There are a variety of ways to go about it.
> When "just" using zpools with those broken space maps, is there a possibility that existing data gets overwritten and thus corrupted (which really shouldn't happen)?
Yes. If the space maps for a pool are wrong it's possible that existing data will be overwritten. If this happens ZFS will detect the damaged blocks during a scrub and may be able to repair them. However, there is currently no tool to repair the space maps themselves.
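That detection is just the normal scrub workflow (pool name is a placeholder):
zpool scrub tank
zpool status -v tank    # any damaged blocks are listed once the scrub completes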