
Spacemap Problem with Data Error

Open sempervictus opened this issue 10 years ago • 78 comments

Scrub seems to have found an error in last night's snapshot:

zpool status -v rpool
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 1h18m with 1 errors on Wed Feb 11 11:18:42 2015
config:

    NAME          STATE     READ WRITE CKSUM
    rpool         ONLINE       0     0     0
      root_crypt  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        rpool/steam@20150210-1528:<0x0>

ZDB also says:

MOS Configuration:
        version: 5000
        name: 'rpool'
        state: 0
        txg: 17002896
        pool_guid: 8092466394753084738
        errata: 0
        hostid: 1714892848
        hostname: 'unknown'
        vdev_children: 1
        vdev_tree:
            type: 'root'
            id: 0
            guid: 8092466394753084738
            children[0]:
                type: 'disk'
                id: 0
                guid: 15923963185428311133
                path: '/dev/mapper/root_crypt'
                whole_disk: 0
                metaslab_array: 34
                metaslab_shift: 33
                ashift: 12
                asize: 999403552768
                is_log: 0
                create_txg: 4
        features_for_read:
            com.delphix:hole_birth
            com.delphix:embedded_data
space map refcount mismatch: expected 125 != actual 0

... that last "space map refcount mismatch" piece is a bit concerning. Does this have anything to do with the recent patches to spacemap accounting? If so, are those changes the problem, or are they detecting an existing issue introduced by bad prior logic?

sempervictus avatar Feb 11 '15 17:02 sempervictus

@sempervictus if a spacemap were somehow wrong it could result in permanent damage to a file. For example, if the spacemap indicated a block was free when it wasn't, a new write could overwrite existing data in a snapshot. For a pool configured without redundancy that damage wouldn't be correctable, and the scrub would detect it like you're seeing.

It would be useful to get a closer look at the spacemaps; could you run zdb -m and zdb -M on the pool? The zdb error you're seeing says that it expects there to be 125 metaslabs but none were actually found, which is hard to believe.
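For reference, that boils down to the following (using this pool's name):

    zdb -m rpool    # per-metaslab detail: offsets, space map object IDs, free space
    zdb -M rpool    # fragmentation summary per vdev and for the whole pool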

behlendorf avatar Feb 11 '15 19:02 behlendorf

Just noticed #3108 appears to have the same problem. Did we introduce a regression which is causing corruption in active datasets?

Outputs for zdb commands on the dataset:

zdb -m rpool/steam
Dataset rpool/steam [ZPL], ID 17529, cr_txg 14065881, 98.6G, 78176 objects
zdb -M rpool/steam
Dataset rpool/steam [ZPL], ID 17529, cr_txg 14065881, 98.6G, 78176 objects

and on the pool:

zdb -m rpool

Metaslabs:
    vdev          0
    metaslabs   116   offset                spacemap          free      
    ---------------   -------------------   ---------------   -------------
    metaslab      0   offset            0   spacemap      0   free       8G
    metaslab      1   offset    200000000   spacemap      0   free       8G
    metaslab      2   offset    400000000   spacemap      0   free       8G
    metaslab      3   offset    600000000   spacemap      0   free       8G
    metaslab      4   offset    800000000   spacemap      0   free       8G
    metaslab      5   offset    a00000000   spacemap      0   free       8G
    metaslab      6   offset    c00000000   spacemap      0   free       8G
    metaslab      7   offset    e00000000   spacemap      0   free       8G
    metaslab      8   offset   1000000000   spacemap      0   free       8G
    metaslab      9   offset   1200000000   spacemap      0   free       8G
    metaslab     10   offset   1400000000   spacemap      0   free       8G
    metaslab     11   offset   1600000000   spacemap      0   free       8G
    metaslab     12   offset   1800000000   spacemap      0   free       8G
    metaslab     13   offset   1a00000000   spacemap      0   free       8G
    metaslab     14   offset   1c00000000   spacemap      0   free       8G
    metaslab     15   offset   1e00000000   spacemap      0   free       8G
    metaslab     16   offset   2000000000   spacemap      0   free       8G
    metaslab     17   offset   2200000000   spacemap      0   free       8G
    metaslab     18   offset   2400000000   spacemap      0   free       8G
    metaslab     19   offset   2600000000   spacemap      0   free       8G
    metaslab     20   offset   2800000000   spacemap      0   free       8G
    metaslab     21   offset   2a00000000   spacemap      0   free       8G
    metaslab     22   offset   2c00000000   spacemap      0   free       8G
    metaslab     23   offset   2e00000000   spacemap      0   free       8G
    metaslab     24   offset   3000000000   spacemap      0   free       8G
    metaslab     25   offset   3200000000   spacemap      0   free       8G
    metaslab     26   offset   3400000000   spacemap      0   free       8G
    metaslab     27   offset   3600000000   spacemap      0   free       8G
    metaslab     28   offset   3800000000   spacemap      0   free       8G
    metaslab     29   offset   3a00000000   spacemap      0   free       8G
    metaslab     30   offset   3c00000000   spacemap      0   free       8G
    metaslab     31   offset   3e00000000   spacemap      0   free       8G
    metaslab     32   offset   4000000000   spacemap      0   free       8G
    metaslab     33   offset   4200000000   spacemap      0   free       8G
    metaslab     34   offset   4400000000   spacemap      0   free       8G
    metaslab     35   offset   4600000000   spacemap      0   free       8G
    metaslab     36   offset   4800000000   spacemap      0   free       8G
    metaslab     37   offset   4a00000000   spacemap      0   free       8G
    metaslab     38   offset   4c00000000   spacemap      0   free       8G
    metaslab     39   offset   4e00000000   spacemap      0   free       8G
    metaslab     40   offset   5000000000   spacemap      0   free       8G
    metaslab     41   offset   5200000000   spacemap      0   free       8G
    metaslab     42   offset   5400000000   spacemap      0   free       8G
    metaslab     43   offset   5600000000   spacemap      0   free       8G
    metaslab     44   offset   5800000000   spacemap      0   free       8G
    metaslab     45   offset   5a00000000   spacemap      0   free       8G
    metaslab     46   offset   5c00000000   spacemap      0   free       8G
    metaslab     47   offset   5e00000000   spacemap      0   free       8G
    metaslab     48   offset   6000000000   spacemap      0   free       8G
    metaslab     49   offset   6200000000   spacemap      0   free       8G
    metaslab     50   offset   6400000000   spacemap      0   free       8G
    metaslab     51   offset   6600000000   spacemap      0   free       8G
    metaslab     52   offset   6800000000   spacemap      0   free       8G
    metaslab     53   offset   6a00000000   spacemap      0   free       8G
    metaslab     54   offset   6c00000000   spacemap      0   free       8G
    metaslab     55   offset   6e00000000   spacemap      0   free       8G
    metaslab     56   offset   7000000000   spacemap      0   free       8G
    metaslab     57   offset   7200000000   spacemap      0   free       8G
    metaslab     58   offset   7400000000   spacemap      0   free       8G
    metaslab     59   offset   7600000000   spacemap      0   free       8G
    metaslab     60   offset   7800000000   spacemap      0   free       8G
    metaslab     61   offset   7a00000000   spacemap      0   free       8G
    metaslab     62   offset   7c00000000   spacemap      0   free       8G
    metaslab     63   offset   7e00000000   spacemap      0   free       8G
    metaslab     64   offset   8000000000   spacemap      0   free       8G
    metaslab     65   offset   8200000000   spacemap      0   free       8G
    metaslab     66   offset   8400000000   spacemap      0   free       8G
    metaslab     67   offset   8600000000   spacemap      0   free       8G
    metaslab     68   offset   8800000000   spacemap      0   free       8G
    metaslab     69   offset   8a00000000   spacemap      0   free       8G
    metaslab     70   offset   8c00000000   spacemap      0   free       8G
    metaslab     71   offset   8e00000000   spacemap      0   free       8G
    metaslab     72   offset   9000000000   spacemap      0   free       8G
    metaslab     73   offset   9200000000   spacemap      0   free       8G
    metaslab     74   offset   9400000000   spacemap      0   free       8G
    metaslab     75   offset   9600000000   spacemap      0   free       8G
    metaslab     76   offset   9800000000   spacemap      0   free       8G
    metaslab     77   offset   9a00000000   spacemap      0   free       8G
    metaslab     78   offset   9c00000000   spacemap      0   free       8G
    metaslab     79   offset   9e00000000   spacemap      0   free       8G
    metaslab     80   offset   a000000000   spacemap      0   free       8G
    metaslab     81   offset   a200000000   spacemap      0   free       8G
    metaslab     82   offset   a400000000   spacemap      0   free       8G
    metaslab     83   offset   a600000000   spacemap      0   free       8G
    metaslab     84   offset   a800000000   spacemap      0   free       8G
    metaslab     85   offset   aa00000000   spacemap      0   free       8G
    metaslab     86   offset   ac00000000   spacemap      0   free       8G
    metaslab     87   offset   ae00000000   spacemap      0   free       8G
    metaslab     88   offset   b000000000   spacemap      0   free       8G
    metaslab     89   offset   b200000000   spacemap      0   free       8G
    metaslab     90   offset   b400000000   spacemap      0   free       8G
    metaslab     91   offset   b600000000   spacemap      0   free       8G
    metaslab     92   offset   b800000000   spacemap      0   free       8G
    metaslab     93   offset   ba00000000   spacemap      0   free       8G
    metaslab     94   offset   bc00000000   spacemap      0   free       8G
    metaslab     95   offset   be00000000   spacemap      0   free       8G
    metaslab     96   offset   c000000000   spacemap      0   free       8G
    metaslab     97   offset   c200000000   spacemap      0   free       8G
    metaslab     98   offset   c400000000   spacemap      0   free       8G
    metaslab     99   offset   c600000000   spacemap      0   free       8G
    metaslab    100   offset   c800000000   spacemap      0   free       8G
    metaslab    101   offset   ca00000000   spacemap      0   free       8G
    metaslab    102   offset   cc00000000   spacemap      0   free       8G
    metaslab    103   offset   ce00000000   spacemap      0   free       8G
    metaslab    104   offset   d000000000   spacemap      0   free       8G
    metaslab    105   offset   d200000000   spacemap      0   free       8G
    metaslab    106   offset   d400000000   spacemap      0   free       8G
    metaslab    107   offset   d600000000   spacemap      0   free       8G
    metaslab    108   offset   d800000000   spacemap      0   free       8G
    metaslab    109   offset   da00000000   spacemap      0   free       8G
    metaslab    110   offset   dc00000000   spacemap      0   free       8G
    metaslab    111   offset   de00000000   spacemap      0   free       8G
    metaslab    112   offset   e000000000   spacemap      0   free       8G
    metaslab    113   offset   e200000000   spacemap      0   free       8G
    metaslab    114   offset   e400000000   spacemap      0   free       8G
    metaslab    115   offset   e600000000   spacemap      0   free       8G

space map refcount mismatch: expected 125 != actual 0
zdb -M rpool
    vdev          0     metaslabs  116      fragmentation  0%
    pool rpool  fragmentation     0%
space map refcount mismatch: expected 125 != actual 0

The pool, however, reports 41% fragmentation, even though it's on a single (dm-crypt) vdev:

zpool list -v rpool
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
rpool   928G   631G   297G         -    41%    67%  1.00x  ONLINE  -
  root_crypt   928G   631G   297G         -    41%    67%

How does ZFS decide how many metaslabs to expect? Looks like I've got 116 here, not 0 or 125. Secondly, am I seeing a pool-level problem or a dataset-level problem? I can wipe the dataset with no issue - nothing important there - but the pool has other datasets which are rather vital (backed up to multi-vdev pools, which are not showing errors).
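For what it's worth, the 116 can be sanity-checked from the zdb MOS config posted above: a top-level vdev is carved into metaslabs of 2^metaslab_shift bytes each. A rough sketch of the arithmetic, not the allocator's exact logic:

    # asize / 2^metaslab_shift, using the vdev_tree values above
    echo $(( 999403552768 / (1 << 33) ))    # prints 116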

Scrub now reports an error in both the snapshot and the live dataset, but older snapshots appear unaffected. I can try a rollback, but am concerned that if a snap was made with a working version of ZFS, I would start writing bad data to it with a potentially damaged ZFS runtime.

sempervictus avatar Feb 15 '15 20:02 sempervictus

ugh :-1:

zdb -M WD30EFRX
    vdev          0     metaslabs  170      fragmentation  0%
    pool WD30EFRX  fragmentation     0%
space map refcount mismatch: expected 167 != actual 0

just migrated to 3.19 and latest master

hopefully this isn't something to worry about? :8ball:

currently a scrub is running - tomorrow (CET) I'll know more. I'm not using any snapshots or other fancy features besides mirroring right now; I do have 2 swap zvols on other partitions of the 2 mirrored harddrives

that's all - I actually wanted to start doing some more intensive usage and testing during the next days

luckily I did a grand backup (several backup volumes) beforehand, since the data is vital [academic writing/dissertation - that's why I started to use ZFS in the first place]. What I noticed was that a subfolder had a changed timestamp (from today) which I - if I remember correctly - didn't touch at all

@sempervictus what hardware is this pool running on top of? dm-crypt/LUKS? what kernel version?

I should have stayed at 3.17.7 + the state from December/November - anyway: fingers crossed

edit:

currently used ZFS state is https://github.com/kernelOfTruth/zfs/commits/zfs_kOT_14.02.2015 https://github.com/kernelOfTruth/spl/commits/spl_kOT_14.02.2015

so pretty recent

state used before was: https://github.com/kernelOfTruth/zfs/commits/zfs_kOT_13.01.2015 https://github.com/kernelOfTruth/spl/commits/spl_kOT_13.01.2015

to be honest I didn't use the zdb command before, so I don't know if the error message from zdb was there before

zpool scrub has not shown any errors so far, and all files accessed or used were working properly

kernelOfTruth avatar Feb 15 '15 22:02 kernelOfTruth

related:

https://bugs.freenas.org/issues/6352 https://forums.freenas.org/index.php?threads/problems-with-freenas-9-2-1-5.20879/

so it's not limited to ZFS on Linux; it also occurs on FreeBSD / m0n0wall / NanoBSD / FreeNAS-based systems

kernelOfTruth avatar Feb 15 '15 22:02 kernelOfTruth

This is going from bad to worse. Attempting to copy data out of the dataset into another pool resulted in IO errors on read, which hard-locked the host. I just had the backup host lock up on scrub (the one with multiple vdevs hosting a pool receiving my bad snaps). That's not even the "worse" part; that's just the bad. The worse part is that "zfs rollback -r rpool/steam@20150209-0445" resulted in the following output from zpool status -v rpool:

errors: Permanent errors have been detected in the following files:

        <0x134a>:<0x0>
        rpool/steam:<0x0>
        <0x12ca>:<0x0>
        <0xffffffffffffffff>:<0x0>

I'm getting the sense that I either wipe the dataset, or need to start copying data out of the datasets which are alive (hopefully without causing IO errors) and attempt to restore the pool from days ago on the backup host, presuming I didn't damage that one beyond repair by sending the snaps to it.

@kernelOfTruth: the second FreeNAS bug suggests this to be a ZoL issue; the first does not tell us whether ZoL has touched the pool. For all we know we might be borking everyone's data on disk and causing other distributions to suffer the spacemap problems we introduce. This setup is on 3.17.6 since we can't get Ubuntu 14 to boot a dm-crypt volume with our later kernels for the life of us (it never finds the volume to decrypt and never presents the askpass prompt; the initrd seems unable to find the physical disk). The rpool is on an encrypted block device, and we all know ZFS prefers real disks, but since Oracle's legal department is too busy building their palaces in the depths of Tartarus to respond to inquiries about their labeling the crypto code as CDDL, we have to encrypt our data somehow.
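For context, the layering described is just ZFS on top of a LUKS mapping. A sketch with placeholder device names:

    cryptsetup luksOpen /dev/sda2 root_crypt    # unlock the LUKS container
    zpool import -d /dev/mapper rpool           # import the pool from the mapped device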

sempervictus avatar Feb 15 '15 23:02 sempervictus

I ran a scrub, which changed the output of zpool status -v to something even less pleasant:

zpool status -v rpool
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 0h53m with 0 errors on Sun Feb 15 19:18:23 2015
config:

    NAME          STATE     READ WRITE CKSUM
    rpool         ONLINE       0     0 1.32K
      root_crypt  ONLINE       0     0 5.26K

errors: Permanent errors have been detected in the following files:

        <0xffffffffffffffff>:<0x0>

Apparently rolling back the dataset was not the way to go here, since ZFS can't delete its own corrupt data, and said corruption just moves from the dataset into the pool. How do I know that ZFS can't delete the corrupt data? The freeing counter on rpool is stuck at 3.47MB (about the size of the deltas rolled back), and I can't really imagine it being so computationally intensive to delete nodes from a tree as to take an hour to free 3MB worth of nodes.
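A quick way to watch that counter, assuming the freeing pool property is exposed on this build:

    zpool get freeing rpool    # space still pending async destroy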

Given that there's critical data in the pool, and the error has propagated into tier-1 of the backups (meaning that to recover the pool from tier-2 I lose two weeks of data), I'm a bit more concerned about saving the pool than finding the bug. Is there a manual way to drop the bad records after the last known good snapshot? Or to manually edit the vdev, pruning the garbage records into "empty space"?

Here's the part which should probably put the brakes on 0.6.4 until we know what is happening here: after testing my own pools, I discovered that we have multiple pools in our datacenter which are NOT showing any scrub errors but DO report mismatches in the space map via zdb -m. These are hosts with different builds and patch stacks, but all of them are post-kmem-rework - the thinnest ones only run 1099 and the pool GUID patch. Upstream packages for CentOS 6 do NOT contain whatever is causing this to occur; those pools are clean.

The damaged pools are pushing off to a FreeNAS box (8.1 on x86) as the final tier of their backup, but it appears that this problem propagates right through snapshots and across platforms once introduced, according to the posts @kernelOfTruth linked. This FreeNAS instance has no spacemap histogram feature; it's literally been running in our DC for several years untouched on FreeNAS 8.1 in a deep pit of network isolation, and its ZFS is old (given that it survives running on x86 with 2G of RAM). Despite the source pools reporting the mismatch, the FreeNAS host does NOT. So this problem may be cleaned up by "filtering" it through an older ZFS send/recv cycle, presuming you're not yet in my position where you have something looking like dangling pointers in the pool itself.
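A sketch of that "filter" cycle, with placeholder host and dataset names; this is a guessed-at workaround, not a verified fix:

    zfs snapshot rpool/data@filter
    zfs send rpool/data@filter | ssh freenas zfs recv -F tank/data
    # ...then pull it back from the older implementation:
    ssh freenas zfs send tank/data@filter | zfs recv -F rpool/data_clean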

@everyone: to test if you're affected, the following will output every pool name, with a line underneath either indicating a mismatch, or blank if you're fine:

zpool list |grep -v NAME | cut -d' ' -f1 | while read POOL; do echo $POOL; echo $(zdb -m $POOL | grep mismatch); done
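The same check in a slightly tidier form, using zpool list -H -o name to skip the header/cut dance:

    for POOL in $(zpool list -H -o name); do
        printf '%s: %s\n' "$POOL" "$(zdb -m "$POOL" | grep mismatch)"
    done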

sempervictus avatar Feb 16 '15 00:02 sempervictus

Finally some good news, or a lack of it getting worse, anyway. After the last scrub I rebooted the host in the hope that an import would actually free the "freeing" amount. No dice there - still have 3.47M showing as freeing. However, the pool is importable and I appear to have access to my data. ZDB has taken a slightly better view of rpool; it still shows a mismatch, but at least it's no longer claiming there are no spacemaps:

zdb -m rpool

Metaslabs:
    vdev          0
    metaslabs   116   offset                spacemap          free      
    ---------------   -------------------   ---------------   -------------
    metaslab      0   offset            0   spacemap  24805   free    1.44G
    metaslab      1   offset    200000000   spacemap     79   free    2.04G
    metaslab      2   offset    400000000   spacemap     78   free     338M
    metaslab      3   offset    600000000   spacemap     77   free    4.02G
    metaslab      4   offset    800000000   spacemap  24807   free    4.34G
    metaslab      5   offset    a00000000   spacemap  24834   free    2.27G
    metaslab      6   offset    c00000000   spacemap  24883   free    1.36G
    metaslab      7   offset    e00000000   spacemap  24818   free     884M
    metaslab      8   offset   1000000000   spacemap  24819   free    3.11G
    metaslab      9   offset   1200000000   spacemap     58   free     249M
    metaslab     10   offset   1400000000   spacemap  24865   free    3.28G
    metaslab     11   offset   1600000000   spacemap  24832   free    2.69G
    metaslab     12   offset   1800000000   spacemap  24828   free    3.92G
    metaslab     13   offset   1a00000000   spacemap  24881   free    4.81G
    metaslab     14   offset   1c00000000   spacemap  24861   free    1.44G
    metaslab     15   offset   1e00000000   spacemap  24851   free     437M
    metaslab     16   offset   2000000000   spacemap  24894   free    2.48G
    metaslab     17   offset   2200000000   spacemap     45   free    1.49G
    metaslab     18   offset   2400000000   spacemap  24891   free    4.51G
    metaslab     19   offset   2600000000   spacemap  24892   free     686M
    metaslab     20   offset   2800000000   spacemap  24871   free    1.33G
    metaslab     21   offset   2a00000000   spacemap  24843   free    1.67G
    metaslab     22   offset   2c00000000   spacemap  24797   free    3.75G
    metaslab     23   offset   2e00000000   spacemap  24893   free    2.94G
    metaslab     24   offset   3000000000   spacemap  24833   free    2.74G
    metaslab     25   offset   3200000000   spacemap  24884   free    2.06G
    metaslab     26   offset   3400000000   spacemap  24887   free     908M
    metaslab     27   offset   3600000000   spacemap  24792   free    3.38G
    metaslab     28   offset   3800000000   spacemap  24794   free    4.17G
    metaslab     29   offset   3a00000000   spacemap  24810   free    1.07G
    metaslab     30   offset   3c00000000   spacemap  24803   free    2.90G
    metaslab     31   offset   3e00000000   spacemap  24829   free     578M
    metaslab     32   offset   4000000000   spacemap  24856   free    1.45G
    metaslab     33   offset   4200000000   spacemap  24813   free    2.16G
    metaslab     34   offset   4400000000   spacemap  24839   free    1.06G
    metaslab     35   offset   4600000000   spacemap  24800   free     573M
    metaslab     36   offset   4800000000   spacemap  24868   free    2.91G
    metaslab     37   offset   4a00000000   spacemap  24838   free    4.62G
    metaslab     38   offset   4c00000000   spacemap  24882   free     483M
    metaslab     39   offset   4e00000000   spacemap     44   free    2.77G
    metaslab     40   offset   5000000000   spacemap  24860   free     803M
    metaslab     41   offset   5200000000   spacemap  24842   free    1.45G
    metaslab     42   offset   5400000000   spacemap     43   free    3.31G
    metaslab     43   offset   5600000000   spacemap  24812   free     321M
    metaslab     44   offset   5800000000   spacemap  24796   free    5.79G
    metaslab     45   offset   5a00000000   spacemap  24809   free    1.20G
    metaslab     46   offset   5c00000000   spacemap  24880   free    1.18G
    metaslab     47   offset   5e00000000   spacemap  24848   free     863M
    metaslab     48   offset   6000000000   spacemap  24804   free     668M
    metaslab     49   offset   6200000000   spacemap  24816   free     536M
    metaslab     50   offset   6400000000   spacemap  24821   free    1.84G
    metaslab     51   offset   6600000000   spacemap  24845   free     747M
    metaslab     52   offset   6800000000   spacemap  24811   free     125M
    metaslab     53   offset   6a00000000   spacemap  24837   free    5.50G
    metaslab     54   offset   6c00000000   spacemap  24870   free    4.21G
    metaslab     55   offset   6e00000000   spacemap  24852   free    52.9M
    metaslab     56   offset   7000000000   spacemap  24855   free    1.19G
    metaslab     57   offset   7200000000   spacemap  24872   free    1006M
    metaslab     58   offset   7400000000   spacemap  24841   free     505M
    metaslab     59   offset   7600000000   spacemap     40   free    63.8M
    metaslab     60   offset   7800000000   spacemap     38   free    2.12G
    metaslab     61   offset   7a00000000   spacemap  24840   free     513M
    metaslab     62   offset   7c00000000   spacemap  24853   free    3.29G
    metaslab     63   offset   7e00000000   spacemap  24815   free    1007M
    metaslab     64   offset   8000000000   spacemap  24846   free     707M
    metaslab     65   offset   8200000000   spacemap     37   free    4.00G
    metaslab     66   offset   8400000000   spacemap  24808   free     762M
    metaslab     67   offset   8600000000   spacemap  24835   free     321M
    metaslab     68   offset   8800000000   spacemap  24857   free    3.97G
    metaslab     69   offset   8a00000000   spacemap  24836   free     969M
    metaslab     70   offset   8c00000000   spacemap  24878   free    41.7M
    metaslab     71   offset   8e00000000   spacemap  24847   free    5.50G
    metaslab     72   offset   9000000000   spacemap  24795   free    3.26G
    metaslab     73   offset   9200000000   spacemap  24867   free     501M
    metaslab     74   offset   9400000000   spacemap  24849   free    1.37G
    metaslab     75   offset   9600000000   spacemap  24879   free    1.17G
    metaslab     76   offset   9800000000   spacemap  24822   free     646M
    metaslab     77   offset   9a00000000   spacemap  24844   free    47.9M
    metaslab     78   offset   9c00000000   spacemap  24859   free     247M
    metaslab     79   offset   9e00000000   spacemap  24831   free    3.33G
    metaslab     80   offset   a000000000   spacemap  24854   free    5.13G
    metaslab     81   offset   a200000000   spacemap     36   free     241M
    metaslab     82   offset   a400000000   spacemap  24806   free    62.2M
    metaslab     83   offset   a600000000   spacemap  24799   free    1.71G
    metaslab     84   offset   a800000000   spacemap  24801   free    2.54G
    metaslab     85   offset   aa00000000   spacemap  24825   free    2.09G
    metaslab     86   offset   ac00000000   spacemap  24817   free     832M
    metaslab     87   offset   ae00000000   spacemap  24826   free    2.22G
    metaslab     88   offset   b000000000   spacemap  24823   free    1.49G
    metaslab     89   offset   b200000000   spacemap  24866   free    8.21M
    metaslab     90   offset   b400000000   spacemap  24873   free    5.07G
    metaslab     91   offset   b600000000   spacemap  24858   free    5.05G
    metaslab     92   offset   b800000000   spacemap  24877   free    5.03G
    metaslab     93   offset   ba00000000   spacemap  24850   free    5.95G
    metaslab     94   offset   bc00000000   spacemap  24885   free    4.96G
    metaslab     95   offset   be00000000   spacemap  24886   free    3.79G
    metaslab     96   offset   c000000000   spacemap  24798   free    4.63G
    metaslab     97   offset   c200000000   spacemap  24830   free    6.43G
    metaslab     98   offset   c400000000   spacemap  24820   free    1.50G
    metaslab     99   offset   c600000000   spacemap  24888   free    3.60G
    metaslab    100   offset   c800000000   spacemap     35   free    3.08G
    metaslab    101   offset   ca00000000   spacemap  24874   free    3.35G
    metaslab    102   offset   cc00000000   spacemap  24875   free    7.26G
    metaslab    103   offset   ce00000000   spacemap  24876   free    1.38G
    metaslab    104   offset   d000000000   spacemap  24889   free    6.79G
    metaslab    105   offset   d200000000   spacemap  24890   free    4.05G
    metaslab    106   offset   d400000000   spacemap  24869   free    6.82G
    metaslab    107   offset   d600000000   spacemap  24824   free    7.78G
    metaslab    108   offset   d800000000   spacemap  24862   free     700M
    metaslab    109   offset   da00000000   spacemap  24863   free    7.74G
    metaslab    110   offset   dc00000000   spacemap  24864   free    7.00G
    metaslab    111   offset   de00000000   spacemap  24814   free    7.65G
    metaslab    112   offset   e000000000   spacemap  24793   free    7.56G
    metaslab    113   offset   e200000000   spacemap  24791   free    4.26G
    metaslab    114   offset   e400000000   spacemap  24827   free    2.34G
    metaslab    115   offset   e600000000   spacemap  24802   free    4.64G

space map refcount mismatch: expected 127 != actual 117

On the other hand, it seems every IO is causing cksum errors, probably because the spacemaps don't align. The pool is sha256 on a single-disk vdev, and somehow shows different numbers of cksum errors for the vdev and the pool:

    NAME          STATE     READ WRITE CKSUM
    rpool         ONLINE       0     0   467
      root_crypt  ONLINE       0     0 1.82K

Smartctl reports no errors, and I'm not seeing any data-level problems, nor am I seeing these errors on the other pool (also going through a scrub presently).
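For reference, the smartctl checks here amount to something like the following (the device path is a placeholder):

    smartctl -t long /dev/sdX       # start an extended offline self-test
    smartctl -l selftest /dev/sdX   # poll its progress and results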

I tried checksumming the metadata with zdb, and got an assertion failure which may actually help track what's happening:

zdb -mc rpool

Metaslabs:
    vdev          0
    metaslabs   116   offset                spacemap          free      
    ---------------   -------------------   ---------------   -------------
    metaslab      0   offset            0   spacemap  24805   free    1.44G
    metaslab      1   offset    200000000   spacemap     79   free    2.04G
    metaslab      2   offset    400000000   spacemap     78   free     338M
    metaslab      3   offset    600000000   spacemap     77   free    4.02G
    metaslab      4   offset    800000000   spacemap  24807   free    4.34G
    metaslab      5   offset    a00000000   spacemap  24834   free    2.27G
    metaslab      6   offset    c00000000   spacemap  24883   free    1.36G
    metaslab      7   offset    e00000000   spacemap  24818   free     884M
    metaslab      8   offset   1000000000   spacemap  24819   free    3.11G
    metaslab      9   offset   1200000000   spacemap     58   free     249M
    metaslab     10   offset   1400000000   spacemap  24865   free    3.28G
    metaslab     11   offset   1600000000   spacemap  24832   free    2.69G
    metaslab     12   offset   1800000000   spacemap  24828   free    3.92G
    metaslab     13   offset   1a00000000   spacemap  24881   free    4.81G
    metaslab     14   offset   1c00000000   spacemap  24861   free    1.44G
    metaslab     15   offset   1e00000000   spacemap  24851   free     437M
    metaslab     16   offset   2000000000   spacemap  24894   free    2.48G
    metaslab     17   offset   2200000000   spacemap     45   free    1.49G
    metaslab     18   offset   2400000000   spacemap  24891   free    4.51G
    metaslab     19   offset   2600000000   spacemap  24892   free     686M
    metaslab     20   offset   2800000000   spacemap  24871   free    1.33G
    metaslab     21   offset   2a00000000   spacemap  24843   free    1.67G
    metaslab     22   offset   2c00000000   spacemap  24797   free    3.75G
    metaslab     23   offset   2e00000000   spacemap  24893   free    2.94G
    metaslab     24   offset   3000000000   spacemap  24833   free    2.74G
    metaslab     25   offset   3200000000   spacemap  24884   free    2.06G
    metaslab     26   offset   3400000000   spacemap  24887   free     908M
    metaslab     27   offset   3600000000   spacemap  24792   free    3.38G
    metaslab     28   offset   3800000000   spacemap  24794   free    4.17G
    metaslab     29   offset   3a00000000   spacemap  24810   free    1.07G
    metaslab     30   offset   3c00000000   spacemap  24803   free    2.90G
    metaslab     31   offset   3e00000000   spacemap  24829   free     578M
    metaslab     32   offset   4000000000   spacemap  24856   free    1.45G
    metaslab     33   offset   4200000000   spacemap  24813   free    2.16G
    metaslab     34   offset   4400000000   spacemap  24839   free    1.06G
    metaslab     35   offset   4600000000   spacemap  24800   free     573M
    metaslab     36   offset   4800000000   spacemap  24868   free    2.91G
    metaslab     37   offset   4a00000000   spacemap  24838   free    4.62G
    metaslab     38   offset   4c00000000   spacemap  24882   free     483M
    metaslab     39   offset   4e00000000   spacemap     44   free    2.77G
    metaslab     40   offset   5000000000   spacemap  24860   free     803M
    metaslab     41   offset   5200000000   spacemap  24842   free    1.45G
    metaslab     42   offset   5400000000   spacemap     43   free    3.31G
    metaslab     43   offset   5600000000   spacemap  24812   free     321M
    metaslab     44   offset   5800000000   spacemap  24796   free    5.79G
    metaslab     45   offset   5a00000000   spacemap  24809   free    1.20G
    metaslab     46   offset   5c00000000   spacemap  24880   free    1.18G
    metaslab     47   offset   5e00000000   spacemap  24848   free     863M
    metaslab     48   offset   6000000000   spacemap  24804   free     668M
    metaslab     49   offset   6200000000   spacemap  24816   free     536M
    metaslab     50   offset   6400000000   spacemap  24821   free    1.84G
    metaslab     51   offset   6600000000   spacemap  24845   free     747M
    metaslab     52   offset   6800000000   spacemap  24811   free     125M
    metaslab     53   offset   6a00000000   spacemap  24837   free    5.50G
    metaslab     54   offset   6c00000000   spacemap  24870   free    4.21G
    metaslab     55   offset   6e00000000   spacemap  24852   free    52.9M
    metaslab     56   offset   7000000000   spacemap  24855   free    1.19G
    metaslab     57   offset   7200000000   spacemap  24872   free    1006M
    metaslab     58   offset   7400000000   spacemap  24841   free     505M
    metaslab     59   offset   7600000000   spacemap     40   free    63.8M
    metaslab     60   offset   7800000000   spacemap     38   free    2.12G
    metaslab     61   offset   7a00000000   spacemap  24840   free     513M
    metaslab     62   offset   7c00000000   spacemap  24853   free    3.29G
    metaslab     63   offset   7e00000000   spacemap  24815   free    1007M
    metaslab     64   offset   8000000000   spacemap  24846   free     707M
    metaslab     65   offset   8200000000   spacemap     37   free    4.00G
    metaslab     66   offset   8400000000   spacemap  24808   free     762M
    metaslab     67   offset   8600000000   spacemap  24835   free     321M
    metaslab     68   offset   8800000000   spacemap  24857   free    3.97G
    metaslab     69   offset   8a00000000   spacemap  24836   free     969M
    metaslab     70   offset   8c00000000   spacemap  24878   free    41.7M
    metaslab     71   offset   8e00000000   spacemap  24847   free    5.50G
    metaslab     72   offset   9000000000   spacemap  24795   free    3.26G
    metaslab     73   offset   9200000000   spacemap  24867   free     501M
    metaslab     74   offset   9400000000   spacemap  24849   free    1.37G
    metaslab     75   offset   9600000000   spacemap  24879   free    1.17G
    metaslab     76   offset   9800000000   spacemap  24822   free     646M
    metaslab     77   offset   9a00000000   spacemap  24844   free    47.9M
    metaslab     78   offset   9c00000000   spacemap  24859   free     247M
    metaslab     79   offset   9e00000000   spacemap  24831   free    3.33G
    metaslab     80   offset   a000000000   spacemap  24854   free    5.13G
    metaslab     81   offset   a200000000   spacemap     36   free     241M
    metaslab     82   offset   a400000000   spacemap  24806   free    62.2M
    metaslab     83   offset   a600000000   spacemap  24799   free    1.71G
    metaslab     84   offset   a800000000   spacemap  24801   free    2.54G
    metaslab     85   offset   aa00000000   spacemap  24825   free    2.09G
    metaslab     86   offset   ac00000000   spacemap  24817   free     832M
    metaslab     87   offset   ae00000000   spacemap  24826   free    2.22G
    metaslab     88   offset   b000000000   spacemap  24823   free    1.49G
    metaslab     89   offset   b200000000   spacemap  24866   free    8.21M
    metaslab     90   offset   b400000000   spacemap  24873   free    5.07G
    metaslab     91   offset   b600000000   spacemap  24858   free    5.05G
    metaslab     92   offset   b800000000   spacemap  24877   free    5.03G
    metaslab     93   offset   ba00000000   spacemap  24850   free    5.95G
    metaslab     94   offset   bc00000000   spacemap  24885   free    4.96G
    metaslab     95   offset   be00000000   spacemap  24886   free    3.79G
    metaslab     96   offset   c000000000   spacemap  24798   free    4.63G
    metaslab     97   offset   c200000000   spacemap  24830   free    6.43G
    metaslab     98   offset   c400000000   spacemap  24820   free    1.50G
    metaslab     99   offset   c600000000   spacemap  24888   free    3.60G
    metaslab    100   offset   c800000000   spacemap     35   free    3.08G
    metaslab    101   offset   ca00000000   spacemap  24874   free    3.35G
    metaslab    102   offset   cc00000000   spacemap  24875   free    7.26G
    metaslab    103   offset   ce00000000   spacemap  24876   free    1.38G
    metaslab    104   offset   d000000000   spacemap  24889   free    6.79G
    metaslab    105   offset   d200000000   spacemap  24890   free    4.05G
    metaslab    106   offset   d400000000   spacemap  24869   free    6.82G
    metaslab    107   offset   d600000000   spacemap  24824   free    7.78G
    metaslab    108   offset   d800000000   spacemap  24862   free     700M
    metaslab    109   offset   da00000000   spacemap  24863   free    7.74G
    metaslab    110   offset   dc00000000   spacemap  24864   free    7.00G
    metaslab    111   offset   de00000000   spacemap  24814   free    7.65G
    metaslab    112   offset   e000000000   spacemap  24793   free    7.56G
    metaslab    113   offset   e200000000   spacemap  24791   free    4.26G
    metaslab    114   offset   e400000000   spacemap  24827   free    2.34G
    metaslab    115   offset   e600000000   spacemap  24802   free    4.64G


Traversing all blocks to verify metadata checksums and verify nothing leaked ...

loading space map for vdev 0 of 1, metaslab 115 of 116 ...
zdb: ../../cmd/zdb/zdb.c:2618: Assertion `0 == bptree_iterate(spa->spa_meta_objset, spa->spa_dsl_pool->dp_bptree_obj, B_FALSE, count_block_cb, &zcb, ((void *)0)) (0x0 == 0x34)' failed

EDIT: that assertion led me to issue 4390 in Illumos, which is merged into our tree, but it may be a starting point for those who have a better clue than I do as to the cause and fix here. Possibly another argument for an offline fsck implementation - one that fixes spacemap and metadata issues which online pools seem unable to resolve.

sempervictus avatar Feb 16 '15 02:02 sempervictus

luckily it finished without errors here:

zpool status -v WD30EFRX
  pool: WD30EFRX
 state: ONLINE
  scan: scrub repaired 0 in 6h14m with 0 errors on Mon Feb 16 03:40:04 2015
config:

    NAME              STATE     READ WRITE CKSUM
    WD30EFRX          ONLINE       0     0     0
      mirror-0        ONLINE       0     0     0
        wd30efrx_002  ONLINE       0     0     0
        wd30efrx      ONLINE       0     0     0
    cache
      intelSSD180     ONLINE       0     0     0
zpool list |grep -v NAME | cut -d' ' -f1 | while read POOL; do echo $POOL; echo $(zdb -m $POOL | grep mismatch); done
SWAP2_1
space map refcount mismatch: expected 3 != actual 0
SWAP2_2
space map refcount mismatch: expected 3 != actual 0
WD30EFRX
space map refcount mismatch: expected 169 != actual 0

note that SWAP2_1 & SWAP2_2 are only several hours old

so this issue might be present right from pool creation, or only surface when zdb is called
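One way to test the "right from the start" theory would be a throwaway file-backed pool; a sketch, paths are placeholders:

    truncate -s 1G /tmp/tank.img
    zpool create testpool /tmp/tank.img
    zdb -m testpool | grep mismatch    # a mismatch on a brand-new pool would confirm it
    zpool destroy testpool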

all volumes are on cryptsetup/LUKS encrypted partitions or harddrives

I'm probably running a different version of cryptsetup than you are (compared to Ubuntu 14.04):

cryptsetup --version
cryptsetup 1.6.6

so there might be factors involved from userland, too

zdb -m WD30EFRX

Metaslabs:
    vdev          0
    metaslabs   170   offset                spacemap          free      
    ---------------   -------------------   ---------------   -------------
    metaslab      0   offset            0   spacemap      0   free      16G
    metaslab      1   offset    400000000   spacemap      0   free      16G
    metaslab      2   offset    800000000   spacemap      0   free      16G
    metaslab      3   offset    c00000000   spacemap      0   free      16G
    metaslab      4   offset   1000000000   spacemap      0   free      16G
    metaslab      5   offset   1400000000   spacemap      0   free      16G
    metaslab      6   offset   1800000000   spacemap      0   free      16G
    metaslab      7   offset   1c00000000   spacemap      0   free      16G
    metaslab      8   offset   2000000000   spacemap      0   free      16G
    metaslab      9   offset   2400000000   spacemap      0   free      16G
    metaslab     10   offset   2800000000   spacemap      0   free      16G
    metaslab     11   offset   2c00000000   spacemap      0   free      16G
    metaslab     12   offset   3000000000   spacemap      0   free      16G
    metaslab     13   offset   3400000000   spacemap      0   free      16G
    metaslab     14   offset   3800000000   spacemap      0   free      16G
    metaslab     15   offset   3c00000000   spacemap      0   free      16G
    metaslab     16   offset   4000000000   spacemap      0   free      16G
    metaslab     17   offset   4400000000   spacemap      0   free      16G
    metaslab     18   offset   4800000000   spacemap      0   free      16G
    metaslab     19   offset   4c00000000   spacemap      0   free      16G
    metaslab     20   offset   5000000000   spacemap      0   free      16G
    metaslab     21   offset   5400000000   spacemap      0   free      16G
    metaslab     22   offset   5800000000   spacemap      0   free      16G
    metaslab     23   offset   5c00000000   spacemap      0   free      16G
    metaslab     24   offset   6000000000   spacemap      0   free      16G
    metaslab     25   offset   6400000000   spacemap      0   free      16G
    metaslab     26   offset   6800000000   spacemap      0   free      16G
    metaslab     27   offset   6c00000000   spacemap      0   free      16G
    metaslab     28   offset   7000000000   spacemap      0   free      16G
    metaslab     29   offset   7400000000   spacemap      0   free      16G
    metaslab     30   offset   7800000000   spacemap      0   free      16G
    metaslab     31   offset   7c00000000   spacemap      0   free      16G
    metaslab     32   offset   8000000000   spacemap      0   free      16G
    metaslab     33   offset   8400000000   spacemap      0   free      16G
    metaslab     34   offset   8800000000   spacemap      0   free      16G
    metaslab     35   offset   8c00000000   spacemap      0   free      16G
    metaslab     36   offset   9000000000   spacemap      0   free      16G
    metaslab     37   offset   9400000000   spacemap      0   free      16G
    metaslab     38   offset   9800000000   spacemap      0   free      16G
    metaslab     39   offset   9c00000000   spacemap      0   free      16G
    metaslab     40   offset   a000000000   spacemap      0   free      16G
    metaslab     41   offset   a400000000   spacemap      0   free      16G
    metaslab     42   offset   a800000000   spacemap      0   free      16G
    metaslab     43   offset   ac00000000   spacemap      0   free      16G
    metaslab     44   offset   b000000000   spacemap      0   free      16G
    metaslab     45   offset   b400000000   spacemap      0   free      16G
    metaslab     46   offset   b800000000   spacemap      0   free      16G
    metaslab     47   offset   bc00000000   spacemap      0   free      16G
    metaslab     48   offset   c000000000   spacemap      0   free      16G
    metaslab     49   offset   c400000000   spacemap      0   free      16G
    metaslab     50   offset   c800000000   spacemap      0   free      16G
    metaslab     51   offset   cc00000000   spacemap      0   free      16G
    metaslab     52   offset   d000000000   spacemap      0   free      16G
    metaslab     53   offset   d400000000   spacemap      0   free      16G
    metaslab     54   offset   d800000000   spacemap      0   free      16G
    metaslab     55   offset   dc00000000   spacemap      0   free      16G
    metaslab     56   offset   e000000000   spacemap      0   free      16G
    metaslab     57   offset   e400000000   spacemap      0   free      16G
    metaslab     58   offset   e800000000   spacemap      0   free      16G
    metaslab     59   offset   ec00000000   spacemap      0   free      16G
    metaslab     60   offset   f000000000   spacemap      0   free      16G
    metaslab     61   offset   f400000000   spacemap      0   free      16G
    metaslab     62   offset   f800000000   spacemap      0   free      16G
    metaslab     63   offset   fc00000000   spacemap      0   free      16G
    metaslab     64   offset  10000000000   spacemap      0   free      16G
    metaslab     65   offset  10400000000   spacemap      0   free      16G
    metaslab     66   offset  10800000000   spacemap      0   free      16G
    metaslab     67   offset  10c00000000   spacemap      0   free      16G
    metaslab     68   offset  11000000000   spacemap      0   free      16G
    metaslab     69   offset  11400000000   spacemap      0   free      16G
    metaslab     70   offset  11800000000   spacemap      0   free      16G
    metaslab     71   offset  11c00000000   spacemap      0   free      16G
    metaslab     72   offset  12000000000   spacemap      0   free      16G
    metaslab     73   offset  12400000000   spacemap      0   free      16G
    metaslab     74   offset  12800000000   spacemap      0   free      16G
    metaslab     75   offset  12c00000000   spacemap      0   free      16G
    metaslab     76   offset  13000000000   spacemap      0   free      16G
    metaslab     77   offset  13400000000   spacemap      0   free      16G
    metaslab     78   offset  13800000000   spacemap      0   free      16G
    metaslab     79   offset  13c00000000   spacemap      0   free      16G
    metaslab     80   offset  14000000000   spacemap      0   free      16G
    metaslab     81   offset  14400000000   spacemap      0   free      16G
    metaslab     82   offset  14800000000   spacemap      0   free      16G
    metaslab     83   offset  14c00000000   spacemap      0   free      16G
    metaslab     84   offset  15000000000   spacemap      0   free      16G
    metaslab     85   offset  15400000000   spacemap      0   free      16G
    metaslab     86   offset  15800000000   spacemap      0   free      16G
    metaslab     87   offset  15c00000000   spacemap      0   free      16G
    metaslab     88   offset  16000000000   spacemap      0   free      16G
    metaslab     89   offset  16400000000   spacemap      0   free      16G
    metaslab     90   offset  16800000000   spacemap      0   free      16G
    metaslab     91   offset  16c00000000   spacemap      0   free      16G
    metaslab     92   offset  17000000000   spacemap      0   free      16G
    metaslab     93   offset  17400000000   spacemap      0   free      16G
    metaslab     94   offset  17800000000   spacemap      0   free      16G
    metaslab     95   offset  17c00000000   spacemap      0   free      16G
    metaslab     96   offset  18000000000   spacemap      0   free      16G
    metaslab     97   offset  18400000000   spacemap      0   free      16G
    metaslab     98   offset  18800000000   spacemap      0   free      16G
    metaslab     99   offset  18c00000000   spacemap      0   free      16G
    metaslab    100   offset  19000000000   spacemap      0   free      16G
    metaslab    101   offset  19400000000   spacemap      0   free      16G
    metaslab    102   offset  19800000000   spacemap      0   free      16G
    metaslab    103   offset  19c00000000   spacemap      0   free      16G
    metaslab    104   offset  1a000000000   spacemap      0   free      16G
    metaslab    105   offset  1a400000000   spacemap      0   free      16G
    metaslab    106   offset  1a800000000   spacemap      0   free      16G
    metaslab    107   offset  1ac00000000   spacemap      0   free      16G
    metaslab    108   offset  1b000000000   spacemap      0   free      16G
    metaslab    109   offset  1b400000000   spacemap      0   free      16G
    metaslab    110   offset  1b800000000   spacemap      0   free      16G
    metaslab    111   offset  1bc00000000   spacemap      0   free      16G
    metaslab    112   offset  1c000000000   spacemap      0   free      16G
    metaslab    113   offset  1c400000000   spacemap      0   free      16G
    metaslab    114   offset  1c800000000   spacemap      0   free      16G
    metaslab    115   offset  1cc00000000   spacemap      0   free      16G
    metaslab    116   offset  1d000000000   spacemap      0   free      16G
    metaslab    117   offset  1d400000000   spacemap      0   free      16G
    metaslab    118   offset  1d800000000   spacemap      0   free      16G
    metaslab    119   offset  1dc00000000   spacemap      0   free      16G
    metaslab    120   offset  1e000000000   spacemap      0   free      16G
    metaslab    121   offset  1e400000000   spacemap      0   free      16G
    metaslab    122   offset  1e800000000   spacemap      0   free      16G
    metaslab    123   offset  1ec00000000   spacemap      0   free      16G
    metaslab    124   offset  1f000000000   spacemap      0   free      16G
    metaslab    125   offset  1f400000000   spacemap      0   free      16G
    metaslab    126   offset  1f800000000   spacemap      0   free      16G
    metaslab    127   offset  1fc00000000   spacemap      0   free      16G
    metaslab    128   offset  20000000000   spacemap      0   free      16G
    metaslab    129   offset  20400000000   spacemap      0   free      16G
    metaslab    130   offset  20800000000   spacemap      0   free      16G
    metaslab    131   offset  20c00000000   spacemap      0   free      16G
    metaslab    132   offset  21000000000   spacemap      0   free      16G
    metaslab    133   offset  21400000000   spacemap      0   free      16G
    metaslab    134   offset  21800000000   spacemap      0   free      16G
    metaslab    135   offset  21c00000000   spacemap      0   free      16G
    metaslab    136   offset  22000000000   spacemap      0   free      16G
    metaslab    137   offset  22400000000   spacemap      0   free      16G
    metaslab    138   offset  22800000000   spacemap      0   free      16G
    metaslab    139   offset  22c00000000   spacemap      0   free      16G
    metaslab    140   offset  23000000000   spacemap      0   free      16G
    metaslab    141   offset  23400000000   spacemap      0   free      16G
    metaslab    142   offset  23800000000   spacemap      0   free      16G
    metaslab    143   offset  23c00000000   spacemap      0   free      16G
    metaslab    144   offset  24000000000   spacemap      0   free      16G
    metaslab    145   offset  24400000000   spacemap      0   free      16G
    metaslab    146   offset  24800000000   spacemap      0   free      16G
    metaslab    147   offset  24c00000000   spacemap      0   free      16G
    metaslab    148   offset  25000000000   spacemap      0   free      16G
    metaslab    149   offset  25400000000   spacemap      0   free      16G
    metaslab    150   offset  25800000000   spacemap      0   free      16G
    metaslab    151   offset  25c00000000   spacemap      0   free      16G
    metaslab    152   offset  26000000000   spacemap      0   free      16G
    metaslab    153   offset  26400000000   spacemap      0   free      16G
    metaslab    154   offset  26800000000   spacemap      0   free      16G
    metaslab    155   offset  26c00000000   spacemap      0   free      16G
    metaslab    156   offset  27000000000   spacemap      0   free      16G
    metaslab    157   offset  27400000000   spacemap      0   free      16G
    metaslab    158   offset  27800000000   spacemap      0   free      16G
    metaslab    159   offset  27c00000000   spacemap      0   free      16G
    metaslab    160   offset  28000000000   spacemap      0   free      16G
    metaslab    161   offset  28400000000   spacemap      0   free      16G
    metaslab    162   offset  28800000000   spacemap      0   free      16G
    metaslab    163   offset  28c00000000   spacemap      0   free      16G
    metaslab    164   offset  29000000000   spacemap      0   free      16G
    metaslab    165   offset  29400000000   spacemap      0   free      16G
    metaslab    166   offset  29800000000   spacemap      0   free      16G
    metaslab    167   offset  29c00000000   spacemap      0   free      16G
    metaslab    168   offset  2a000000000   spacemap      0   free      16G
    metaslab    169   offset  2a400000000   spacemap      0   free      16G

space map refcount mismatch: expected 169 != actual 0

this nevertheless leaves an uneasy feeling about what is actually happening here

strangely GitHub's comment system doesn't let me address @sempervictus directly (he isn't shown in the list)

the only additional patch and/or feature I'm running compared to your patch stack is #2351 - not sure if that could make a difference, either for the worse or better

the others were merged meanwhile

  * origin/pr/3078 [merged]
  * origin/pr/3074 [merged]
  * origin/pr/2970 [merged]
* origin/pr/2909
* origin/pr/2890
* origin/pr/2820
* origin/pr/2784
* origin/pr/2012
  * origin/pr/1099 (iSCSI support)
  * origin/pr/2685 [merged]
  * origin/pr/3063 [merged]
  * origin/pr/3076 [merged]
* origin/pr/2129
* origin/pr/2351

I'm NOT using iSCSI here, or at all ...

kernelOfTruth avatar Feb 16 '15 05:02 kernelOfTruth

I think the grievous error was introduced when I stupidly tried a rollback on a broken dataset. I wonder if even the gurus know how to fix this.

As far as the patch stack goes - @behlendorf merged most of mine, and I'm running a pretty thin one right now consisting of:

  * origin/pr/2820
  ** Set zfs_autoimport_disable default value to 1
  * origin/pr/2012
  ** Add option to zpool status to print guids
  * origin/pr/1099
  ** Misc iSCSI fixes/cleanups/attempts to get 'zfs rename' to work.
  * origin/pr/2129
  ** Enable ABD
  * master @ 7d2868d5fc88a4c30769b44f56a3a88a4277a9ab

  * Removed:
  ** origin/pr/2784
  ** origin/pr/2909

The two I omitted due to their potential for causing this are listed as well, and I think they did have an impact, as I now see spacemaps, just not the right number (before, we were seeing everything in SM 0).

I haven't tried #2351 in a while - it caused some serious problems for me (a hard lock on an AVX system) when I first tried it, and I've stayed away since. It might be useful, since my rpool is sha256 and has been returning lots of cksum problems since this error bubbled up (I highly doubt that's a coincidence).

The fact that I'm seeing obnoxious amounts of cksum errors on my pool, and more on its only vdev (granted, I'm scrubbing half a T), without any application-layer data issues or dmesg complaints, is confusing. I'm still waiting on a smartctl --long test to complete, but I see no issues so far (10% remaining; these take forever).

Scrubs are coming back with 0 errors, but the pool retains its one "epic fail" and the 3.74MB of data associated with it. Luckily the other pool on the same host is totally fine, so that's almost 2T I won't have to restore.

sempervictus avatar Feb 16 '15 07:02 sempervictus

@kernelOfTruth I just noticed your spacemap was also zero, and you're still running the two PRs I removed. #2909 looks like the more likely culprit, and I've lodged a question asking if we're sure that spa_writeable(vd->vdev_spa) will return true, because my guess is that when using the PR we're actually running live pools without real spacemaps in some situations.

Here's what I think happened: #2909 allows us to import vdevs without a space map. We write, and screw up the map by updating the wrong data structure. You're still running the PR, so zdb -m returns SM 0 for everything. I've removed the PR, so I'm seeing a mismatch between the original and current number of space maps, but missing every entry and SM created while running the PR. This may also explain why I can't bloody destroy those few MB; the zdb assertion I'm seeing is probably caused by this mismatch as well, and my cksum flood seems likely to be related either to the space maps being loaded, or to the jacked-up data I've written since.

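For anyone who wants a quick check for the same symptom, a minimal sketch (`tank` is a placeholder pool name, and the awk filter is only illustrative; it keys off the zdb -m column layout shown above):

    # count metaslabs whose space map object is 0, i.e. no real on-disk space map
    zdb -m tank | awk '$1 == "metaslab" && $6 == 0 { n++ } END { print n " metaslabs with spacemap 0" }'

On a healthy pool the spacemap column holds real object numbers, so that count should be zero.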
The PR looked safe at first review, but I'm not nearly l33t enough to picture the execution context/stack properly here, and I am afraid that without changes to how scrubs are handled (recalculating the space maps), my pool may be FUBAR.

There are definitely hosts showing this problem which have not run #2909 themselves; that PR was specifically put onto test hosts because they often deal with broken pools (my laptop unfortunately qualifies as such a mule, since it's got server-grade resources and quick physical access). However, the waterfall replication topology of our datacenter ZFS boxes may account for the error appearing on those machines: they have datasets which back up the test mules.

sempervictus avatar Feb 16 '15 07:02 sempervictus

Hi,

I just took a look at some servers we manage, and I get the same error on each of them:

zdb -mM rpool returns: space map refcount mismatch: expected 363 != actual 0

These servers all run the latest master with #2909.

For now zpool status is happy; what is the correct way to handle this situation?

Regards, Eric

edillmann avatar Feb 16 '15 09:02 edillmann

@edillmann: One of my pools is undamaged, so removing the patch from the stack appears "safe enough," provided you don't have additional errors coming up like the one which started all of this.

So far vpool, the undamaged pool, has passed a scrub and a zdb -mc after loading the newly built ZFS and SPL modules without #2909 (it still reports a refcount mismatch, but the checksums all align). Rpool is not so lucky: although it now reports 0 errors on scrub, it has the persistent error @0xffffffffffffffff:0x0 with the associated "unfreeable" empty space, and zdb -mc produces the following:

Traversing all blocks to verify metadata checksums and verify nothing leaked ...

loading space map for vdev 0 of 1, metaslab 115 of 116 ...
zdb: ../../cmd/zdb/zdb.c:2618: Assertion `0 == bptree_iterate(spa->spa_meta_objset, spa->spa_dsl_pool->dp_bptree_obj, B_FALSE, count_block_cb, &zcb, ((void *)0)) (0x0 == 0x34)' failed.
Aborted

If my hunch that the free space information is going into the wrong metaslab's space map is correct, I'd suggest pulling it from the stack sooner rather than later, as every write issued after the misplaced spacemap writes begin would exacerbate the issue and could lead to checksum errors and misplaced writes overwriting real blocks.
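For reference, the verification sequence I've been running on each pool looks roughly like this (a sketch; the pool name is a placeholder, and zdb should be run while the pool is idle):

    zpool scrub vpool
    zpool status -v vpool   # wait for the scrub to finish, check for errors
    zdb -mc vpool           # metaslab dump plus the block-traversal leak check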

sempervictus avatar Feb 16 '15 10:02 sempervictus

@sempervictus Running zdb -mc on one of these pools gives me the following result:

......
leaked space: vdev 0, offset 0x17dff41600, size 62464
leaked space: vdev 0, offset 0x17dffffe00, size 512
leaked space: vdev 0, offset 0x17dff5fc00, size 512
leaked space: vdev 0, offset 0x17dfed5400, size 61440
leaked space: vdev 0, offset 0x17dfbf7e00, size 6144
leaked space: vdev 0, offset 0x17ded36e00, size 12800
leaked space: vdev 0, offset 0x17de155a00, size 90112
leaked space: vdev 0, offset 0x17dd5a1a00, size 6144
leaked space: vdev 0, offset 0x17db13a600, size 512
leaked space: vdev 0, offset 0x17d8d14400, size 48128
leaked space: vdev 0, offset 0x17cbd35600, size 84480
block traversal size 95683824128 != alloc 0 (leaked -95683824128)

bp count:         3201295
bp logical:    198605664768      avg:  62039
bp physical:   94573446144      avg:  29542     compression:   2.10
bp allocated:  95683824128      avg:  29889     compression:   2.08
bp deduped:             0    ref>1:      0   deduplication:   1.00
SPA allocated:          0     used:  0.00%

additional, non-pointer bps of type 0:      54539

That seems obviously wrong and scary; any advice is welcome.

edillmann avatar Feb 16 '15 12:02 edillmann

As the pool runs, do those numbers get worse?

As far as what to do, I'm not the authority on this, but so far I've been tackling it with a slim ZFS build and a subsequent send/recv to clean up the data (I have yet to see how effective that is).

I've got a production host with a split workload, hosting VMs and their underlying ZFS storage (as well as a bit of NFS), which is currently running zdb -mc on the new patch stack. No crash yet; we'll see how this goes.

In terms of actually fixing the on-disk errors we've created, #3111 may be a starting point, since scrubs seem to miss/ignore these possibly serious errors. Seems like a gap in the design or implementation IMO.

sempervictus avatar Feb 16 '15 12:02 sempervictus

This looks promising. The production host was given the new slimmed-down patch stack, then scrubbed, then verified with zdb; zdb -mc returned the following:

Traversing all blocks to verify metadata checksums and verify nothing leaked ...

loading space map for vdev 1 of 2, metaslab 96 of 148 ....
1.26T completed ( 413MB/s) estimated time remaining: 0hr 00min 00sec         
    No leaks (block sum matches space maps exactly)

    bp count:        44876409
    bp logical:    865979950592      avg:  19296
    bp physical:   628182239232      avg:  13998     compression:   1.38
    bp allocated:  1382804668416      avg:  30813     compression:   0.63
    bp deduped:             0    ref>1:      0   deduplication:   1.00
    SPA allocated: 1382804226048     used: 54.02%

    additional, non-pointer bps of type 0:    1715474

space map refcount mismatch: expected 287 != actual 256

This thing's been running #2909 as part of its prior patch stack for a while, and isn't spitting out leaked space messages. Far from ideal, but there may be light at the end of the tunnel (often turns out to be an oncoming train when it comes to data loss).

sempervictus avatar Feb 16 '15 13:02 sempervictus

zdb -mc

says:

Traversing all blocks to verify metadata checksums and verify nothing leaked ...

1.69T completed ( 597MB/s) estimated time remaining: 4294792253hr 4294967281min 4294967292sec
leaked space: vdev 0, offset 0x2000, size 4096
block traversal size 1865256910848 != alloc 0 (leaked -1865256910848)

    bp count:        17576044
    bp logical:    1957469844992      avg: 111371
    bp physical:   1853280807936      avg: 105443     compression:   1.06
    bp allocated:  1865256910848      avg: 106124     compression:   1.05
    bp deduped:             0    ref>1:      0   deduplication:   1.00
    SPA allocated:          0     used:  0.00%

    additional, non-pointer bps of type 0:      16461

Hm, seems like we definitely need a fix for these kinds of issues! :+1:

@sempervictus #2909 could be the culprit, but the reports from FreeBSD-based distros make me believe that it might just trigger the issue more easily; thus it might not be the (sole?) cause.

kernelOfTruth avatar Feb 16 '15 15:02 kernelOfTruth

@behlendorf Lots of vdevs use ditto blocks on my pools; could data integrity for those files also be compromised in combination with this issue?

What features are heavy users of the spacemap histogram(s) and could be prone to failure and/or data loss or corruption?

kernelOfTruth avatar Feb 16 '15 16:02 kernelOfTruth

@kernelOfTruth I don't think #2909 is the only cause of this; I'm pretty sure other things can cause it, such as Linux adding ACLs after the dnode is written but before an emergency power outage. I can also imagine a situation on BSD where ZVOLs allocated as GEOMs instead of simple DEVs incur a non-atomic write, likewise causing inconsistent commits in an emergency outage.

If my assessment is correct, then #2909 simply brought this to light (granted, in a rather scary manner) by allowing us to explicitly write to the wrong space map (I'm not even sure that 0 is a space map). I've seen reports of these problems long before ZoL even went pseudo-production, so it's very much possible under different conditions, but not the easiest thing to reproduce unless you forcibly screw yourself to the wall like this.

Here's where I need a helping hand from someone whose head is in the light of day: is my theory viable that writes to occupied blocks with valid pointers won't trip a scrub? If so, we may have pools out there with overwritten data which checksums properly because the BP to it is valid (written correctly when the space map reported the wrong block and we wrote the block, creating a new pointer in the tree). If that IS viable, it may create utterly unrecoverable conditions by overwriting snaps, and blocks on which the newly written CoW block depends (write 512Kb across 4 records of 32 4k blocks each; the 4th record could overwrite the first due to a space map error, creating dangling blocks, since the blocks comprising record 1 which point to record 2 are overwritten with record 4, and the EP is gone).

Please tell me someone like @dweeezil or @ryao has a back-pocket fsck util or patches to scrub that would let us fix this mess. How do the vendor-backed implementations handle this? I can think of a dozen vendors pretending to have invented the successor to sliced bread when actually running ZFS under the hood (usually neutered as hell, making users license features we take for granted, like pay-per-snap licensing in a SAN you buy for 20K). Then again, if this isn't addressed in Illumos and BSD, it's probably not very likely that someone else has figured it out before those guys.

To everyone reading this and using #2909: my suggestion is to drop it from the stack now. Systems in our environments which lost the patch from the set last night are static in their misreporting; the numbers aren't growing (aside from my machine, where the cksum errors are through the roof even though smartctl --test=long passed with flying colors after a 15-hour torture session; without divine intervention my pool is hosed, methinks).

sempervictus avatar Feb 16 '15 21:02 sempervictus

An example of how this manifests on newly created zvols/zpools:

zpool create -o ashift=12 SWAP2_1 /dev/mapper/swap2_1 -f

zpool create -o ashift=12 SWAP2_2 /dev/mapper/swap2_2 -f

zfs set atime=off SWAP2_1

zfs set atime=off SWAP2_2

zfs create -V 8G -b $(getconf PAGESIZE) \
              -o primarycache=metadata \
              -o com.sun:auto-snapshot=false SWAP2_1/swap

zfs create -V 8G -b $(getconf PAGESIZE) \
              -o primarycache=metadata \
              -o com.sun:auto-snapshot=false SWAP2_2/swap

zfs set sync=always SWAP2_1/swap

zfs set sync=always SWAP2_2/swap


mkswap -f /dev/zvol/SWAP2_1/swap

mkswap -f /dev/zvol/SWAP2_2/swap
zdb -mc SWAP2_1

Metaslabs:
    vdev          0
    metaslabs   117   offset                spacemap          free      
    ---------------   -------------------   ---------------   -------------
    metaslab      0   offset            0   spacemap      0   free     512M
    metaslab      1   offset     20000000   spacemap      0   free     512M
    metaslab      2   offset     40000000   spacemap      0   free     512M
    metaslab      3   offset     60000000   spacemap      0   free     512M
    metaslab      4   offset     80000000   spacemap      0   free     512M
    metaslab      5   offset     a0000000   spacemap      0   free     512M
    metaslab      6   offset     c0000000   spacemap      0   free     512M
    metaslab      7   offset     e0000000   spacemap      0   free     512M
    metaslab      8   offset    100000000   spacemap      0   free     512M
    metaslab      9   offset    120000000   spacemap      0   free     512M
    metaslab     10   offset    140000000   spacemap      0   free     512M
    metaslab     11   offset    160000000   spacemap      0   free     512M
    metaslab     12   offset    180000000   spacemap      0   free     512M
    metaslab     13   offset    1a0000000   spacemap      0   free     512M
    metaslab     14   offset    1c0000000   spacemap      0   free     512M
    metaslab     15   offset    1e0000000   spacemap      0   free     512M
    metaslab     16   offset    200000000   spacemap      0   free     512M
    metaslab     17   offset    220000000   spacemap      0   free     512M
    metaslab     18   offset    240000000   spacemap      0   free     512M
    metaslab     19   offset    260000000   spacemap      0   free     512M
    metaslab     20   offset    280000000   spacemap      0   free     512M
    metaslab     21   offset    2a0000000   spacemap      0   free     512M
    metaslab     22   offset    2c0000000   spacemap      0   free     512M
    metaslab     23   offset    2e0000000   spacemap      0   free     512M
    metaslab     24   offset    300000000   spacemap      0   free     512M
    metaslab     25   offset    320000000   spacemap      0   free     512M
    metaslab     26   offset    340000000   spacemap      0   free     512M
    metaslab     27   offset    360000000   spacemap      0   free     512M
    metaslab     28   offset    380000000   spacemap      0   free     512M
    metaslab     29   offset    3a0000000   spacemap      0   free     512M
    metaslab     30   offset    3c0000000   spacemap      0   free     512M
    metaslab     31   offset    3e0000000   spacemap      0   free     512M
    metaslab     32   offset    400000000   spacemap      0   free     512M
    metaslab     33   offset    420000000   spacemap      0   free     512M
    metaslab     34   offset    440000000   spacemap      0   free     512M
    metaslab     35   offset    460000000   spacemap      0   free     512M
    metaslab     36   offset    480000000   spacemap      0   free     512M
    metaslab     37   offset    4a0000000   spacemap      0   free     512M
    metaslab     38   offset    4c0000000   spacemap      0   free     512M
    metaslab     39   offset    4e0000000   spacemap      0   free     512M
    metaslab     40   offset    500000000   spacemap      0   free     512M
    metaslab     41   offset    520000000   spacemap      0   free     512M
    metaslab     42   offset    540000000   spacemap      0   free     512M
    metaslab     43   offset    560000000   spacemap      0   free     512M
    metaslab     44   offset    580000000   spacemap      0   free     512M
    metaslab     45   offset    5a0000000   spacemap      0   free     512M
    metaslab     46   offset    5c0000000   spacemap      0   free     512M
    metaslab     47   offset    5e0000000   spacemap      0   free     512M
    metaslab     48   offset    600000000   spacemap      0   free     512M
    metaslab     49   offset    620000000   spacemap      0   free     512M
    metaslab     50   offset    640000000   spacemap      0   free     512M
    metaslab     51   offset    660000000   spacemap      0   free     512M
    metaslab     52   offset    680000000   spacemap      0   free     512M
    metaslab     53   offset    6a0000000   spacemap      0   free     512M
    metaslab     54   offset    6c0000000   spacemap      0   free     512M
    metaslab     55   offset    6e0000000   spacemap      0   free     512M
    metaslab     56   offset    700000000   spacemap      0   free     512M
    metaslab     57   offset    720000000   spacemap      0   free     512M
    metaslab     58   offset    740000000   spacemap      0   free     512M
    metaslab     59   offset    760000000   spacemap      0   free     512M
    metaslab     60   offset    780000000   spacemap      0   free     512M
    metaslab     61   offset    7a0000000   spacemap      0   free     512M
    metaslab     62   offset    7c0000000   spacemap      0   free     512M
    metaslab     63   offset    7e0000000   spacemap      0   free     512M
    metaslab     64   offset    800000000   spacemap      0   free     512M
    metaslab     65   offset    820000000   spacemap      0   free     512M
    metaslab     66   offset    840000000   spacemap      0   free     512M
    metaslab     67   offset    860000000   spacemap      0   free     512M
    metaslab     68   offset    880000000   spacemap      0   free     512M
    metaslab     69   offset    8a0000000   spacemap      0   free     512M
    metaslab     70   offset    8c0000000   spacemap      0   free     512M
    metaslab     71   offset    8e0000000   spacemap      0   free     512M
    metaslab     72   offset    900000000   spacemap      0   free     512M
    metaslab     73   offset    920000000   spacemap      0   free     512M
    metaslab     74   offset    940000000   spacemap      0   free     512M
    metaslab     75   offset    960000000   spacemap      0   free     512M
    metaslab     76   offset    980000000   spacemap      0   free     512M
    metaslab     77   offset    9a0000000   spacemap      0   free     512M
    metaslab     78   offset    9c0000000   spacemap      0   free     512M
    metaslab     79   offset    9e0000000   spacemap      0   free     512M
    metaslab     80   offset    a00000000   spacemap      0   free     512M
    metaslab     81   offset    a20000000   spacemap      0   free     512M
    metaslab     82   offset    a40000000   spacemap      0   free     512M
    metaslab     83   offset    a60000000   spacemap      0   free     512M
    metaslab     84   offset    a80000000   spacemap      0   free     512M
    metaslab     85   offset    aa0000000   spacemap      0   free     512M
    metaslab     86   offset    ac0000000   spacemap      0   free     512M
    metaslab     87   offset    ae0000000   spacemap      0   free     512M
    metaslab     88   offset    b00000000   spacemap      0   free     512M
    metaslab     89   offset    b20000000   spacemap      0   free     512M
    metaslab     90   offset    b40000000   spacemap      0   free     512M
    metaslab     91   offset    b60000000   spacemap      0   free     512M
    metaslab     92   offset    b80000000   spacemap      0   free     512M
    metaslab     93   offset    ba0000000   spacemap      0   free     512M
    metaslab     94   offset    bc0000000   spacemap      0   free     512M
    metaslab     95   offset    be0000000   spacemap      0   free     512M
    metaslab     96   offset    c00000000   spacemap      0   free     512M
    metaslab     97   offset    c20000000   spacemap      0   free     512M
    metaslab     98   offset    c40000000   spacemap      0   free     512M
    metaslab     99   offset    c60000000   spacemap      0   free     512M
    metaslab    100   offset    c80000000   spacemap      0   free     512M
    metaslab    101   offset    ca0000000   spacemap      0   free     512M
    metaslab    102   offset    cc0000000   spacemap      0   free     512M
    metaslab    103   offset    ce0000000   spacemap      0   free     512M
    metaslab    104   offset    d00000000   spacemap      0   free     512M
    metaslab    105   offset    d20000000   spacemap      0   free     512M
    metaslab    106   offset    d40000000   spacemap      0   free     512M
    metaslab    107   offset    d60000000   spacemap      0   free     512M
    metaslab    108   offset    d80000000   spacemap      0   free     512M
    metaslab    109   offset    da0000000   spacemap      0   free     512M
    metaslab    110   offset    dc0000000   spacemap      0   free     512M
    metaslab    111   offset    de0000000   spacemap      0   free     512M
    metaslab    112   offset    e00000000   spacemap      0   free     512M
    metaslab    113   offset    e20000000   spacemap      0   free     512M
    metaslab    114   offset    e40000000   spacemap      0   free     512M
    metaslab    115   offset    e60000000   spacemap      0   free     512M
    metaslab    116   offset    e80000000   spacemap      0   free     512M


Traversing all blocks to verify metadata checksums and verify nothing leaked ...


leaked space: vdev 0, offset 0xb000, size 4096
leaked space: vdev 0, offset 0xf000, size 4096
leaked space: vdev 0, offset 0x15000, size 73728
leaked space: vdev 0, offset 0x13000, size 4096
leaked space: vdev 0, offset 0xd000, size 4096
leaked space: vdev 0, offset 0x5f000, size 536481792
leaked space: vdev 0, offset 0x4b000, size 8192
leaked space: vdev 0, offset 0x29000, size 114688
leaked space: vdev 0, offset 0x2c000b000, size 4096
leaked space: vdev 0, offset 0x2c000f000, size 4096
leaked space: vdev 0, offset 0x2c0015000, size 73728
leaked space: vdev 0, offset 0x2c0013000, size 4096
leaked space: vdev 0, offset 0x2c000d000, size 4096
leaked space: vdev 0, offset 0x2c005d000, size 536489984
leaked space: vdev 0, offset 0x2c004b000, size 8192
leaked space: vdev 0, offset 0x2c0029000, size 114688
leaked space: vdev 0, offset 0x580002000, size 4096
leaked space: vdev 0, offset 0x580000000, size 4096
leaked space: vdev 0, offset 0x580008000, size 40960
leaked space: vdev 0, offset 0x580006000, size 4096
leaked space: vdev 0, offset 0x580040000, size 536608768
leaked space: vdev 0, offset 0x580036000, size 8192
leaked space: vdev 0, offset 0x580014000, size 114688
block traversal size 430080 != alloc 0 (leaked -430080)

    bp count:              69
    bp logical:        712192      avg:  10321
    bp physical:        45568      avg:    660     compression:  15.63
    bp allocated:      430080      avg:   6233     compression:   1.66
    bp deduped:             0    ref>1:      0   deduplication:   1.00
    SPA allocated:          0     used:  0.00%

    additional, non-pointer bps of type 0:         26

kernelOfTruth avatar Feb 16 '15 21:02 kernelOfTruth

ugh :(

related: http://lists.freebsd.org/pipermail/freebsd-current/2011-October/028232.html

http://lists.open-zfs.org/pipermail/developer/2014-June/000691.html

not sure if related but referencing #2662 and #2702

kernelOfTruth avatar Feb 16 '15 22:02 kernelOfTruth

@kernelOfTruth I confirm the same behaviour with master; I also ran zdb -e -mc on a freshly created and exported pool, with the same results:

    leaked space: vdev 0, offset 0xb000, size 4096
    leaked space: vdev 0, offset 0x15000, size 16384
    leaked space: vdev 0, offset 0x1b000, size 8192
    leaked space: vdev 0, offset 0x12000, size 8192
    leaked space: vdev 0, offset 0x2e000, size 8589746176
    leaked space: vdev 0, offset 0x25000, size 4096
    leaked space: vdev 0, offset 0x1e000, size 4096
    leaked space: vdev 1, offset 0xb000, size 4096
    leaked space: vdev 1, offset 0x15000, size 16384
    leaked space: vdev 1, offset 0x1b000, size 8192
    leaked space: vdev 1, offset 0x12000, size 8192
    leaked space: vdev 1, offset 0x2e000, size 8589746176
    leaked space: vdev 1, offset 0x25000, size 4096
    leaked space: vdev 1, offset 0x1e000, size 4096
    leaked space: vdev 1, offset 0x2c00005000, size 8192
    leaked space: vdev 1, offset 0x2c0000e000, size 8192
    leaked space: vdev 1, offset 0x2c00008000, size 16384
    leaked space: vdev 1, offset 0x2c00021000, size 8589799424
    leaked space: vdev 1, offset 0x2c00018000, size 4096
    leaked space: vdev 1, offset 0x2c00011000, size 4096
    block traversal size 380928 != alloc 0 (leaked -380928)

edillmann avatar Feb 17 '15 08:02 edillmann

@sempervictus @kernelOfTruth did you use #2129 in your patch stack? I just removed it, and that seems to clean things up.

edillmann avatar Feb 17 '15 08:02 edillmann

We've been trying to put #2129 through its paces, and a number of the affected hosts use it, but the iSCSI nodes were dying after weeks of running with it and we had to strip it out. While trying to fix this ungodly mess I did run master without it; same deal.

Here's the good news: I managed to salvage my data by creating a new rpool and streaming only the unaffected datasets to the new pool. A few changes to crypttab and fstab, a copy of the /boot and /boot/efi data with updated UUIDs, and the new pool boots fine and passes all previously failing zdb checks.

It appears that, in order to ensure that bad data does not leak through, the zfs send/recv must be done on both sides with a build which does not itself screw up space maps, and without the -e or -D options (a sketch of the procedure follows the patch list below). I did this with the following stack:

  * origin/pr/2820
  * origin/pr/2012
  * origin/pr/1099
  * origin/pr/2129
  * master @ 7d2868d5fc88a4c30769b44f56a3a88a4277a9ab

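A sketch of the migration itself, with hypothetical dataset/pool names and deliberately plain send options (no -e, no -D):

    # stream each known-good dataset to the freshly created pool
    zfs snapshot rpool/somedataset@migrate
    zfs send rpool/somedataset@migrate | zfs recv -u newpool/somedataset

    # sanity-check the result against an exported pool
    zpool export newpool
    zdb -e -mc newpool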
Although this has resolved my own data corruption problem, I'm now faced with a bunch of live storage hosts at our datacenter potentially plotting to allocate a block which looks free in the space map but still has data in it. RaidZ2 should help there, I guess, but it's still kind of creepy to think about some top branch in the tree getting cut off by a misplaced write.

sempervictus avatar Feb 17 '15 09:02 sempervictus

@sempervictus @kernelOfTruth @edillmann There have been a few space map related bugs which have crept up on all the ZFS platforms over the years. We take them seriously because, as you know, there are currently tools to detect the issue but none to fix it (aside from send/recv to a new pool). In fact, one of the key reasons we run ztest against every patch is to attempt to detect these issues. Arguably we should also be running the zdb leak check on the test pools used by xfstests and filebench too.

I think it's definitely possible that the unmerged #2129 or #2909 pull requests could have introduced an issue like this. It's not totally clear to me from the comments above, but were you able to verify that this issue doesn't exist on master? We absolutely can't tag anything if we believe this issue exists in the master branch.

One other thing to keep in mind. The zdb leak check only works reliably on exported or 100% idle pools. If the pool is active it can generate false positives.
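In other words, something along these lines should avoid the false positives (`tank` is a placeholder):

    zpool export tank
    zdb -e -mc tank     # -e opens the exported pool directly from its labels
    zpool import tank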

behlendorf avatar Feb 17 '15 21:02 behlendorf

@behlendorf I did the test on the same idle pool.

zdb@#2129 -mc rpool:

Metaslabs:
    vdev          0
    metaslabs   174   offset                spacemap          free
    ---------------   -------------------   ---------------   -------------
    metaslab      0   offset            0   spacemap      0   free     512M
    metaslab      1   offset     20000000   spacemap      0   free     512M
    metaslab      2   offset     40000000   spacemap      0   free     512M
    metaslab      3   offset     60000000   spacemap      0   free     512M
    metaslab      4   offset     80000000   spacemap      0   free     512M
    metaslab      5   offset     a0000000   spacemap      0   free     512M
    ...
    metaslab    169   offset   1520000000   spacemap      0   free     512M
    metaslab    170   offset   1540000000   spacemap      0   free     512M
    metaslab    171   offset   1560000000   spacemap      0   free     512M
    metaslab    172   offset   1580000000   spacemap      0   free     512M
    metaslab    173   offset   15a0000000   spacemap      0   free     512M

**!!! All spacemap usage is 0 !!!**

Traversing all blocks to verify metadata checksums and verify nothing leaked ...

leaked space: vdev 0, offset 0x68000, size 73728
leaked space: vdev 0, offset 0x20000, size 8192
leaked space: vdev 0, offset 0x9a000, size 8192
leaked space: vdev 0, offset 0xba000, size 8192
...
leaked space: vdev 0, offset 0x15bf51c000, size 10739712
leaked space: vdev 0, offset 0x15befc8000, size 5357568
leaked space: vdev 0, offset 0x15bb490000, size 24428544
leaked space: vdev 0, offset 0x15bb466000, size 8192
leaked space: vdev 0, offset 0x15b7266000, size 24576
block traversal size 35960070144 != alloc 0 (leaked -35960070144)

bp count:          797123
bp logical:    55863028736      avg:  70080
bp physical:   31524022784      avg:  39547     compression:   1.77
bp allocated:  35960070144      avg:  45112     compression:   1.55
bp deduped:             0    ref>1:      0   deduplication:   1.00
SPA allocated:          0     used:  0.00%

additional, non-pointer bps of type 0:      55465

zdb@master -mc rpool:

Traversing all blocks to verify metadata checksums and verify nothing leaked ...

loading space map for vdev 0 of 1, metaslab 173 of 174 ...
33.2G completed (1079MB/s) estimated time remaining: 0hr 00min 00sec
No leaks (block sum matches space maps exactly)

bp count:          797102
bp logical:    55862745088      avg:  70082
bp physical:   31523987456      avg:  39548     compression:   1.77
bp allocated:  35959865344      avg:  45113     compression:   1.55
bp deduped:             0    ref>1:      0   deduplication:   1.00
SPA allocated: 35959865344     used: 38.49%

additional, non-pointer bps of type 0:      55459

space map refcount mismatch: expected 175 != actual 174

My conclusion is that zdb@#2129 has a little bug in reading spacemaps; this problem doesn't occur with master. Or should I worry about the refcount mismatch?

edillmann avatar Feb 18 '15 07:02 edillmann

@edillmann thanks for running this test. It does sound like something's wrong with the ABD branch you tested. But I'm relieved to see that master doesn't suffer from this. However, we should explain the refcount mismatch. Do you have a trivial test case for it? If so, could you open a new issue with it?
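For anyone attempting that, an untested sketch of what a minimal reproducer might look like (file-backed vdev, pool name, and file path are all placeholders):

    truncate -s 1G /var/tmp/vdev0
    zpool create testpool /var/tmp/vdev0
    dd if=/dev/urandom of=/testpool/junk bs=1M count=64
    zpool export testpool
    zdb -e -mc testpool | grep refcount   # look for "space map refcount mismatch"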

behlendorf avatar Feb 18 '15 19:02 behlendorf

I'm removing this as an 0.6.4 blocker. The changes which caused this problem were not part of the master branch and therefore shouldn't hold up the release.

behlendorf avatar Feb 27 '15 22:02 behlendorf

So it's confirmed that some aspect of ABD (#2129), or an older version of it, is at fault?

I have a planned upgrade looming and want to know.

DeHackEd avatar Feb 27 '15 22:02 DeHackEd

@edillmann Yes, I used #2129 since it improved desktop responsiveness quite noticeably.

Without it (now), there are some minor increased latencies here and there.

Did you try #2129 by simply applying the pull request to the latest master via

https://github.com/zfsonlinux/zfs/pull/2129.diff

to see whether that also leads to the spacemap issues?

@behlendorf Seems like a good idea to add zdb to the test battery

As I'm also using Btrfs, I noticed a feature that seems similar to ZFS's space maps: the space_cache. There is an option to clear and subsequently rebuild it, seemingly without data loss; that's at least the impression I got from reading through the mailing list.

It would be nice if that were also possible on ZFS.
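For reference, the Btrfs rebuild is triggered with a one-time mount option, as far as I understand it (device and mountpoint are placeholders):

    mount -o clear_cache /dev/sdb1 /mnt   # drops and rebuilds the free space cache on this mount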

I still don't have a clear answer to my question, which is:

When "just" using zpools with those broken space maps, is there a possibility that existing data gets overwritten and thus corrupted (which really shouldn't happen)?

That's my main concern, since after going through my data I seemingly don't have any pool that is/was NOT affected (all are affected). I have therefore started destroying pools and creating them from scratch; the new pools don't exhibit this issue.

No apparent data loss yet.

@DeHackEd I'd also like to know that, since it makes quite a difference in usage. @sempervictus seems to be running newly created pools, or at least pools with zvols, on a patch stack that doesn't suffer from the space map problems mentioned here, if I understood correctly.

@sempervictus So that means running the mentioned patch stack on top of the latest master:

  * origin/pr/2820
  * origin/pr/2012
  * origin/pr/1099
  * origin/pr/2129
  * master @ 7d2868d5fc88a4c30769b44f56a3a88a4277a9ab

does NOT show the space map problems and would thus get a :+1: from you?

Thanks

kernelOfTruth avatar Feb 27 '15 22:02 kernelOfTruth

Let me summarize where I think things are at.

  • The #2909 debug patch could cause an issue like this. It was never part of master.
  • The #2129 (ABD) patches might have caused something like this. The current version is believed to be safe and correct but older versions or alternate development branches may not be.
  • Over the years there have been a small number of other related bugs which could cause this issue. However, at the moment, all known causes of this issue have been resolved in master.

Seems like a good idea to add zdb to the test battery

Absolutely. ztest already does this, however it doesn't provide perfect coverage. Running the zdb leak checker after both xfstests and filebench would improve our test coverage. I'll see about adding it to the scripts.
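A rough sketch of what such a hook could look like (TESTPOOL and the log path are placeholders, not the actual test scripts):

    #!/bin/sh
    # run after the test workload: export the pool and leak-check it with zdb
    zpool export "$TESTPOOL"
    zdb -e -mc "$TESTPOOL" > /tmp/zdb-leak.log 2>&1
    if grep -q 'leaked space' /tmp/zdb-leak.log; then
        echo "zdb leak check failed, see /tmp/zdb-leak.log" >&2
        exit 1
    fi
    zpool import "$TESTPOOL"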

Recreating the space maps.

Issue #3111 was opened as a feature request to explore doing this. There are a variety of ways to go about it.

When "just" using zpool's with those broken space maps if there is the possibility that existing data gets overwritten and thus corrupted (which really shouldn't happen)

Yes. If the space maps for a pool are wrong it's possible that existing data will be overwritten. If this happens ZFS will detect the damaged blocks during a scrub and may be able to repair them. However, there is currently no existing tool to repair the space maps themselves.

behlendorf avatar Feb 27 '15 23:02 behlendorf