
Filesystem can not be mounted: Input/output error

Fmstrat opened this issue 2 years ago • 44 comments

System information

Type                  Version/Name
Distribution Name     Ubuntu
Distribution Version  20.04
Kernel Version        5.4.0-113-generic
Architecture          x86_64
OpenZFS Version       0.8.3-1ubuntu12.13

Describe the problem you're observing

After a reboot, some ZFS volumes aren't mounting: some mount fine, others don't.

# zpool status s -v
  pool: s
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 1 days 04:22:08 with 0 errors on Mon May  9 04:46:19 2022
config:

        NAME                                   STATE     READ WRITE CKSUM
        s                                      ONLINE       0     0     0
          raidz2-0                             ONLINE       0     0     0
            ata-ST16000NM001G-<redacted>  ONLINE       0     0     0
            ata-ST16000NM001G-<redacted>  ONLINE       0     0     0
            ata-ST16000NM001G-<redacted>  ONLINE       0     0     0
            ata-ST16000NM001G-<redacted>  ONLINE       0     0     0
            ata-ST16000NM001G-<redacted>  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        s/storage/n/random:<0x0>
        s/storage/n/backup:<0x0>
        s/storage/n/plex:<0x0>
        s/storage/n/[email protected]:<0x0>
        s/storage/n/security:<0x0>
        s/storage/n/receipts:<0x0>
        s/storage/n/creative:<0x0>
        s/storage/n/music:<0x0>
        s/storage/n/ebooks:<0x0>
        s/storage/n/pictures:<0x0>
        s/storage/n/software:<0x0>
# zfs mount s/storage/n/random
filesystem 's/storage/n/random' can not be mounted: Input/output error
cannot mount 's/storage/n/random': Invalid argument

The issue with the snapshot above was not visible at boot. It only came up after I tried to mount it. I was able to mount the original snapshot (non-incremental) with mount -t zfs s/storage/n/[email protected] /mnt/test but I don't have original snapshots for every volume.

Have I lost this data or is this recoverable? What works vs what doesn't seems eerily similar to https://github.com/openzfs/zfs/issues/8103

Describe how to reproduce the problem

This happened after the filesystem locked up the server. Commands accessing the zpool stopped responding, and required a forced reboot (kill -9 didn't work).

Fmstrat avatar May 29 '22 14:05 Fmstrat

Some notes that may help:

  • These volumes use native encryption
  • They were sent raw from another server in another state
  • Incremental snaps have been sent just fine up until now

Fmstrat avatar May 29 '22 14:05 Fmstrat

After attempting zfs mount s/storage/n/random

zfs events -v

May 29 2022 10:22:58.641738494 ereport.fs.zfs.authentication
        class = "ereport.fs.zfs.authentication"
        ena = 0x222d49cc46000001
        detector = (embedded nvlist)
                version = 0x0
                scheme = "zfs"
                pool = 0x6c9a0324e11e4bcb
        (end detector)
        pool = "s"
        pool_guid = 0x6c9a0324e11e4bcb
        pool_state = 0x0
        pool_context = 0x0
        pool_failmode = "wait"
        zio_objset = 0x203
        zio_object = 0x0
        zio_level = 0xffffffffffffffff
        zio_blkid = 0x0
        time = 0x629381c2 0x264026fe 
        eid = 0x5e

zdb -ddddddddddd s/storage/n/random 0

Dataset s/storage/n/random [ZPL], ID 515, cr_txg 55, 10.2G, 1341 objects, rootbp DVA[0]=<0:4561eb61c000:3000> DVA[1]=<0:3dac6cd0000:3000> [L0 DMU objset] fletcher4 uncompressed authenticated LE contiguous unique double size=1000L/1000P birth=6786579L/6786579P fill=1341 cksum=1a382de2b3:472f3f2d19c6:678aeb2dae11b8:6ab5b1b595ebfe13

    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
         0    6   128K    16K  1.94M     512   784K   85.52  DMU dnode (K=inherit) (Z=inherit)
        dnode flags: USED_BYTES 
        dnode maxblkid: 48
                (object encrypted)
Indirect blocks:
               0 L5      0:453ad05fa000:3000 0:417465d5000:3000 20000L/1000P F=1341 B=6786579/6786579 cksum=38f75516ecac3b9:6245ebc924f526fe:bc21378ec23acb05:cffb65f15cd1ae3b
               0  L4     0:4561eb619000:3000 0:3dac6cb2000:3000 20000L/1000P F=1341 B=6786579/6786579 cksum=39708c0200a872c:6c315aed5d9f5b31:d7a33c44e21b4277:2d4c602a8ab3294d
               0   L3    0:4561eb616000:3000 0:3dac6caf000:3000 20000L/1000P F=1341 B=6786579/6786579 cksum=391d61952c436bc:655f4d4c3c1837a3:f3afa6884b69f1fb:15c210dbae408617
               0    L2   0:4561eb613000:3000 0:3dac6cac000:3000 20000L/1000P F=1341 B=6786579/6786579 cksum=39904c9b205bd0c:6ec8be242f9b78a1:7a2ade7a821ca3a6:7cc890856018194f
               0     L1  0:44d02ce56000:3000 0:200b15ef1000:3000 20000L/1000P F=1341 B=6786579/6786579 cksum=46dc39d5f67fce1:4f49187ee197b77d:c888e98fdcfb7005:bc148888941aacfd
               0      L0 0:453ad0465000:9000 0:417466d7000:9000 4000L/4000P F=31 B=6786579/6786579 cksum=c5205989ca625345:2b8f66c7c9a31982:94f9158f6ca88b4b:835486f8dd072e63
            4000      L0 0:453ad046e000:9000 0:41746629000:9000 4000L/4000P F=32 B=6786579/6786579 cksum=b25bbc73112ad62f:3f1f95d6d13044ce:4664d20c782dcd6c:c7ea403149cae970
            8000      L0 0:453ad0480000:9000 0:417465b1000:9000 4000L/4000P F=32 B=6786579/6786579 cksum=eb6d2fea3a9c3414:8b2d5f00b062aeb9:ee1f2cbbf5f631e5:64b1915585145a2d
            c000      L0 0:453ad0477000:9000 0:41746632000:9000 4000L/4000P F=32 B=6786579/6786579 cksum=ed4d5987fbdeae1a:ef98a8b9e3c58787:5eb5e893035171e2:5b774baf955d95d4
           10000      L0 0:453ad0489000:9000 0:417465ba000:9000 4000L/4000P F=32 B=6786579/6786579 cksum=ef13b18e094645cf:1a9bc43d6b4c07f4:bf115fcc52c29114:40a9a95d6422fe42
           14000      L0 0:453ad0492000:9000 0:417465c3000:9000 4000L/4000P F=32 B=6786579/6786579 cksum=f11581dff59e07cf:4cf041c94dcc88b2:54fcfe0aed4463be:672cc4ff6a502d55
           18000      L0 0:453ad049b000:9000 0:417465cc000:9000 4000L/4000P F=32 B=6786579/6786579 cksum=ecc5e844bd6cf15a:f4745c049f735e8d:78634f80d8413d9f:28452c8294ea8cc0
           1c000      L0 0:453ad04a4000:9000 0:417466f2000:9000 4000L/4000P F=32 B=6786579/6786579 cksum=eee0547168cee2b9:22a9e61add50c27d:bcf58f46f78bae:1b5a9f1568b98458
           20000      L0 0:453ad04ad000:9000 0:417466fb000:9000 4000L/4000P F=32 B=6786579/6786579 cksum=f3d6be807d990923:fa4994034f061ea8:1c5d743a12e3c631:ffbc922d2ab22754
           24000      L0 0:453ad04bf000:9000 0:4174670d000:9000 4000L/4000P F=32 B=6786579/6786579 cksum=f0eeb9613cee306f:90b9e46869f29412:f39e54fc29632dbe:7dee39e5702cc490
           28000      L0 0:453ad04b6000:9000 0:41746704000:9000 4000L/4000P F=32 B=6786579/6786579 cksum=eda333d1bef45d59:911a3ac84202526d:62c26be8b8b40d96:f72b540c45323a43
           2c000      L0 0:453ad04c8000:9000 0:41746716000:9000 4000L/4000P F=22 B=6786579/6786579 cksum=e61651c0694c120d:f1557b6c20529d5f:e32d71b7441ecc1b:6a0e375968f44971
           30000      L0 0:453ad04d1000:9000 0:4174671f000:9000 4000L/4000P F=32 B=6786579/6786579 cksum=f035415cca952edf:2a39dde8746300ea:d0eb0e09fe71f90a:1f72161aac43dc0
           34000      L0 0:453ad04da000:9000 0:41746728000:9000 4000L/4000P F=32 B=6786579/6786579 cksum=edcae67009cb85fc:575b64474351f3ae:fb5ca0bf63141c05:244530c5f03871a6
           38000      L0 0:453ad04e3000:9000 0:41746731000:9000 4000L/4000P F=32 B=6786579/6786579 cksum=ef9eaacba96a0bb5:579be615bc0b6e90:167a1fa64e18ff9e:9010b8eadc46609d
           3c000      L0 0:453ad04ec000:9000 0:4174673a000:9000 4000L/4000P F=32 B=6786579/6786579 cksum=f0d8a76000e230e8:ffae8220859038ec:85be61e693fffac9:82d44f63449196eb
           40000      L0 0:453ad04f5000:9000 0:41746743000:9000 4000L/4000P F=32 B=6786579/6786579 cksum=f2b2bf0c688a432c:513bbddb3b0695c0:3c1f0b7fafa41713:888cc7ad2c2aeea3
           44000      L0 0:453ad04fe000:9000 0:4174674c000:9000 4000L/4000P F=32 B=6786579/6786579 cksum=f2b853f5f94bb729:87987b23a74f8580:da69a9db1ab8edd9:4478ffec3415cfa2
           48000      L0 0:453ad0507000:9000 0:41746755000:9000 4000L/4000P F=32 B=6786579/6786579 cksum=f1907c966c4429ce:fa3f0443e7030107:76964df9917526a4:6031289b12008b92
           4c000      L0 0:453ad0510000:9000 0:4174675e000:9000 4000L/4000P F=21 B=6786579/6786579 cksum=e30644864c02c163:19e3c9489e035358:1abf19336bb9ea95:dd1ff01480397f63
           50000      L0 0:453ad0519000:9000 0:41746767000:9000 4000L/4000P F=30 B=6786579/6786579 cksum=d37d02d7b1f2ccde:6b807a1437247393:e5fc1b2af8da8b46:f81edecf6b1115f1
           54000      L0 0:453ad0522000:9000 0:41746770000:9000 4000L/4000P F=32 B=6786579/6786579 cksum=e660d0767699bdae:5317b242cdfe322c:2f145a6ad16fea0c:7bd9f82207a5fff
           58000      L0 0:453ad052b000:9000 0:41746779000:9000 4000L/4000P F=32 B=6786579/6786579 cksum=efc135451c38c820:71803c7e6cb853e4:d2e4c6ee6f1d079a:360a50bfb9005048
           5c000      L0 0:453ad0534000:9000 0:41746782000:9000 4000L/4000P F=19 B=6786579/6786579 cksum=dcb1a3d2dca91d87:142a71787aee81c6:e345a9f1162e45b0:70107d7b79908b75
           60000      L0 0:453ad053d000:9000 0:4174678b000:9000 4000L/4000P F=32 B=6786579/6786579 cksum=e6906c08d7771693:9ca5ff7bd3b875ce:fd98191441dfba4e:13e638a0bbab2925
           64000      L0 0:453ad0546000:9000 0:41746794000:9000 4000L/4000P F=32 B=6786579/6786579 cksum=eb6075148c780709:ca7a3481f1f9f9e4:96b959b77e34ce43:eb1598544a48e1ca
           68000      L0 0:453ad054f000:9000 0:4174679d000:9000 4000L/4000P F=32 B=6786579/6786579 cksum=eb3f99c88120f4c9:a006fcb233182494:d4d7a6fedbcf6617:d1cc0accde59987d
           6c000      L0 0:453ad0558000:9000 0:417467a6000:9000 4000L/4000P F=25 B=6786579/6786579 cksum=f3a2c933ba40bc57:fe1b77584d3d9835:9aa6bb045ca952a8:cce271ecf9a14853
           70000      L0 0:453ad056a000:9000 0:417467b8000:9000 4000L/4000P F=32 B=6786579/6786579 cksum=ec7cfc09883224b8:d6461dd5904e300:9f30f1fa192025ce:8075bc9aa019a795
           74000      L0 0:453ad0561000:9000 0:417467af000:9000 4000L/4000P F=32 B=6786579/6786579 cksum=f4984833e9d88527:178babc01de4a95b:194cc723c2eb0634:35a531012d19bc69
           78000      L0 0:453ad0573000:9000 0:417467c1000:9000 4000L/4000P F=32 B=6786579/6786579 cksum=f141f55cfb8ae45f:ecf65326ebc1d533:1b0236101a26b9f5:337399292eb01a58
           7c000      L0 0:453ad057c000:9000 0:417467ca000:9000 4000L/4000P F=32 B=6786579/6786579 cksum=f13a7d15ecebaf88:ce3ec01d947ed450:bbcefece9edd970c:a05818074eaa429b
           80000      L0 0:453ad0585000:9000 0:417467d3000:9000 4000L/4000P F=32 B=6786579/6786579 cksum=e5aae880e3f48927:8ff07f3beeb1ebeb:4f0ab661443e6bc2:60f52b9d2d1c125
           84000      L0 0:453ad058e000:9000 0:417467dc000:9000 4000L/4000P F=32 B=6786579/6786579 cksum=e50710b8dba20a5c:9a440c179e79fa86:5a63b8ebd4abef7e:7408543f35cdf887
           88000      L0 0:453ad0597000:9000 0:417467e5000:9000 4000L/4000P F=32 B=6786579/6786579 cksum=eefa59d52d0e0c93:f246ecd9d4e3a298:2789b3705fa22ac:4ec86aecfb4d3d0d
           8c000      L0 0:453ad05a0000:9000 0:417467ee000:9000 4000L/4000P F=22 B=6786579/6786579 cksum=f016d48464251c54:e6a8554fa5e64aea:53baf589b0fc8f34:9240eb11793fe3ef
           90000      L0 0:453ad05a9000:9000 0:417467f7000:9000 4000L/4000P F=32 B=6786579/6786579 cksum=ebf247601ea5ebb4:16879dc10488a3cb:5759cec35445993:bd00c95c8f092fc8
           94000      L0 0:453ad05b2000:9000 0:41746800000:9000 4000L/4000P F=32 B=6786579/6786579 cksum=f3121d3447b0d127:5e82a1b074fdfaff:4fe8e74d2962d44b:11ccc4eef8f107ef
           98000      L0 0:453ad05bb000:9000 0:41746809000:9000 4000L/4000P F=32 B=6786579/6786579 cksum=f2941b2f3ddc96d2:85570474f8165e98:55da246b7d675be2:ce8e5b912a31dd38
           9c000      L0 0:453ad05c4000:9000 0:41746812000:9000 4000L/4000P F=32 B=6786579/6786579 cksum=efa361125f1f4b4f:31e8a799edc7c3db:938bb506c5e98a61:8e89811377f96ba1
           a0000      L0 0:453ad05cd000:9000 0:4174681b000:9000 4000L/4000P F=32 B=6786579/6786579 cksum=ec3e8bf9d264b28e:49da394a56b8e225:dff696afd478acd6:8de24efb77fd1924
           a4000      L0 0:453ad05d6000:9000 0:41746824000:9000 4000L/4000P F=22 B=6786579/6786579 cksum=eb77349c8842a11a:e9fc8fc5474ed970:b1f6b0616d64ca3:238bd4e72479f460
           b0000      L0 0:453ad05df000:9000 0:4172508f000:9000 4000L/4000P F=32 B=6786579/6786579 cksum=edf2a642120f7bf3:7c0857947cb2c9a1:987d6fb173bf05bd:9687dac1706c45bb
           b4000      L0 0:453ad05e8000:9000 0:41725098000:9000 4000L/4000P F=24 B=6786579/6786579 cksum=ee653458c731afb4:5575bade2241e733:3e4b8fd32c099fac:82d39605cc32cc10
           c0000      L0 0:453ad05f1000:9000 0:417250a1000:9000 4000L/4000P F=5 B=6786579/6786579 cksum=616ed2da495882a3:9956d736f67d25f6:6f3d8d0e8931bc67:c22c5091d5f4e145

                segment [0000000000000200, 000000000002ec00) size  186K
                segment [0000000000030000, 000000000004ea00) size  122K
                segment [0000000000050000, 0000000000050e00) size 3.50K
                segment [0000000000051200, 000000000005e600) size   53K
                segment [0000000000060000, 000000000006f200) size 60.5K
                segment [0000000000070000, 000000000008ec00) size  123K
                segment [0000000000090000, 00000000000a6c00) size   91K
                segment [00000000000b0000, 00000000000b7000) size   28K
                segment [00000000000c0000, 00000000000c0a00) size 2.50K

Fmstrat avatar May 29 '22 14:05 Fmstrat

@tcaputi hope you don't mind the tag, apologies if you do. I just know you've been through this one before.

Fmstrat avatar May 29 '22 14:05 Fmstrat

Well, you're running a 2+ year old version with known bugs (not that the latest version of native encryption doesn't also have a host of known bugs nobody's fixed), so perhaps you should consider not doing that.

That said, my wild guess would be that it's the unfixed bug where incremental receive will sometimes happily do zfs change-key incorrectly and update the metadata to think the wrong wrapping key is the one in use, which you only notice on reboot...when it tries to unlock the dataset, and because it has the incorrect idea of which key to use, it obviously does not succeed.
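
If you want to see what each dataset currently thinks, the key-related properties are cheap to check, something like:

zfs get -r encryptionroot,keylocation,keystatus s/storage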

e: #12000, #12614

rincebrain avatar May 29 '22 16:05 rincebrain

@rincebrain Thank you! So it looks like in Ubuntu 20.04 I can update from 0.8.3-1ubuntu12.13 to 0.8.3-1ubuntu12.14. If this is indeed it, then I think this may have occurred due to some initial issues during transfer. Here's what happened:

  • I created s/storage on the backup server as the encryption root, then added a bunch of other datasets as s/storage/* underneath it
  • I used send to put s/storage and all child datasets onto external drives
  • I drove those drives to the main server
  • A couple of the child datasets had failed files, so all of s/storage wouldn't send from the external drives to the main server. I had to send each child dataset independently, likely assigning the encryption key to each rather than inheriting it from the root
  • Incremental snaps were then sent from the main server to the backup server
  • This is likely the first time I've rebooted the backup server

A few questions:

  • Is this update sufficient if I do it on both the main server and the backup server?
  • Would upgrading allow me to mount, or are there some steps I would need to take first?
  • I.e., could I assign an encryption key to each of the child datasets manually, then mount (these are always mounted read-only on the backup server)?

Just trying to figure out the best way to get up and running again under Ubuntu 20.04. Thanks!

Fmstrat avatar May 29 '22 18:05 Fmstrat

The current series of releases is 2.1.x, you are on 0.8.x. Upgrading from 0.8.3-mumble13 to 0.8.3-mumble14 was not my suggestion, though you could always try reporting your issue to Ubuntu's bugtracker in that case, as I would be pretty astonished if 0.8.x saw any fixes or releases ever again here.

Upgrading things won't fix this problem at this point. If I'm right about what happened, it wouldn't have even prevented them before, since that bug is unfixed.

One could write code to forcibly rekey the individual datasets, ignoring that you don't have the old key loaded; the framework for it already exists, it's just not written anywhere (that I'm aware of). I have an incomplete version that doesn't work on every key type at the moment, but haven't gone back to figure out what's missing in a bit.

rincebrain avatar May 29 '22 18:05 rincebrain

@rincebrain Well that doesn't sound promising then. What's your recommended upgrade methodology to 2.1.x for Ubuntu 20.04 that wouldn't involve me losing access to data on the main server? They're supposed to still be supporting 20.04, but it sounds like zfs is woefully out of date there.

I assume by "write code" you don't mean use the zfs command to assign a new key? I'm not sure I'd have the expertise to pull that off without a guide or walk-through of some kind.

Is there any way I can get this to work without starting from scratch? Such as leveraging the first snap that is mountable in combination with an upgrade (from.. somewhere)? This is many TB of data separated by many states, so I can't easily do full sends across the internet.

Fmstrat avatar May 29 '22 18:05 Fmstrat

From this reddit thread it looks like I could upgrade without issue, and from this one there appears to be a maintained PPA.

So if I went that route, upgraded on both systems, would it be possible to roll back to a working snapshot (if I have one) on the backup server, then send from the main server to get things in sync? Or do I need to send from scratch all over again? And since that's Reddit, and you're you, does the above upgrade path even make sense to a ZFS dev? Thanks again!

Fmstrat avatar May 29 '22 19:05 Fmstrat

I believe there's a popular PPA; you could also build packages from source; that's just general advice though, if I'm correct about the problem, it's still unfixed now too, so it wouldn't have helped.

I don't, no. If you can mount some old snapshots, though, but not newer ones, on the same dataset, I don't think it can be the bug I'm describing, because I don't think there's any sort of old key history kept, so they should all be equally unusable.

One thing you could try, potentially, assuming it's one of the encryption bugs that has been fixed instead: upgrade to 2.1.4 using the PPA or building your own packages or w/e, import the pool, and see if you can mount things. If not, you could try doing a raw send of the encrypted dataset and receiving it anew on the same pool somewhere else, e.g. zfs send -w z/storage@foo | zfs recv z/storage_new, and seeing if you can unlock and mount it afterward, since there are one or two bugs for which that works.
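
Spelled out a bit (names and the key file path below are placeholders, adjust for your setup):

zfs send -w s/storage/n/random@somesnap | zfs recv s/storage/n/random_new
# keylocation may come across as 'none' on the received copy, so point load-key at your key file
zfs load-key -L file:///etc/zfskey s/storage/n/random_new
zfs mount s/storage/n/random_new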

rincebrain avatar May 29 '22 19:05 rincebrain

Ahh yes. I can mount the original snapshots, just like in the first thread I linked. Every incremental snapshot that was sent afterwards won't mount, but the first will. And any dataset that only has that first snapshot also mounts as expected. A big concern I have right now is that if the main server reboots, I could lose access to my data altogether.

Fmstrat avatar May 29 '22 19:05 Fmstrat

Btw, this is what I mean by they are different from a key perspective:

Main server:

# zfs get keylocation s/storage/n/random
NAME                PROPERTY     VALUE               SOURCE
s/storage/n/random  keylocation  file:///etc/zfskey  local

Backup server:

# zfs get keylocation s/storage/n/random
NAME                PROPERTY    VALUE        SOURCE
s/storage/n/random  keylocation  none         default

The backup server was inheriting from s/storage originally. But maybe it's not anymore due to the odd sends from the other machine?

I.e.:

  • Backup server has s/storage/n/random original snap with inherited encryption key from s/storage
  • That snap is sent to the main server, but a key is assigned directly vs inherited
  • Main server makes changes to the filesystem
  • Main server sends an incremental s/storage/n/random to the backup server
  • At this point, the original snap can mount, the new one can't

Would the backup server get confused without keylocation set at this point?

Fmstrat avatar May 29 '22 19:05 Fmstrat

Yes, that is certainly quite confused, but I think I'm surprised that it can still mount the old one, as like I said, changing the wrapping key or no, the actual key that encrypts the data has remained the same, so you should still need the same unlock for both cases, AFAIK.

rincebrain avatar May 29 '22 19:05 rincebrain

Ok, thanks. If you were in my boat, what steps would you take?

Fmstrat avatar May 29 '22 19:05 Fmstrat

I'd probably try the send/recv thing I mentioned above, and if that didn't work, go hackily patch the code to override that property value in that specific case to be what you know to be correct and/or finish the WIP code for forcibly resetting the property.

I'm taking a break for a bit after I spent a bunch of time today hacking on something complicated and then someone pointed out I was being foolish and there was a much simpler solution, but if you remind me later I'll see if I can't make that branch that's incomplete for fixing this complete and then you can try it.

rincebrain avatar May 29 '22 19:05 rincebrain

"If you can mount some old snapshots, though, but not newer ones, on the same dataset, I don't think it can be the bug I'm describing, because I don't think there's any sort of old key history kept, so they should all be equally unusable."

Yes, I agree. This looks more like the user accounting MAC issue which was fixed in #12981 a couple of months ago. So I'd say a recent ZFS version (2.1.4 or git@master) should be able to mount the datasets, at least it's worth a try.

OTOH if it's really an issue with overwriting the wrapping key, it would be good to see the exact sequence of commands which got you into this situation. Unfortunately, from the description you gave I can't deduce the exact commands you issued. I've a couple of reproducers for key issues which I can compare your commands against.

With "mount the snapshot" you mean mount -t zfs pool/dataset@snap /mnt, right?

AttilaFueloep avatar May 30 '22 16:05 AttilaFueloep

I think #12981 only means you could send/recv and then unlock the resulting recv, not mount the existing dataset, though I think writing a zhack command to trigger the same behavior without a recv might be reasonably doable...

rincebrain avatar May 30 '22 16:05 rincebrain

I will give some of this a shot, and also try to recreate the issue before (and after) I update.

If I need to fully re-send, should I be using raw, or re-encrypting at destination?

Fmstrat avatar May 30 '22 17:05 Fmstrat

Re-encrypting at destination would avoid problems with change-key propagating surprisingly, but there are other bugs which only require receiving raw no matter what (hi #11679), so pick your poison, IMO.

rincebrain avatar May 30 '22 17:05 rincebrain

@rincebrain Honestly I can't tell, didn't look very thoroughly, sorry. This issue just reminded me of an issue I had some time ago where I couldn't mount a raw send/recv'd encrypted dataset (without key manipulation) while I could mount an older snapshot. It was due to the MAC of the accounting object failing, and it did show the Permanent errors dataset:<0x0> symptom.

@Fmstrat Depends on your use case. If you want to do raw incremental sends, re-encrypting isn't an option. If you do raw sends, I'd strongly encourage you not to manipulate the keys on the receiving side, since there are some edge cases where you could end up with inaccessible received datasets. If the send channel is trusted, OTOH, you can do plain (non-raw) incremental sends and re-encrypt at the destination.
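
Roughly, the two flavors look like this (dataset names are placeholders):

# raw incremental: ciphertext goes over the wire, the destination keeps the source's keys
zfs send -w -I pool/data@1 pool/data@2 | zfs recv backup/data
# plain incremental into an encrypted destination: needs the source key loaded; data is
# decrypted on send and re-encrypted with the destination's keys on receive
zfs send -I pool/data@1 pool/data@2 | zfs recv backup/data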

AttilaFueloep avatar May 30 '22 17:05 AttilaFueloep

And of course what @rincebrain just wrote.

AttilaFueloep avatar May 30 '22 17:05 AttilaFueloep

@AttilaFueloep I wonder if the root cause of that is actually related to the #11679 panic, which it turns out is caused by the accounting code's trick of dirtying every object and syncing it out, causing inconsistent state in memory and a NULL deref if someone tries to read the dnode at the wrong time.

rincebrain avatar May 30 '22 18:05 rincebrain

@AttilaFueloep what exactly do you mean by "manipulate the keys"? I'm trying to understand if this would help my current situation and if so, how.

Fmstrat avatar May 30 '22 20:05 Fmstrat

@AttilaFueloep @rincebrain

Ok, before I try anything else, I have this fully reproducible in 0.8.3-1ubuntu12.13:

# Make server A, create snapshot 1, then make that pool read-only
truncate -s 1G /tmp/tmpa.img
zpool create -o ashift=12 -O atime=off -O mountpoint=none tmpa /tmp/tmpa.img
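# (assumes /etc/zfskey already exists and contains a 64-character hex key, i.e. a raw 256-bit key)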
zfs create -o encryption=on -o keyformat=hex -o keylocation=file:///etc/zfskey -o compression=lz4 -o mountpoint=/mnt/tmpa/storage tmpa/storage
zfs create -o mountpoint=/mnt/tmpa/storage/test tmpa/storage/test
echo 1 > /mnt/tmpa/storage/test/testfile.txt
zfs snap -r tmpa/storage@1
zfs set readonly=on tmpa/storage/test

# Make server B, send from A to B
truncate -s 1G /tmp/tmpb.img
zpool create -o ashift=12 -O atime=off -O mountpoint=none tmpb /tmp/tmpb.img
zfs create -o encryption=on -o keyformat=hex -o keylocation=file:///etc/zfskey -o compression=lz4 -o mountpoint=/mnt/tmpb/storage tmpb/storage
zfs send -w tmpa/storage/test@1 |pv -Wbraft |zfs recv tmpb/storage/test

# Set server B writable, change file, send to A
zfs set readonly=off tmpb/storage/test
zfs set mountpoint=/mnt/tmpb/storage/test tmpb/storage/test
zfs set keylocation=file:///etc/zfskey tmpb/storage/test
zfs load-key tmpb/storage/test
zfs mount tmpb/storage/test
echo 2 > /mnt/tmpb/storage/test/testfile.txt
zfs snap tmpb/storage/test@2
zfs send -w -I tmpb/storage/test@1 tmpb/storage/test@2 |pv -Wbraft |zfs recv tmpa/storage/test

# Unmount and try to remount A. Failure
zfs unmount tmpa/storage/test
zfs mount tmpa/storage/test

# Cleanup
zpool destroy tmpa
zpool destroy tmpb
rm /tmp/tmpa.img
rm /tmp/tmpb.img

Given this, would you recommend I try anything else before I upgrade?

Fmstrat avatar May 30 '22 20:05 Fmstrat

I did the above on a spare laptop with Ubuntu 20.04 on it, then upgraded to this version from the PPA:

zfs-2.1.4-0york0~20.04
zfs-kmod-2.0.2-1ubuntu5.4

However I'm still unable to mount, so no luck there. I then tried re-sending the incremental in hopes I could salvage:

# zfs send -w -I tmpb/storage/test@1 tmpb/storage/test@2 |pv -Wbraft |zfs recv tmpa/storage/test
11.0KiB 0:00:00 [ 380KiB/s] [ 380KiB/s]

But still no luck on mounting afterwards. For the heck of it, I reran the above test from scratch in 2.1.4 and lo and behold, the issue exists there, too.

Fmstrat avatar May 30 '22 21:05 Fmstrat

So far the only way I can get this to work is by re-sending the datasets in full from scratch and setting them up with their own encryption roots:

# Make server A, create snapshot 1, then make that pool read-only
truncate -s 1G /tmp/tmp2a.img
zpool create -o ashift=12 -O atime=off -O mountpoint=none tmp2a /tmp/tmp2a.img
zfs create -o encryption=on -o keyformat=hex -o keylocation=file:///etc/zfskey -o compression=lz4 -o mountpoint=/mnt/tmp2a/storage tmp2a/storage
zfs create -o mountpoint=/mnt/tmp2a/storage/test tmp2a/storage/test
echo 1 > /mnt/tmp2a/storage/test/testfile.txt
zfs snap -r tmp2a/storage@1
zfs set readonly=on tmp2a/storage/test

# Make server B, send from A to B
truncate -s 1G /tmp/tmp2b.img
zpool create -o ashift=12 -O atime=off -O mountpoint=none tmp2b /tmp/tmp2b.img
zfs create -o encryption=on -o keyformat=hex -o keylocation=file:///etc/zfskey -o compression=lz4 -o mountpoint=/mnt/tmp2b/storage tmp2b/storage
zfs send -w tmp2a/storage/test@1 |pv -Wbraft |zfs recv tmp2b/storage/test

# Set server B writable, change file, send to A
zfs set readonly=off tmp2b/storage/test
zfs set mountpoint=/mnt/tmp2b/storage/test tmp2b/storage/test
zfs set keylocation=file:///etc/zfskey tmp2b/storage/test
zfs load-key tmp2b/storage/test
zfs mount tmp2b/storage/test
echo 2 > /mnt/tmp2b/storage/test/testfile.txt
zfs snap tmp2b/storage/test@2
zfs send -w -I tmp2b/storage/test@1 tmp2b/storage/test@2 |pv -Wbraft |zfs recv tmp2a/storage/test

# Unmount and try to remount A. Failure
zfs unmount tmp2a/storage/test
zfs mount tmp2a/storage/test

# Destroy server A test pool and resend from scratch
zfs destroy -r tmp2a/storage/test
zfs send -w tmp2b/storage/test@2 |pv -Wbraft |zfs recv tmp2a/storage/test
zfs set readonly=on tmp2a/storage/test
zfs set mountpoint=/mnt/tmp2a/storage/test tmp2a/storage/test
zfs set keylocation=file:///etc/zfskey tmp2a/storage/test
zfs load-key tmp2a/storage/test
zfs mount tmp2a/storage/test

# Make a change on server B and send
echo 3 > /mnt/tmp2b/storage/test/testfile.txt
zfs snap tmp2b/storage/test@3
zfs send -w -I tmp2b/storage/test@2 tmp2b/storage/test@3 |pv -Wbraft |zfs recv tmp2a/storage/test

# Unmount and try to remount A. Success
zfs unmount tmp2a/storage/test
zfs mount tmp2a/storage/test

# Cleanup
zpool destroy tmp2a
zpool destroy tmp2b
rm /tmp/tmp2a.img
rm /tmp/tmp2b.img

Fmstrat avatar May 30 '22 22:05 Fmstrat

One more test. The error also occurs if I send the entire storage hierarchy recursively, instead of just "test":

# Make server A, create snapshot 1, then make that pool read-only
truncate -s 1G /tmp/tmp2a.img
zpool create -o ashift=12 -O atime=off -O mountpoint=none tmp2a /tmp/tmp2a.img
zfs create -o encryption=on -o keyformat=hex -o keylocation=file:///etc/zfskey -o compression=lz4 -o mountpoint=/mnt/tmp2a/storage tmp2a/storage
zfs create -o mountpoint=/mnt/tmp2a/storage/test tmp2a/storage/test
echo 1 > /mnt/tmp2a/storage/test/testfile.txt
zfs snap -r tmp2a/storage@1
zfs set readonly=on tmp2a/storage/test

# Make server B, send from A to B
truncate -s 1G /tmp/tmp2b.img
zpool create -o ashift=12 -O atime=off -O mountpoint=none tmp2b /tmp/tmp2b.img
# -=> First difference, send the entire pool recursively
#zfs create -o encryption=on -o keyformat=hex -o keylocation=file:///etc/zfskey -o compression=lz4 -o mountpoint=/mnt/tmp2b/storage tmp2b/storage
#zfs send -w tmp2a/storage/test@1 |pv -Wbraft |zfs recv tmp2b/storage/test
zfs send -wR tmp2a/storage@1 |pv -Wbraft |zfs recv tmp2b/storage

# Set server B writable, change file, send to A
zfs set readonly=off tmp2b/storage/test
zfs set mountpoint=/mnt/tmp2b/storage/test tmp2b/storage/test
# -=> Next difference, key location and load keys for storage
#zfs set keylocation=file:///etc/zfskey tmp2b/storage/test
#zfs load-key tmp2b/storage/test
zfs set keylocation=file:///etc/zfskey tmp2b/storage
zfs load-key tmp2b/storage
zfs mount tmp2b/storage/test
echo 2 > /mnt/tmp2b/storage/test/testfile.txt
zfs snap tmp2b/storage/test@2
zfs send -w -I tmp2b/storage/test@1 tmp2b/storage/test@2 |pv -Wbraft |zfs recv tmp2a/storage/test

# Unmount and try to remount A. Failure
zfs unmount tmp2a/storage/test
zfs mount tmp2a/storage/test

# Cleanup
zpool destroy tmp2a
zpool destroy tmp2b
rm /tmp/tmp2a.img
rm /tmp/tmp2b.img

Fmstrat avatar May 30 '22 22:05 Fmstrat

Sorry for the flood, just trying to eliminate dead ends. Using a recursive snapshot doesn't seem to be the issue. It seems to be related to the encryption root living at tmpa/storage and being inherited by tmpa/storage/test, while tmpb/storage/test is its own encryption root.

# Make server A, create snapshot 1, then make that pool read-only
truncate -s 1G /tmp/tmp2a.img
zpool create -o ashift=12 -O atime=off -O mountpoint=none tmp2a /tmp/tmp2a.img
zfs create -o encryption=on -o keyformat=hex -o keylocation=file:///etc/zfskey -o compression=lz4 -o mountpoint=/mnt/tmp2a/storage tmp2a/storage
zfs create -o mountpoint=/mnt/tmp2a/storage/test tmp2a/storage/test
echo 1 > /mnt/tmp2a/storage/test/testfile.txt
# -=> Just snap test
#zfs snap -r tmp2a/storage@1
zfs snap tmp2a/storage/test@1
zfs set readonly=on tmp2a/storage/test

# Make server B, send from A to B
truncate -s 1G /tmp/tmp2b.img
zpool create -o ashift=12 -O atime=off -O mountpoint=none tmp2b /tmp/tmp2b.img
zfs create -o encryption=on -o keyformat=hex -o keylocation=file:///etc/zfskey -o compression=lz4 -o mountpoint=/mnt/tmp2b/storage tmp2b/storage
zfs send -w tmp2a/storage/test@1 |pv -Wbraft |zfs recv tmp2b/storage/test

# Set server B writable, change file, send to A
zfs set readonly=off tmp2b/storage/test
zfs set mountpoint=/mnt/tmp2b/storage/test tmp2b/storage/test
zfs set keylocation=file:///etc/zfskey tmp2b/storage/test
zfs load-key tmp2b/storage/test
zfs mount tmp2b/storage/test
echo 2 > /mnt/tmp2b/storage/test/testfile.txt
zfs snap tmp2b/storage/test@2
zfs send -w -I tmp2b/storage/test@1 tmp2b/storage/test@2 |pv -Wbraft |zfs recv tmp2a/storage/test

# Unmount and try to remount A. Failure
zfs unmount tmp2a/storage/test
zfs mount tmp2a/storage/test
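
# Optional check: compare encryption roots on both pools; tmp2b/storage/test should show up as its
# own encryption root, while tmp2a/storage/test inherits from tmp2a/storage
zfs get -r encryptionroot tmp2a tmp2b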

# Cleanup
zpool destroy tmp2a
zpool destroy tmp2b
rm /tmp/tmp2a.img
rm /tmp/tmp2b.img

Fmstrat avatar May 30 '22 22:05 Fmstrat

@rincebrain Your last comment there was an interesting read. Looks like the integration of encryption with the quota accounting still has some rough edges. Maybe I can say more after replaying the given reproducers.

AttilaFueloep avatar May 30 '22 22:05 AttilaFueloep

@Fmstrat

what exactly do you mean by "manipulate the keys"

Well, running zfs change-key [-i] on the received datasets or changing encryption roots. In short, anything which will change the wrapping key on the destination. Sorry, I can't give much detail right now; it's been a while since I analyzed this and my memories are faint. I'll have to look up my notes on this to say more.

AttilaFueloep avatar May 30 '22 22:05 AttilaFueloep

Nice to see some reproducers. It's quite late here already, I'll try to digest your input and reproduce it on a somewhat current master tomorrow.

AttilaFueloep avatar May 30 '22 22:05 AttilaFueloep