zfs icon indicating copy to clipboard operation
zfs copied to clipboard

ZFS doesn't work with CONFIG_INIT_STACK_ALL_PATTERN

Open michael-brade opened this issue 1 year ago • 1 comments
trafficstars

System information

Type Version/Name
Distribution Name Debian
Distribution Version unstable
Kernel Version 6.1.86, 6.7.12
Architecture amd64
OpenZFS Version 2.2.3

Describe the problem you're observing

I am using ZFS on Root with zfs native encryption. If I compile the kernel with CONFIG_INIT_STACK_ALL_PATTERN, the zfs modules get loaded and I can enter the passphrase, but then everything just hangs without any further messages. Everything works as expected when using CONFIG_INIT_STACK_ALL_ZERO. So I would assume there is an uninitialized variable somewhere in the zfs/spl code.

michael-brade avatar Apr 26 '24 08:04 michael-brade

I compiled 6.1.83 with CONFIG_INIT_STACK_ALL_PATTERN=y, with OpenZFS master 21bc066ec. Creating a new dataset with encryption=aes-256-gcm, it completed a write cycle, scrub, export and reimport without issue.

So, more info required. Can you describe the steps you take in detail, and post the kernel config, pool topology and pool and dataset properties, and I can try to reproduce.

When it hangs, do you have a working root shell? If so, it'd be good to inspect the system and find out exactly where the hang is.

robn avatar Apr 29 '24 10:04 robn

thanks for the answer! and sorry for the delay, it'll take a few more days until I can provide you with the info - but I will :)

michael-brade avatar May 11 '24 16:05 michael-brade

So here we go. Attached is my kernel config: config-6.7.12-nullpattern

I have two computers where I can reproduce this - just take the current config and enable CONFIG_INIT_STACK_ALL_PATTERN. The result is: when booting, the ZFS modules get loaded, the rpool is found and the password prompt is shown. But it doesn't matter if I enter the correct password or not - after pressing enter, the cursor jumps into the next line and that's it. Everything I will type from then on has no consequence and is just printed on screen. There is no working root shell.

michael-brade avatar May 15 '24 08:05 michael-brade

Seems the secret detail was that it was compiled with Clang. I did a clean build of both 6.7.12 based on your config, and then OpenZFS against it. With GCC 12, all is well. With Clang 18, explosion:

[INFO  tini (1)] Spawned child process '/bin/sh' with pid '445'
+ sanity
+ sysctl kernel.hung_task_timeout_secs=10
kernel.hung_task_timeout_secs = 10
+ zpool create -O encryption=aes-256-gcm -O keylocation=prompt -O keyformat=passphrase tank raidz1 quizm0 quizm1 quizm2 quizm3
Enter new passphrase:
Re-enter new passphrase:
[    7.072524] general protection fault, probably for non-canonical address 0xaaaaaaaaaaaaaaaa: 0000 [#1] PREEMPT SMP NOPTI
[    7.072751] CPU: 2 PID: 449 Comm: zpool Tainted: G           O       6.7.12 #3
[    7.072830] RIP: 0010:memset_orig+0x98/0xb0
[    7.072898] Code: 00 00 ff c9 48 89 07 48 8d 7f 08 75 f5 83 e2 07 74 0a ff ca 88 07 48 8d 7f 01 75 f6 4c 89 d0 c3 cc cc cc cc 48 83 fa 07 76 e3 <48> 89 07 49 c7 c0 08 00 00 00 4d 29 c8 4c 01 c7 4c 29 c2 e9 6e ff
[    7.073059] RSP: 0018:ffffad81c04e3488 EFLAGS: 00010286
[    7.073202] RAX: 0000000000000000 RBX: ffffad81c04e34a0 RCX: 0000000000000000
[    7.073273] RDX: aaaaaaaaaaaaaaaa RSI: 0000000000000000 RDI: aaaaaaaaaaaaaaaa
[    7.073349] RBP: ffff969e37cc7e00 R08: ffff969d07acbcc0 R09: 0000000000000002
[    7.073425] R10: aaaaaaaaaaaaaaaa R11: ffffffffc04184c0 R12: 0000000000000000
[    7.073506] R13: 0000000000000070 R14: ffffad81c04e36d0 R15: ffffad81c04e35b8
[    7.073582] FS:  00007f0f6a3b5040(0000) GS:ffff969e37d00000(0000) knlGS:0000000000000000
[    7.073658] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    7.073722] CR2: 0000563b985ccff8 CR3: 000000010284a000 CR4: 00000000003506b0
[    7.073800] Call Trace:
[    7.073831]  <TASK>
[    7.073860]  ? __die_body+0x64/0xb0
[    7.073908]  ? die_addr+0xa0/0xc0
[    7.073950]  ? exc_general_protection+0x121/0x1d0
[    7.074003]  ? __kmalloc_node+0x90/0x180
[    7.074045]  ? asm_exc_general_protection+0x22/0x30
[    7.074105]  ? __pfx_aes_encrypt_contiguous_blocks+0x10/0x10 [zfs]
[    7.074430]  ? memset_orig+0x98/0xb0
[    7.074472]  gcm_clear_ctx+0x82/0xd0 [zfs]
[    7.074699]  aes_encrypt_atomic+0x1c8/0x2a0 [zfs]
[    7.074889]  crypto_encrypt+0x115/0x1a0 [zfs]
[    7.075128]  zio_do_crypt_uio+0x285/0x320 [zfs]
[    7.075365]  zio_crypt_key_wrap+0x1f6/0x230 [zfs]
[    7.075558]  dsl_crypto_key_sync+0xbc/0x160 [zfs]
[    7.075752]  dsl_crypto_key_create_sync+0x9e/0x1d0 [zfs]
[    7.076098]  dsl_dataset_create_crypt_sync+0x23c/0x390 [zfs]
[    7.076320]  dsl_dataset_create_sync_dd+0x478/0x570 [zfs]
[    7.076518]  dsl_pool_create+0x2ff/0x4c0 [zfs]
[    7.076720]  spa_create+0x740/0xc30 [zfs]
[    7.076905]  zfs_ioc_pool_create+0x15f/0x330 [zfs]
[    7.077059]  zfsdev_ioctl_common+0x58c/0x800 [zfs]
[    7.077212]  ? __kmalloc_node+0xfb/0x180
[    7.077259]  ? kvmalloc_node+0x41/0xc0
[    7.077305]  zfsdev_ioctl+0x5a/0xc0 [zfs]
[    7.077450]  __x64_sys_ioctl+0xc7d/0xdd0
[    7.077496]  ? kvm_clock_get_cycles+0x14/0x30
[    7.077556]  ? ktime_get+0x38/0xa0
[    7.077603]  ? clockevents_program_event+0x88/0x100
[    7.077670]  ? hrtimer_interrupt+0x122/0x3a0
[    7.077731]  do_syscall_64+0x50/0xf0
[    7.077779]  ? fpregs_assert_state_consistent+0x23/0x50
[    7.077837]  entry_SYSCALL_64_after_hwframe+0x73/0x7b
[    7.077903] RIP: 0033:0x7f0f6ab6bc5b
[    7.077949] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[    7.078126] RSP: 002b:00007fffa0b4b260 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[    7.078208] RAX: ffffffffffffffda RBX: 000055a455e063e0 RCX: 00007f0f6ab6bc5b
[    7.078295] RDX: 00007fffa0b4b750 RSI: 0000000000005a00 RDI: 0000000000000003
[    7.078384] RBP: 00007fffa0b4ed30 R08: 0000000000000000 R09: 0000000000000000
[    7.078469] R10: 00007f0f6adf3660 R11: 0000000000000246 R12: 000055a455dfb2c0
[    7.078557] R13: 000055a455e05520 R14: 00007fffa0b4b750 R15: 00007fffa0b4b350
[    7.078644]  </TASK>
[    7.078679] Modules linked in: zfs(O) spl(O)
[    7.078792] ---[ end trace 0000000000000000 ]---
[    7.078877] RIP: 0010:memset_orig+0x98/0xb0
[    7.078944] Code: 00 00 ff c9 48 89 07 48 8d 7f 08 75 f5 83 e2 07 74 0a ff ca 88 07 48 8d 7f 01 75 f6 4c 89 d0 c3 cc cc cc cc 48 83 fa 07 76 e3 <48> 89 07 49 c7 c0 08 00 00 00 4d 29 c8 4c 01 c7 4c 29 c2 e9 6e ff
[    7.079163] RSP: 0018:ffffad81c04e3488 EFLAGS: 00010286
[    7.079242] RAX: 0000000000000000 RBX: ffffad81c04e34a0 RCX: 0000000000000000
[    7.079349] RDX: aaaaaaaaaaaaaaaa RSI: 0000000000000000 RDI: aaaaaaaaaaaaaaaa
[    7.079467] RBP: ffff969e37cc7e00 R08: ffff969d07acbcc0 R09: 0000000000000002
[    7.079576] R10: aaaaaaaaaaaaaaaa R11: ffffffffc04184c0 R12: 0000000000000000
[    7.079683] R13: 0000000000000070 R14: ffffad81c04e36d0 R15: ffffad81c04e35b8
[    7.079834] FS:  00007f0f6a3b5040(0000) GS:ffff969e37d00000(0000) knlGS:0000000000000000
[    7.079950] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    7.080048] CR2: 0000563b985ccff8 CR3: 000000010284a000 CR4: 00000000003506b0
/usr/sbin/sanity: line 22:   449 Segmentation fault      zpool create -O encryption=aes-256-gcm -O keylocation=prompt -O keyformat=passphrase tank raidz1 quizm0 quizm1 quizm2 quizm3
[INFO  tini (1)] Main child exited normally (with status '139')

It's a good trace, and likely pretty easy to track down the issue. I'm out of time tonight, so it'll have to wait until tomorrow. I'll see if I can get a test suite run against an all-Clang build as well, might shake a bit more out.

robn avatar May 15 '24 12:05 robn

perfect! That sounds great.

michael-brade avatar May 15 '24 15:05 michael-brade

Skimming over the mostly familiar code I'd say it chokes on

https://github.com/openzfs/zfs/blob/e675852bc1d50404fcbe4fa0e1f57b6c318e6349/module/icp/algs/modes/modes.c#L189-L192

The gcm code needs the gcm_ctx_t to be zeroed to work properly. This is assured in the common code path by using vmem_zalloc(). aes_encrypt_atomic() is the only consumer which allocates the gcm_ctx_t on the stack:

https://github.com/openzfs/zfs/blob/e675852bc1d50404fcbe4fa0e1f57b6c318e6349/module/icp/io/aes.c#L835

So my guess would be that Clang explicitly ignores zero initialization of stack variables if CONFIG_INIT_STACK_ALL_PATTERN is defined. We're encrypting so ctx->gcm_pt_buf (buffer to hold decrypted plaintext) must be NULL.

AFAIK CONFIG_INIT_STACK_ALL_PATTERN is a Clang specific setting. That would explain why GCC behaves correctly.

AttilaFueloep avatar May 15 '24 23:05 AttilaFueloep

@AttilaFueloep thanks for those pointers. I dug the rest of the way, and there's more bonkers than that!

tl;dr GCC and Clang treat {0} differently when initialising a union (as aes_ctx_t is). GCC will apply it to the maximal size of the union, while Clang will treat it as a single element initialiser, and only apply it to the first element. The rest of the union is considered "padding", and so will get whatever the "default" initialisation is.

So, even though both compilers support -ftrivial-auto-var-init=pattern, they yield different results here: GCC zeroes the entire aes_ctx_t, Clang only zeroes the first element, acu_ecb. gcm_ctx_t is the fifth element, acu_gcm, and gets the pattern initialiser.

(it's a bit beyond my ken, but it appears to be a difference of opinion between GCC and Clang on whether or not a union is an aggregate, as the initialiser rules are applied to aggregates).

More info:

  • https://reviews.llvm.org/D68115
  • https://discourse.llvm.org/t/ftrivial-auto-var-init-pattern-vs-uninitialized-and-union-initialization/53152

And some proof:

  • Clang: https://gcc.godbolt.org/z/qG7a5d6bz
  • GCC: https://gcc.godbolt.org/z/YE6eab38h

I will do something soon. Simply zeroing the entire thing with memset isn't enough, it then moves on to infinite loop in the SPL allocator. So I'll need to audit the codebase and make sure we're doing the right thing everywhere. Shouldn't be a big deal, fairly mechanical, just needs time and care because I only want to do this once and then never again.

robn avatar May 16 '24 01:05 robn

Alright, I've done the read, and I think robn/zfs@4b4495d "fixes" this by changing all the places where {0} (or equivalent) is used to initialize something on the stack that has a union somewhere in it.

I've put this on the test rig with a 6.7.12, all compiled with Clang 18. It passed the initial check of creating an encrypted dataset without crashing or hanging. I've left it running the full test suite overnight.

If it looks clean, I'll then go and look for prior art on how to do this properly. I want it to be hard/impossible to screw up, which means either a compiler warning or a lint of some kind to draw attention to it, and a clear pattern for "the right way".

robn avatar May 16 '24 12:05 robn

Test suite passed, which is a pleasing indicator of something.

robn avatar May 16 '24 23:05 robn