drbd

Kernel Panic

Open · Smithx10 opened this issue 2 years ago · 1 comment

While working on linstor-gateway, I was able to trigger a kernel panic in drbd on both kernel-lt (5.4.205) and 5.14.21.

The change I made to linstor-gateway was removing the "must be offline" check located here: https://github.com/LINBIT/linstor-gateway/blob/master/pkg/nvmeof/nvmeof.go#L271-L274

		status := linstorcontrol.StatusFromResources(path, resourceDefinition, resourceGroup, resources)
		if status.Service == common.ServiceStateStarted {
			return nil, errors.New("cannot add volume while service is running")
		}

I'm not sure whether this change is related, but I wouldn't expect it to result in a kernel panic.

zfs:

[root@ac-1f-6b-9e-e5-46 zfs]# zfs --version
zfs-2.1.4-1
zfs-kmod-2.1.4-1

drbd:

[root@ac-1f-6b-9e-e5-46 zfs]# drbdadm --version
DRBDADM_BUILDTAG=GIT-hash:\ 9aeb1059d37b92fec8db2b47e356c4e7fa030b64\ build\ by\ root@drbd-lsc-0\,\ 2022-06-23\ 05:01:03
DRBDADM_API_VERSION=2
DRBD_KERNEL_VERSION_CODE=0x090107
DRBD_KERNEL_VERSION=9.1.7
DRBDADM_VERSION_CODE=0x091500
DRBDADM_VERSION=9.21.0
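The *_VERSION_CODE values above encode the same versions as the human-readable fields, which confirms the loaded kernel module is 9.1.7. As a small sketch, the Go program below parses the KEY=VALUE lines and decodes each code. The one-byte-per-component layout (0xMMmmpp) is an assumption inferred from the two pairs of values shown here (0x090107 / 9.1.7 and 0x091500 / 9.21.0), not taken from drbd documentation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// decodeVersionCode splits a version code such as "0x090107" into
// major.minor.patch, assuming (inferred from the output above) one byte
// per component in the layout 0xMMmmpp.
func decodeVersionCode(code string) (string, error) {
	v, err := strconv.ParseUint(strings.TrimPrefix(code, "0x"), 16, 32)
	if err != nil {
		return "", err
	}
	major := (v >> 16) & 0xff
	minor := (v >> 8) & 0xff
	patch := v & 0xff
	return fmt.Sprintf("%d.%d.%d", major, minor, patch), nil
}

func main() {
	// KEY=VALUE lines copied from the drbdadm --version output.
	out := `DRBD_KERNEL_VERSION_CODE=0x090107
DRBD_KERNEL_VERSION=9.1.7
DRBDADM_VERSION_CODE=0x091500
DRBDADM_VERSION=9.21.0`

	kv := map[string]string{}
	for _, line := range strings.Split(out, "\n") {
		if k, v, ok := strings.Cut(line, "="); ok {
			kv[k] = v
		}
	}

	// Compare each decoded code with the version string printed next to it.
	for _, prefix := range []string{"DRBD_KERNEL", "DRBDADM"} {
		decoded, err := decodeVersionCode(kv[prefix+"_VERSION_CODE"])
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: code decodes to %s, reported %s\n", prefix, decoded, kv[prefix+"_VERSION"])
	}
}
```

strings.Cut requires Go 1.18 or newer.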

drbd-reactor:

[root@ac-1f-6b-9e-e5-46 zfs]# drbd-reactor --version
drbd-reactor 0.7.0

Kernel: Linux ac-1f-6b-9e-e5-46 5.4.205-1.el8.elrepo.x86_64 #1 SMP Tue Jul 12 10:48:44 EDT 2022 x86_64 x86_64 x86_64 GNU/Linux

[ 1758.843242] drbd milliman/1 drbd1001 ac-1f-6b-9e-e5-46: helper command: /sbin/drbdadm before-resync-target exit code 0
[ 1758.865053] drbd milliman: Aborting cluster-wide state change 2530163159 (31ms) rv = -19
[ 1758.873892] drbd milliman: Preparing cluster-wide state change 247777449 (1->-1 3/1)
[ 1758.899606] drbd milliman ac-1f-6b-9e-e5-46: Aborting local state change 247777449 to yield to remote state change 1249144741.
[ 1758.912455] drbd milliman: Aborting cluster-wide state change 247777449 (38ms) rv = -19
[ 1758.921189] drbd milliman: Preparing cluster-wide state change 1328687619 (1->-1 3/1)
[ 1758.929707] drbd milliman: Aborting cluster-wide state change 1328687619 (9ms) rv = -19
[ 1758.938412] drbd milliman: Preparing cluster-wide state change 2976414762 (1->-1 3/1)
[ 1758.946915] drbd milliman: Aborting cluster-wide state change 2976414762 (9ms) rv = -19
[ 1758.967206] drbd milliman ac-1f-6b-9e-e5-46: Preparing remote state change 1249144741
[ 1759.000135] drbd milliman ac-1f-6b-9e-e5-46: Committing remote state change 1249144741 (primary_nodes=1)
[ 1759.010216] drbd milliman ac-1f-6b-9e-e5-46: peer( Secondary -> Primary )
[ 1759.069750] drbd milliman/1 drbd1001: disk( Outdated -> Inconsistent )
[ 1759.076966] drbd milliman/1 drbd1001 ac-1f-6b-a5-ab-ea: resync-susp( no -> connection dependency )
[ 1759.086600] drbd milliman/1 drbd1001 ac-1f-6b-9e-e5-46: repl( WFBitMapT -> SyncTarget )
[ 1759.095740] drbd milliman/0 drbd1000: disk( Outdated -> Inconsistent )
[ 1759.102917] drbd milliman/0 drbd1000 ac-1f-6b-a5-ab-ea: resync-susp( no -> connection dependency )
[ 1759.112497] drbd milliman/0 drbd1000 ac-1f-6b-9e-e5-46: repl( WFBitMapT -> SyncTarget )
[ 1759.121267] drbd milliman/1 drbd1001 ac-1f-6b-9e-e5-46: Began resync as SyncTarget (will sync 5066752 KB [1266688 bits set]).
[ 1759.133258] drbd milliman/0 drbd1000 ac-1f-6b-9e-e5-46: Began resync as SyncTarget (will sync 32768 KB [8192 bits set]).
[ 1759.133451] drbd milliman/0 drbd1000 ac-1f-6b-9e-e5-46: received new current UUID: DBD3CCFBFA3D8BAF weak_nodes=FFFFFFFFFFFFFFFC
[ 1759.263009] drbd milliman/1 drbd1001 ac-1f-6b-9e-e5-46: received new current UUID: 7AD2E749AAAFFC69 weak_nodes=FFFFFFFFFFFFFFFC
[ 1760.028877] drbd milliman/0 drbd1000 ac-1f-6b-9e-e5-46: Resync done (total 1 sec; paused 0 sec; 32768 K/sec)
[ 1760.039382] drbd milliman/0 drbd1000 ac-1f-6b-9e-e5-46: updated UUIDs DBD3CCFBFA3D8BAE:0000000000000000:C6FAEE622D6CFFFA:0000000000000000
[ 1760.053003] drbd milliman/0 drbd1000: disk( Inconsistent -> UpToDate )
[ 1760.060112] drbd milliman/0 drbd1000 ac-1f-6b-a5-ab-ea: resync-susp( connection dependency -> no )
[ 1760.069638] drbd milliman/0 drbd1000 ac-1f-6b-9e-e5-46: repl( SyncTarget -> Established )
[ 1760.079754] drbd milliman/0 drbd1000 ac-1f-6b-9e-e5-46: helper command: /sbin/drbdadm after-resync-target
[ 1760.091674] drbd milliman/0 drbd1000 ac-1f-6b-9e-e5-46: helper command: /sbin/drbdadm after-resync-target exit code 0
[ 1814.063918] drbd milliman/1 drbd1001 ac-1f-6b-9e-e5-46: Resync done (total 54 sec; paused 0 sec; 93828 K/sec)
[ 1814.074464] drbd milliman/1 drbd1001 ac-1f-6b-9e-e5-46: updated UUIDs 7AD2E749AAAFFC68:0000000000000000:9656137FABC73162:0000000000000000
[ 1814.087919] drbd milliman/1 drbd1001: disk( Inconsistent -> UpToDate )
[ 1814.094958] drbd milliman/1 drbd1001 ac-1f-6b-a5-ab-ea: resync-susp( connection dependency -> no )
[ 1814.104426] drbd milliman/1 drbd1001 ac-1f-6b-9e-e5-46: repl( SyncTarget -> Established )
[ 1814.117246] drbd milliman/1 drbd1001 ac-1f-6b-9e-e5-46: helper command: /sbin/drbdadm after-resync-target
[ 1814.132686] drbd milliman/1 drbd1001 ac-1f-6b-9e-e5-46: helper command: /sbin/drbdadm after-resync-target exit code 0
[ 1832.336993] drbd demo0/3 drbd1005: meta-data IO uses: blk-bio
[ 1832.341362] drbd demo0/3 drbd1005: disabling discards due to peer capabilities
[ 1832.344636] drbd demo0: State change failed: In transient state, retry after next state change
[ 1832.360685] drbd demo0/3 drbd1005: Failed: disk( Diskless -> Attaching )
[ 1832.368103] drbd demo0/3 drbd1005 ac-1f-6b-9e-e5-46: self 0000000000000000:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:0
[ 1832.368109] drbd demo0: State change failed: In transient state, retry after next state change
[ 1832.382029] drbd demo0/3 drbd1005 ac-1f-6b-9e-e5-46: peer's exposed UUID: 0000000000000000
[ 1832.391198] drbd demo0/3 drbd1005: Failed: disk( Diskless -> Attaching )
[ 1832.407293] drbd demo0/3 drbd1005: disabling discards due to peer capabilities
[ 1832.415066] drbd demo0/3 drbd1005 ac-1f-6b-a5-ab-ea: self 0000000000000000:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:0
[ 1832.428722] drbd demo0/3 drbd1005 ac-1f-6b-a5-ab-ea: peer's exposed UUID: 0000000000000000
[ 1832.437526] drbd demo0/3 drbd1005 ac-1f-6b-a5-ab-ea: pdsk( DUnknown -> Diskless ) repl( Off -> Established )
[ 1832.447954] drbd demo0: State change failed: In transient state, retry after next state change
[ 1832.457104] drbd demo0/3 drbd1005: Failed: disk( Diskless -> Attaching )
[ 1832.464352] BUG: kernel NULL pointer dereference, address: 0000000000000010
[ 1832.464439] drbd demo0: State change failed: In transient state, retry after next state change
[ 1832.471880] #PF: supervisor read access in kernel mode
[ 1832.481074] drbd demo0/3 drbd1005: Failed: disk( Diskless -> Attaching )
[ 1832.486774] #PF: error_code(0x0000) - not-present page
[ 1832.499765] PGD 0 P4D 0
[ 1832.502896] Oops: 0000 [#1] SMP NOPTI
[ 1832.507123] CPU: 0 PID: 83920 Comm: drbd_r_demo0 Tainted: P           OE     5.4.205-1.el8.elrepo.x86_64 #1
[ 1832.517442] Hardware name: Supermicro SYS-1029U-TN10RT/X11DPU, BIOS 3.1 04/29/2019
[ 1832.525626] RIP: 0010:drbd_determine_dev_size+0x5a/0x520 [drbd]
[ 1832.532118] Code: 00 48 89 44 24 78 31 c0 e8 73 e1 ff ff 48 c7 c6 b0 d4 8c c0 48 89 df e8 a4 7d fe ff 48 89 44 24 08 48 85 c0 0f 84 4a 04 00 00 <49> 8b 47 10 4d 8b 77 18 48 89 04 24 41 8b 47 48 89 44 24 18 49 8b
[ 1832.551934] RSP: 0018:ffffaaaff1587d00 EFLAGS: 00010286
[ 1832.557700] RAX: ffff9c7326224000 RBX: ffff9c72d0a1e000 RCX: 0000000000000000
[ 1832.565368] RDX: 0000000000000001 RSI: ffffffffc08cd4b0 RDI: ffff9c72d0a1e000
[ 1832.573019] RBP: 0000000000000000 R08: 0000000000000332 R09: 000000000002ea40
[ 1832.580667] R10: 0000000000008905 R11: 0000000000004482 R12: 0000000000000000
[ 1832.588284] R13: 0000000000000000 R14: ffff9c734142d000 R15: 0000000000000000
[ 1832.595889] FS:  0000000000000000(0000) GS:ffff9c1380600000(0000) knlGS:0000000000000000
[ 1832.604466] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1832.610682] CR2: 0000000000000010 CR3: 000000a9a340a003 CR4: 00000000007606f0
[ 1832.618287] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1832.625903] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1832.633525] PKRU: 55555554
[ 1832.636734] Call Trace:
[ 1832.639648]  ? printk+0x58/0x6f
[ 1832.643241]  receive_state+0x5f7/0x1040 [drbd]
[ 1832.648125]  ? drbd_recv+0x49/0x200 [drbd]
[ 1832.652692]  ? decode_header+0x17/0x130 [drbd]
[ 1832.657606]  ? _get_ldev_if_state.part.51+0xd0/0xd0 [drbd]
[ 1832.663555]  drbd_receiver+0x5a6/0x7f0 [drbd]
[ 1832.668351]  ? __drbd_next_peer_device_ref+0x140/0x140 [drbd]
[ 1832.674534]  drbd_thread_setup+0x5e/0x160 [drbd]
[ 1832.679594]  ? __drbd_next_peer_device_ref+0x140/0x140 [drbd]
[ 1832.685790]  kthread+0x10c/0x130
[ 1832.689467]  ? kthread_park+0x80/0x80
[ 1832.693578]  ret_from_fork+0x1f/0x40
[ 1832.697591] Modules linked in: zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) drbd_transport_tcp(OE) drbd(OE) bcache(E) crc64(E) dm_cache(E) dm_persistent_data(E) dm_bio_prison(E) dm_bufio(E) dm_writecache(E) nvme_rdma(E) nvmet_rdma(E) rdma_cm(E) iw_cm(E) ib_cm(E) ib_core(E) 8021q(E) garp(E) mrp(E) stp(E) llc(E) intel_rapl_msr(E) intel_rapl_common(E) iTCO_wdt(E) iTCO_vendor_support(E) skx_edac(E) nfit(E) libnvdimm(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) rfkill(E) ghash_clmulni_intel(E) rapl(E) intel_cstate(E) mei_me(E) ipmi_ssif(E) sr_mod(E) cdrom(E) intel_uncore(E) pcspkr(E) sunrpc(E) sg(E) joydev(E) i2c_i801(E) lpc_ich(E) mei(E) ioatdma(E) ipmi_si(E) acpi_power_meter(E) acpi_pad(E) vfat(E) fat(E) dm_mod(E) uas(E) usb_storage(E) xfs(E) ast(E) i2c_algo_bit(E) libcrc32c(E) drm_vram_helper(E) ttm(E) nvmet_tcp(E) drm_kms_helper(E) ixgbe(E) nvmet(E)
[ 1832.697619]  syscopyarea(E) ahci(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) nvme_tcp(E) libahci(E) nvme_fabrics(E) crc32c_intel(E) drm(E) mdio(E) libata(E) dca(E) wmi(E) nvme(E) nvme_core(E) ipmi_devintf(E) ipmi_msghandler(E)
[ 1832.808915] CR2: 0000000000000010
[ 1832.812751] ---[ end trace 1dbb53d7f2280dec ]---
[ 1832.876741] RIP: 0010:drbd_determine_dev_size+0x5a/0x520 [drbd]
[ 1832.883115] Code: 00 48 89 44 24 78 31 c0 e8 73 e1 ff ff 48 c7 c6 b0 d4 8c c0 48 89 df e8 a4 7d fe ff 48 89 44 24 08 48 85 c0 0f 84 4a 04 00 00 <49> 8b 47 10 4d 8b 77 18 48 89 04 24 41 8b 47 48 89 44 24 18 49 8b
[ 1832.902799] RSP: 0018:ffffaaaff1587d00 EFLAGS: 00010286
[ 1832.908514] RAX: ffff9c7326224000 RBX: ffff9c72d0a1e000 RCX: 0000000000000000
[ 1832.916117] RDX: 0000000000000001 RSI: ffffffffc08cd4b0 RDI: ffff9c72d0a1e000
[ 1832.923724] RBP: 0000000000000000 R08: 0000000000000332 R09: 000000000002ea40
[ 1832.931319] R10: 0000000000008905 R11: 0000000000004482 R12: 0000000000000000
[ 1832.938913] R13: 0000000000000000 R14: ffff9c734142d000 R15: 0000000000000000
[ 1832.946497] FS:  0000000000000000(0000) GS:ffff9c1380600000(0000) knlGS:0000000000000000
[ 1832.955026] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1832.961234] CR2: 0000000000000010 CR3: 000000a9a340a003 CR4: 00000000007606f0
[ 1832.968843] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1832.976426] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1832.983991] PKRU: 55555554
[ 1832.987134] Kernel panic - not syncing: Fatal exception
[ 1832.992928] Kernel Offset: 0x22800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 1833.063038] ---[ end Kernel panic - not syncing: Fatal exception ]---
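Both this oops and the 5.14.21 one fault at drbd_determine_dev_size+0x5a, called from receive_state. The "Code:" line marks the faulting instruction with <49>; here it starts 49 8b 47 10, and in the 5.14.21 trace it starts 49 8b 47 18. As a sketch of how to read those bytes, the Go program below hand-decodes only the narrow form REX.W MOV r64, [base+disp8] and adds the displacement to the base register value copied from the register dump (R15 = 0 in both traces). For real analysis the kernel's scripts/decodecode or objdump are the proper tools. The decoding shows a NULL pointer held in R15 being read at offset 0x10 (0x18 in the 5.14.21 build); which drbd structure R15 held is not recoverable from the log.

```go
package main

import "fmt"

// 64-bit register names indexed by their 4-bit encoding (REX bit + 3-bit field).
var reg64 = [16]string{
	"rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
	"r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15",
}

// decodeMovLoad decodes only REX.W + 8B /r with ModRM mod=01 (base + signed
// 8-bit displacement, no SIB byte). Any other encoding returns ok=false.
func decodeMovLoad(b []byte) (dst, base string, disp int8, ok bool) {
	if len(b) < 4 {
		return "", "", 0, false
	}
	rex, op, modrm := b[0], b[1], b[2]
	// Require a REX prefix with W=1 (0x48..0x4f) and opcode MOV r64, r/m64.
	if rex&0xf8 != 0x48 || op != 0x8b {
		return "", "", 0, false
	}
	mod, regField, rm := modrm>>6, (modrm>>3)&7, modrm&7
	if mod != 1 || rm == 4 { // rm=4 would mean a SIB byte follows
		return "", "", 0, false
	}
	r := (rex >> 2) & 1 // REX.R extends the reg field (destination)
	x := rex & 1        // REX.B extends the rm field (base register)
	return reg64[r<<3|regField], reg64[x<<3|rm], int8(b[3]), true
}

func main() {
	traces := []struct {
		name string
		code []byte            // bytes from the <49> marker onward
		regs map[string]uint64 // register values from the same oops
		cr2  uint64
	}{
		{"5.4.205 (kernel-lt)", []byte{0x49, 0x8b, 0x47, 0x10}, map[string]uint64{"r15": 0}, 0x10},
		{"5.14.21", []byte{0x49, 0x8b, 0x47, 0x18}, map[string]uint64{"r15": 0}, 0x18},
	}
	for _, t := range traces {
		dst, base, disp, ok := decodeMovLoad(t.code)
		if !ok {
			fmt.Printf("%s: encoding not handled by this sketch\n", t.name)
			continue
		}
		baseVal, known := t.regs[base]
		if !known {
			fmt.Printf("%s: no value recorded for %s\n", t.name, base)
			continue
		}
		// Effective address = base register + sign-extended displacement.
		addr := baseVal + uint64(int64(disp))
		fmt.Printf("%s: mov %s, [%s%+#x] reads %#x (CR2 = %#x)\n", t.name, dst, base, disp, addr, t.cr2)
	}
}
```

For both traces the computed read address equals CR2, matching the "kernel NULL pointer dereference, address: ..." line.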

Kernel 5.14.21:

[  350.867400] drbd milliman/0 drbd1000 ac-1f-6b-a4-df-ee: pdsk( DUnknown -> UpToDate ) repl( Off -> Established )
[  350.877981] drbd milliman/1 drbd1001: quorum( no -> yes )
[  350.883886] drbd milliman/1 drbd1001 ac-1f-6b-a4-df-ee: pdsk( DUnknown -> UpToDate ) repl( Off -> Established )
[ 9754.631374] drbd demo0: Starting worker thread (from drbdsetup [5294])
[ 9754.641352] drbd demo0 ac-1f-6b-a4-df-ee: Starting sender thread (from drbdsetup [5302])
[ 9754.663695] drbd demo0/0 drbd1002: meta-data IO uses: blk-bio
[ 9754.670334] drbd demo0/0 drbd1002: disk( Diskless -> Attaching )
[ 9754.676960] drbd demo0/0 drbd1002: Maximum number of peer devices = 7
[ 9754.684075] drbd demo0: Method to ensure write ordering: flush
[ 9754.690514] drbd demo0/0 drbd1002: drbd_bm_resize called with capacity == 131080
[ 9754.698516] drbd demo0/0 drbd1002: resync bitmap: bits=16385 words=1799 pages=4
[ 9754.706406] drbd1002: detected capacity change from 0 to 131080
[ 9754.712915] drbd demo0/0 drbd1002: size = 64 MB (65540 KB)
[ 9754.719120] drbd demo0/0 drbd1002: recounting of set bits took additional 0ms
[ 9754.726831] drbd demo0/0 drbd1002: disk( Attaching -> UpToDate )
[ 9754.733397] drbd demo0/0 drbd1002: attached to current UUID: ECDCAF858EE6D814
[ 9754.741100] drbd demo0/0 drbd1002: size = 64 MB (65540 KB)
[ 9754.774618] drbd demo0/1 drbd1003: meta-data IO uses: blk-bio
[ 9754.781254] drbd demo0/1 drbd1003: disk( Diskless -> Attaching )
[ 9754.787826] drbd demo0/1 drbd1003: Maximum number of peer devices = 7
[ 9754.794870] drbd demo0/1 drbd1003: drbd_bm_resize called with capacity == 209715208
[ 9754.820604] drbd demo0/1 drbd1003: resync bitmap: bits=26214401 words=2867207 pages=5601
[ 9754.829279] drbd1003: detected capacity change from 0 to 209715208
[ 9754.836024] drbd demo0/1 drbd1003: size = 100 GB (104857604 KB)
[ 9754.879087] drbd demo0/1 drbd1003: recounting of set bits took additional 13ms
[ 9754.895230] drbd demo0/1 drbd1003: disk( Attaching -> UpToDate )
[ 9754.901761] drbd demo0/1 drbd1003: attached to current UUID: 6048848BCEFDCF0A
[ 9754.909484] drbd demo0/1 drbd1003: size = 100 GB (104857604 KB)
[ 9754.911097] drbd demo0 ac-1f-6b-a4-df-ee: conn( StandAlone -> Unconnected )
[ 9754.923894] drbd demo0 ac-1f-6b-a4-df-ee: Starting receiver thread (from drbd_w_demo0 [5295])
[ 9754.933155] drbd demo0 ac-1f-6b-a4-df-ee: conn( Unconnected -> Connecting )
[ 9755.446373] drbd demo0 ac-1f-6b-a4-df-ee: Handshake to peer 0 successful: Agreed network protocol version 121
[ 9755.456896] drbd demo0 ac-1f-6b-a4-df-ee: Feature flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
[ 9755.470894] drbd demo0 ac-1f-6b-a4-df-ee: Peer authenticated using 20 bytes HMAC
[ 9755.478807] drbd demo0 ac-1f-6b-a4-df-ee: Starting ack_recv thread (from drbd_r_demo0 [5444])
[ 9755.521518] drbd demo0 ac-1f-6b-a4-df-ee: Preparing remote state change 1818502727
[ 9755.543252] drbd demo0/0 drbd1002 ac-1f-6b-a4-df-ee: drbd_sync_handshake:
[ 9755.550472] drbd demo0/0 drbd1002 ac-1f-6b-a4-df-ee: self ECDCAF858EE6D814:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:20
[ 9755.564013] drbd demo0/0 drbd1002 ac-1f-6b-a4-df-ee: peer E62158D3D1AFA478:ECDCAF858EE6D814:0000000000000000:0000000000000000 bits:16385 flags:20
[ 9755.577934] drbd demo0/0 drbd1002 ac-1f-6b-a4-df-ee: uuid_compare()=target-use-bitmap by rule=bitmap-peer
[ 9755.602244] drbd demo0/1 drbd1003 ac-1f-6b-a4-df-ee: drbd_sync_handshake:
[ 9755.609477] drbd demo0/1 drbd1003 ac-1f-6b-a4-df-ee: self 6048848BCEFDCF0A:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:20
[ 9755.623038] drbd demo0/1 drbd1003 ac-1f-6b-a4-df-ee: peer 6048848BCEFDCF0A:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:20
[ 9755.636634] drbd demo0/1 drbd1003 ac-1f-6b-a4-df-ee: uuid_compare()=no-sync by rule=both-off
[ 9755.683092] drbd demo0 ac-1f-6b-a4-df-ee: Committing remote state change 1818502727 (primary_nodes=0)
[ 9755.692790] drbd demo0 ac-1f-6b-a4-df-ee: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
[ 9755.702545] drbd demo0/0 drbd1002: disk( UpToDate -> Outdated )
[ 9755.708922] drbd demo0/0 drbd1002 ac-1f-6b-a4-df-ee: pdsk( DUnknown -> UpToDate ) repl( Off -> WFBitMapT )
[ 9755.719032] drbd demo0/1 drbd1003 ac-1f-6b-a4-df-ee: pdsk( DUnknown -> UpToDate ) repl( Off -> Established )
[ 9755.738450] drbd demo0/0 drbd1002 ac-1f-6b-a4-df-ee: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 21(1), total 21; compression: 99.0%
[ 9755.760668] drbd demo0/0 drbd1002 ac-1f-6b-a4-df-ee: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 21(1), total 21; compression: 99.0%
[ 9755.782646] drbd demo0/0 drbd1002 ac-1f-6b-a4-df-ee: helper command: /sbin/drbdadm before-resync-target
[ 9755.800952] drbd demo0/0 drbd1002 ac-1f-6b-a4-df-ee: helper command: /sbin/drbdadm before-resync-target exit code 0
[ 9755.820258] drbd demo0/0 drbd1002: disk( Outdated -> Inconsistent )
[ 9755.827064] drbd demo0/0 drbd1002 ac-1f-6b-a4-df-ee: repl( WFBitMapT -> SyncTarget )
[ 9755.835374] drbd demo0/0 drbd1002 ac-1f-6b-a4-df-ee: Began resync as SyncTarget (will sync 65540 KB [16385 bits set]).
[ 9756.385396] drbd demo0 ac-1f-6b-a4-df-ee: Preparing remote state change 3350157584
[ 9756.425889] drbd demo0 ac-1f-6b-a4-df-ee: Committing remote state change 3350157584 (primary_nodes=1)
[ 9756.435640] drbd demo0 ac-1f-6b-a4-df-ee: peer( Secondary -> Primary )
[ 9758.280497] drbd demo0/0 drbd1002 ac-1f-6b-a4-df-ee: Resync done (total 2 sec; paused 0 sec; 32768 K/sec)
[ 9758.290632] drbd demo0/0 drbd1002 ac-1f-6b-a4-df-ee: updated UUIDs E62158D3D1AFA478:0000000000000000:0000000000000000:0000000000000000
[ 9758.303782] drbd demo0/0 drbd1002: disk( Inconsistent -> UpToDate )
[ 9758.310611] drbd demo0/0 drbd1002 ac-1f-6b-a4-df-ee: repl( SyncTarget -> Established )
[ 9758.319908] drbd demo0/0 drbd1002 ac-1f-6b-a4-df-ee: helper command: /sbin/drbdadm after-resync-target
[ 9758.330851] drbd demo0/0 drbd1002 ac-1f-6b-a4-df-ee: helper command: /sbin/drbdadm after-resync-target exit code 0
[ 9778.139037] drbd demo0/2 drbd1004: meta-data IO uses: blk-bio
[ 9778.145730] drbd demo0: State change failed: In transient state, retry after next state change
[ 9778.154976] drbd demo0/2 drbd1004: Failed: disk( Diskless -> Attaching )
[ 9778.162307] drbd demo0: State change failed: In transient state, retry after next state change
[ 9778.171518] drbd demo0/2 drbd1004: Failed: disk( Diskless -> Attaching )
[ 9778.472135] drbd demo0/2 drbd1004: disabling discards due to peer capabilities
[ 9778.480103] drbd demo0/2 drbd1004 ac-1f-6b-a4-df-ee: self 0000000000000000:0000000000000000:0000000000000000:0000000000000000 bits:0 flags:0
[ 9778.493887] drbd demo0/2 drbd1004 ac-1f-6b-a4-df-ee: peer's exposed UUID: 0000000000000000
[ 9778.511106] BUG: kernel NULL pointer dereference, address: 0000000000000018
[ 9778.518618] #PF: supervisor read access in kernel mode
[ 9778.524305] #PF: error_code(0x0000) - not-present page
[ 9778.529987] PGD 0 P4D 0
[ 9778.533056] Oops: 0000 [#1] SMP NOPTI
[ 9778.537250] CPU: 1 PID: 5444 Comm: drbd_r_demo0 Tainted: P S         OE     5.14.21 #1
[ 9778.545682] Hardware name: Supermicro SYS-1029U-TN10RT/X11DPU, BIOS 3.1 04/29/2019
[ 9778.553758] RIP: 0010:drbd_determine_dev_size+0x5a/0x550 [drbd]
[ 9778.560208] Code: 00 48 89 44 24 78 31 c0 e8 13 e1 ff ff 48 c7 c6 f0 04 2c c1 48 89 df e8 14 72 fe ff 48 89 44 24 10 48 85 c0 0f 84 73 04 00 00 <49> 8b 47 18 48 89 04 24 49 8b 47 10 48 89 44 24 08 41 8b 47 48 89
[ 9778.580031] RSP: 0018:ffffb2d58151bd00 EFLAGS: 00010286
[ 9778.585772] RAX: ffff89be458a5000 RBX: ffff89be9365c000 RCX: 0000000000000000
[ 9778.593410] RDX: 0000000000000001 RSI: ffffffffc12c04f0 RDI: ffff89be9365c000
[ 9778.601035] RBP: 0000000000000000 R08: 0000000000000140 R09: 0000000000000180
[ 9778.608656] R10: 0000000000000140 R11: 0000000000004afc R12: 0000000000000000
[ 9778.616268] R13: ffff89bfb7160800 R14: 0000000000000000 R15: 0000000000000000
[ 9778.623881] FS:  0000000000000000(0000) GS:ffff89bcc0840000(0000) knlGS:0000000000000000
[ 9778.632444] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9778.638660] CR2: 0000000000000018 CR3: 0000007dc6e0a006 CR4: 00000000007706e0
[ 9778.646293] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 9778.653898] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 9778.661486] PKRU: 55555554
[ 9778.664647] Call Trace:
[ 9778.667539]  <TASK>
[ 9778.670062]  ? vprintk_emit+0x128/0x270
[ 9778.674333]  ? printk+0x58/0x6f
[ 9778.677900]  receive_state+0x5f5/0x1080 [drbd]
[ 9778.682779]  ? receive_uuids110+0x570/0x570 [drbd]
[ 9778.687996]  ? drbd_recv+0x46/0x220 [drbd]
[ 9778.692510]  ? decode_header+0x17/0x140 [drbd]
[ 9778.697368]  ? receive_uuids110+0x570/0x570 [drbd]
[ 9778.702565]  drbd_receiver+0x598/0x830 [drbd]
[ 9778.707327]  drbd_thread_setup+0x76/0x1b0 [drbd]
[ 9778.712347]  ? __drbd_next_peer_device_ref+0x1a0/0x1a0 [drbd]
[ 9778.718485]  kthread+0x118/0x140
[ 9778.722092]  ? set_kthread_struct+0x40/0x40
[ 9778.726649]  ret_from_fork+0x1f/0x30
[ 9778.730601]  </TASK>
[ 9778.733164] Modules linked in: drbd_transport_tcp(OE) drbd(OE) bcache crc64 dm_cache dm_persistent_data dm_bio_prison dm_bufio dm_writecache nvme_rdma nvmet_rdma rdma_cm iw_cm ib_cm ib_core dm_mod 8021q garp mrp stp llc rfkill sunrpc intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm iTCO_wdt iTCO_vendor_support irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate ipmi_ssif vfat mei_me fat i2c_i801 intel_uncore joydev pcspkr mei acpi_ipmi ioatdma i2c_smbus lpc_ich ipmi_si acpi_power_meter acpi_pad binfmt_misc zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) nvmet_tcp nvmet nvme_tcp nvme_fabrics xfs libcrc32c sd_mod sg ast i2c_algo_bit drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm_ttm_helper ttm drm ixgbe ahci libahci nvme uas nvme_core libata crc32c_intel usb_storage mdio t10_pi dca wmi ipmi_devintf ipmi_msghandler
[ 9778.823619] CR2: 0000000000000018
[ 9778.827377] ---[ end trace 09a2a2ea66dcaf4b ]---
[ 9778.895569] RIP: 0010:drbd_determine_dev_size+0x5a/0x550 [drbd]
[ 9778.901918] Code: 00 48 89 44 24 78 31 c0 e8 13 e1 ff ff 48 c7 c6 f0 04 2c c1 48 89 df e8 14 72 fe ff 48 89 44 24 10 48 85 c0 0f 84 73 04 00 00 <49> 8b 47 18 48 89 04 24 49 8b 47 10 48 89 44 24 08 41 8b 47 48 89
[ 9778.921526] RSP: 0018:ffffb2d58151bd00 EFLAGS: 00010286
[ 9778.927182] RAX: ffff89be458a5000 RBX: ffff89be9365c000 RCX: 0000000000000000
[ 9778.934752] RDX: 0000000000000001 RSI: ffffffffc12c04f0 RDI: ffff89be9365c000
[ 9778.942314] RBP: 0000000000000000 R08: 0000000000000140 R09: 0000000000000180
[ 9778.949877] R10: 0000000000000140 R11: 0000000000004afc R12: 0000000000000000
[ 9778.957422] R13: ffff89bfb7160800 R14: 0000000000000000 R15: 0000000000000000
[ 9778.964962] FS:  0000000000000000(0000) GS:ffff89bcc0840000(0000) knlGS:0000000000000000
[ 9778.973455] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9778.979608] CR2: 0000000000000018 CR3: 0000007dc6e0a006 CR4: 00000000007706e0
[ 9778.987151] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 9778.994689] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 9779.002226] PKRU: 55555554
[ 9779.005349] Kernel panic - not syncing: Fatal exception
[ 9779.011059] Kernel Offset: 0x36600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 9779.064919] ---[ end Kernel panic - not syncing: Fatal exception ]---
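The attach lines for demo0/0 and demo0/1 (before the failing attach of demo0/2), and the resync lines in both logs, print several related numbers. The sketch below reproduces them from the logged inputs. The formulas are inferred by fitting the logged values, not taken from the drbd sources: one bitmap bit per 4 KiB, 64-bit words, one set of words per peer slot (7 slots, per "Maximum number of peer devices = 7"), 4096-byte pages, and a resync rate computed as (bits ÷ seconds) × 4 KiB in integer arithmetic.

```go
package main

import "fmt"

const (
	kbPerBit    = 4    // assumed: one bitmap bit tracks 4 KiB
	bitsPerWord = 64   // assumed: 64-bit bitmap words
	pageSize    = 4096 // assumed: 4 KiB pages
	maxPeerDevs = 7    // "Maximum number of peer devices = 7" in the log
)

// bitmapSize derives the values printed by the "size =" and
// "resync bitmap: bits=... words=... pages=..." lines from a capacity
// given in 512-byte sectors.
func bitmapSize(sectors uint64) (sizeKB, bits, words, pages uint64) {
	sizeKB = sectors / 2
	bits = (sizeKB + kbPerBit - 1) / kbPerBit
	words = (bits + bitsPerWord - 1) / bitsPerWord * maxPeerDevs
	pages = (words*8 + pageSize - 1) / pageSize
	return
}

// resyncRate reproduces the K/sec figure in "Resync done" lines, assuming
// bits are divided by seconds first and only then converted to KiB.
func resyncRate(bits, secs uint64) uint64 {
	return bits / secs * kbPerBit
}

func main() {
	// Capacities (sectors) from the "drbd_bm_resize called with capacity == ..." lines.
	for _, dev := range []struct {
		name    string
		sectors uint64
	}{{"drbd1002", 131080}, {"drbd1003", 209715208}} {
		kb, bits, words, pages := bitmapSize(dev.sectors)
		fmt.Printf("%s: size=%d KB bits=%d words=%d pages=%d\n", dev.name, kb, bits, words, pages)
	}

	// (bits set, seconds) from "Began resync" / "Resync done" pairs in both logs.
	for _, r := range [][2]uint64{{1266688, 54}, {8192, 1}, {16385, 2}} {
		fmt.Printf("%d bits in %d s -> %d K/sec\n", r[0], r[1], resyncRate(r[0], r[1]))
	}
}
```

The order of operations in resyncRate matters: 16385 × 4 ÷ 2 would give 32770 K/sec, while the log shows 32768 K/sec for drbd1002.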

Smithx10, Jul 15 '22 12:07

Thanks for the report. The call from receive_state to drbd_determine_dev_size is new in drbd-9.1.7, so it looks like that change introduced the bug. I'll look into it.

JoelColledge, Aug 02 '22 09:08

Fixed by https://github.com/LINBIT/drbd/commit/83cd5b82d1ac3683ccbb589181c0decdeef2898f.

JoelColledge, Aug 23 '22 10:08