illumos-joyent
illumos-joyent copied to clipboard
NULL pointer dereference in fop_frlock
Had the following panic on a system running joyent_20180524T002819Z
.
The last thing I entered was pfiles
on a vmstat
process, but the system was also otherwise busy.
2019-03-11T22:19:16.139806+00:00 p unix: [ID 836849 kern.notice] #012#015panic[cpu5]/thread=fffffe59432a5060:
2019-03-11T22:19:16.139814+00:00 p genunix: [ID 335743 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=fffffe007c53e9d0 addr=18 occurred in module "genunix" due to a NULL pointer dereference
2019-03-11T22:19:16.139818+00:00 p unix: [ID 100000 kern.notice] #012
2019-03-11T22:19:16.139821+00:00 p unix: [ID 839527 kern.notice] vmstat:
2019-03-11T22:19:16.139825+00:00 p unix: [ID 753105 kern.notice] #pf Page fault
2019-03-11T22:19:16.139828+00:00 p unix: [ID 532287 kern.notice] Bad kernel fault at addr=0x18
2019-03-11T22:19:16.139832+00:00 p unix: [ID 243837 kern.notice] pid=72777, pc=0xfffffffffbb611f2, sp=0xfffffe007c53eac0, eflags=0x10286
2019-03-11T22:19:16.139846+00:00 p unix: [ID 619397 kern.notice] cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 1606f8<smep,osxsav,pcide,xmme,fxsr,pge,mce,pae,pse,de>
2019-03-11T22:19:16.139851+00:00 p unix: [ID 152204 kern.notice] cr2: 18
2019-03-11T22:19:16.139857+00:00 p unix: [ID 634440 kern.notice] cr3: 106be8e000
2019-03-11T22:19:16.139861+00:00 p unix: [ID 625715 kern.notice] cr8: 0
2019-03-11T22:19:16.139865+00:00 p unix: [ID 100000 kern.notice]
2019-03-11T22:19:16.139868+00:00 p unix: [ID 592667 kern.notice] #011rdi: 0 rsi: e rdx: fffffe007c53ed20
2019-03-11T22:19:16.139871+00:00 p unix: [ID 592667 kern.notice] #011rcx: 2005 r8: 25 r9: 0
2019-03-11T22:19:16.139880+00:00 p unix: [ID 592667 kern.notice] #011rax: fffffe59a8bbf710 rbx: 0 rbp: fffffe007c53eb50
2019-03-11T22:19:16.139884+00:00 p unix: [ID 592667 kern.notice] #011r10: fffffe592f9eeac0 r11: fffffe59ff01b600 r12: e
2019-03-11T22:19:16.139889+00:00 p unix: [ID 592667 kern.notice] #011r13: 8044af0 r14: fffffe5a2bce6c48 r15: 7fffffff
2019-03-11T22:19:16.139892+00:00 p unix: [ID 592667 kern.notice] #011fsb: 0 gsb: fffffe594281e000 ds: 4b
2019-03-11T22:19:16.139896+00:00 p unix: [ID 592667 kern.notice] #011 es: 4b fs: 0 gs: 0
2019-03-11T22:19:16.139899+00:00 p unix: [ID 592667 kern.notice] #011trp: e err: 0 rip: fffffffffbb611f2
2019-03-11T22:19:16.139903+00:00 p unix: [ID 592667 kern.notice] #011 cs: 30 rfl: 10286 rsp: fffffe007c53eac0
2019-03-11T22:19:16.139906+00:00 p unix: [ID 266532 kern.notice] #011 ss: 0
2019-03-11T22:19:16.139909+00:00 p unix: [ID 100000 kern.notice]
2019-03-11T22:19:16.139913+00:00 p genunix: [ID 655072 kern.notice] fffffe007c53e8c0 unix:real_mode_stop_cpu_stage2_end+bc1c ()
2019-03-11T22:19:16.139916+00:00 p genunix: [ID 655072 kern.notice] fffffe007c53e9c0 unix:trap+108e ()
2019-03-11T22:19:16.139920+00:00 p genunix: [ID 655072 kern.notice] fffffe007c53e9d0 unix:_cmntrap+e6 ()
2019-03-11T22:19:16.139931+00:00 p genunix: [ID 655072 kern.notice] fffffe007c53eb50 genunix:fop_frlock+32 ()
2019-03-11T22:19:16.139936+00:00 p genunix: [ID 655072 kern.notice] fffffe007c53eba0 dev:sdev_frlock+3e ()
2019-03-11T22:19:16.139940+00:00 p genunix: [ID 655072 kern.notice] fffffe007c53ec40 genunix:fop_frlock+65 ()
2019-03-11T22:19:16.139943+00:00 p genunix: [ID 655072 kern.notice] fffffe007c53eeb0 genunix:fcntl+be9 ()
2019-03-11T22:19:16.139947+00:00 p genunix: [ID 655072 kern.notice] fffffe007c53ef10 unix:brand_sys_syscall32+1aa ()
Sadly the dump does not seem usable:
# savecore -f vmdump.0
savecore: incomplete dump on dump device
savecore: System dump time: Mon Mar 11 21:40:54 2019
savecore: saving system crash dump in /var/crash/volatile/{unix,vmcore}.0
Constructing namelist /var/crash/volatile/unix.0
Constructing corefile /var/crash/volatile/vmcore.0
pfn 16266240 not found for as=fffffffffbc4b080, va=fffffe0000000000
pfn 16266241 not found for as=fffffffffbc4b080, va=fffffe0000001000
pfn 16266242 not found for as=fffffffffbc4b080, va=fffffe0000002000
pfn 16266243 not found for as=fffffffffbc4b080, va=fffffe0000003000
pfn 16266244 not found for as=fffffffffbc4b080, va=fffffe0000004000
pfn 16266245 not found for as=fffffffffbc4b080, va=fffffe0000005000
pfn 16266246 not found for as=fffffffffbc4b080, va=fffffe0000006000
pfn 16266247 not found for as=fffffffffbc4b080, va=fffffe0000007000
pfn 16266248 not found for as=fffffffffbc4b080, va=fffffe0000008000
pfn 16266249 not found for as=fffffffffbc4b080, va=fffffe0000009000
0:18 99% donesavecore: dsize 0xf0b != pagesize 0x1000
On 3/11/19 15:38 , Sebastian Wiedenroth wrote:
Had the following panic on a system running
joyent_20180524T002819Z
. The last thing I entered waspfiles
on avmstat
process, but the system was also otherwise busy.2019-03-11T22:19:16.139806+00:00 p unix: [ID 836849 kern.notice] #012#015panic[cpu5]/thread=fffffe59432a5060: 2019-03-11T22:19:16.139814+00:00 p genunix: [ID 335743 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=fffffe007c53e9d0 addr=18 occurred in module "genunix" due to a NULL pointer dereference 2019-03-11T22:19:16.139818+00:00 p unix: [ID 100000 kern.notice] #012 2019-03-11T22:19:16.139821+00:00 p unix: [ID 839527 kern.notice] vmstat: 2019-03-11T22:19:16.139825+00:00 p unix: [ID 753105 kern.notice] #pf Page fault 2019-03-11T22:19:16.139828+00:00 p unix: [ID 532287 kern.notice] Bad kernel fault at addr=0x18 2019-03-11T22:19:16.139832+00:00 p unix: [ID 243837 kern.notice] pid=72777, pc=0xfffffffffbb611f2, sp=0xfffffe007c53eac0, eflags=0x10286 2019-03-11T22:19:16.139846+00:00 p unix: [ID 619397 kern.notice] cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 1606f8<smep,osxsav,pcide,xmme,fxsr,pge,mce,pae,pse,de> 2019-03-11T22:19:16.139851+00:00 p unix: [ID 152204 kern.notice] cr2: 18 2019-03-11T22:19:16.139857+00:00 p unix: [ID 634440 kern.notice] cr3: 106be8e000 2019-03-11T22:19:16.139861+00:00 p unix: [ID 625715 kern.notice] cr8: 0 2019-03-11T22:19:16.139865+00:00 p unix: [ID 100000 kern.notice] 2019-03-11T22:19:16.139868+00:00 p unix: [ID 592667 kern.notice] #011rdi: 0 rsi: e rdx: fffffe007c53ed20 2019-03-11T22:19:16.139871+00:00 p unix: [ID 592667 kern.notice] #011rcx: 2005 r8: 25 r9: 0 2019-03-11T22:19:16.139880+00:00 p unix: [ID 592667 kern.notice] #011rax: fffffe59a8bbf710 rbx: 0 rbp: fffffe007c53eb50 2019-03-11T22:19:16.139884+00:00 p unix: [ID 592667 kern.notice] #011r10: fffffe592f9eeac0 r11: fffffe59ff01b600 r12: e 2019-03-11T22:19:16.139889+00:00 p unix: [ID 592667 kern.notice] #011r13: 8044af0 r14: fffffe5a2bce6c48 r15: 7fffffff 2019-03-11T22:19:16.139892+00:00 p unix: [ID 592667 kern.notice] #011fsb: 0 gsb: fffffe594281e000 ds: 4b 2019-03-11T22:19:16.139896+00:00 p unix: [ID 592667 kern.notice] #011 es: 4b fs: 0 gs: 0 2019-03-11T22:19:16.139899+00:00 p unix: [ID 592667 kern.notice] #011trp: e err: 0 rip: fffffffffbb611f2 2019-03-11T22:19:16.139903+00:00 p unix: [ID 592667 kern.notice] #011 cs: 30 rfl: 10286 rsp: fffffe007c53eac0 2019-03-11T22:19:16.139906+00:00 p unix: [ID 266532 kern.notice] #011 ss: 0 2019-03-11T22:19:16.139909+00:00 p unix: [ID 100000 kern.notice] 2019-03-11T22:19:16.139913+00:00 p genunix: [ID 655072 kern.notice] fffffe007c53e8c0 unix:real_mode_stop_cpu_stage2_end+bc1c () 2019-03-11T22:19:16.139916+00:00 p genunix: [ID 655072 kern.notice] fffffe007c53e9c0 unix:trap+108e () 2019-03-11T22:19:16.139920+00:00 p genunix: [ID 655072 kern.notice] fffffe007c53e9d0 unix:_cmntrap+e6 () 2019-03-11T22:19:16.139931+00:00 p genunix: [ID 655072 kern.notice] fffffe007c53eb50 genunix:fop_frlock+32 () 2019-03-11T22:19:16.139936+00:00 p genunix: [ID 655072 kern.notice] fffffe007c53eba0 dev:sdev_frlock+3e () 2019-03-11T22:19:16.139940+00:00 p genunix: [ID 655072 kern.notice] fffffe007c53ec40 genunix:fop_frlock+65 () 2019-03-11T22:19:16.139943+00:00 p genunix: [ID 655072 kern.notice] fffffe007c53eeb0 genunix:fcntl+be9 () 2019-03-11T22:19:16.139947+00:00 p genunix: [ID 655072 kern.notice] fffffe007c53ef10 unix:brand_sys_syscall32+1aa ()
Sadly the dump does not seem usable:
# savecore -f vmdump.0 savecore: incomplete dump on dump device savecore: System dump time: Mon Mar 11 21:40:54 2019 savecore: saving system crash dump in /var/crash/volatile/{unix,vmcore}.0 Constructing namelist /var/crash/volatile/unix.0 Constructing corefile /var/crash/volatile/vmcore.0 pfn 16266240 not found for as=fffffffffbc4b080, va=fffffe0000000000 pfn 16266241 not found for as=fffffffffbc4b080, va=fffffe0000001000 pfn 16266242 not found for as=fffffffffbc4b080, va=fffffe0000002000 pfn 16266243 not found for as=fffffffffbc4b080, va=fffffe0000003000 pfn 16266244 not found for as=fffffffffbc4b080, va=fffffe0000004000 pfn 16266245 not found for as=fffffffffbc4b080, va=fffffe0000005000 pfn 16266246 not found for as=fffffffffbc4b080, va=fffffe0000006000 pfn 16266247 not found for as=fffffffffbc4b080, va=fffffe0000007000 pfn 16266248 not found for as=fffffffffbc4b080, va=fffffe0000008000 pfn 16266249 not found for as=fffffffffbc4b080, va=fffffe0000009000 0:18 99% donesavecore: dsize 0xf0b != pagesize 0x1000
That's unfortunate. From looking here, I don't immediately see anything that would be related that would have fixed this between now and then. Can you check the size of the dump device? At least, a naive pfiles of a vmstat process doesn't immediately replicate this, unfortunately.
The fact that sdev is on the scene and we're trying to do something on the backend attribute vnode is a bit suspicious though. Based on the stack there, it feels like either the vnode_t or vfs_t pointer is bad.
Robert
Dump device size was at 2.52G, probably created by the old installer a long time ago. I've set it to 30G now as dumpadm -e
suggested ~15G which should reflect the current installers logic.
So if it happens again we'll hopefully get more out of it.
On 3/11/19 16:09 , Sebastian Wiedenroth wrote:
Dump device size was at 2.52G, probably created by the old installer a long time ago. I've set it to 30G now as
dumpadm -e
suggested ~15G which should reflect the current installers logic.So if it happens again we'll hopefully get more out of it.
If you have the disk space, I might size it to RAM. Sorry I don't have anything more immediately actionable here to suggest that we should look at.