illumos-joyent icon indicating copy to clipboard operation
illumos-joyent copied to clipboard

NULL pointer dereference in fop_frlock

Open wiedi opened this issue 5 years ago • 3 comments

Had the following panic on a system running joyent_20180524T002819Z. The last thing I entered was pfiles on a vmstat process, but the system was also otherwise busy.

2019-03-11T22:19:16.139806+00:00 p unix: [ID 836849 kern.notice] #012#015panic[cpu5]/thread=fffffe59432a5060: 
2019-03-11T22:19:16.139814+00:00 p genunix: [ID 335743 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=fffffe007c53e9d0 addr=18 occurred in module "genunix" due to a NULL pointer dereference
2019-03-11T22:19:16.139818+00:00 p unix: [ID 100000 kern.notice] #012
2019-03-11T22:19:16.139821+00:00 p unix: [ID 839527 kern.notice] vmstat: 
2019-03-11T22:19:16.139825+00:00 p unix: [ID 753105 kern.notice] #pf Page fault
2019-03-11T22:19:16.139828+00:00 p unix: [ID 532287 kern.notice] Bad kernel fault at addr=0x18
2019-03-11T22:19:16.139832+00:00 p unix: [ID 243837 kern.notice] pid=72777, pc=0xfffffffffbb611f2, sp=0xfffffe007c53eac0, eflags=0x10286
2019-03-11T22:19:16.139846+00:00 p unix: [ID 619397 kern.notice] cr0: 8005003b<pg,wp,ne,et,ts,mp,pe>  cr4: 1606f8<smep,osxsav,pcide,xmme,fxsr,pge,mce,pae,pse,de>
2019-03-11T22:19:16.139851+00:00 p unix: [ID 152204 kern.notice] cr2: 18  
2019-03-11T22:19:16.139857+00:00 p unix: [ID 634440 kern.notice] cr3: 106be8e000  
2019-03-11T22:19:16.139861+00:00 p unix: [ID 625715 kern.notice] cr8: 0
2019-03-11T22:19:16.139865+00:00 p unix: [ID 100000 kern.notice] 
2019-03-11T22:19:16.139868+00:00 p unix: [ID 592667 kern.notice] #011rdi:                0 rsi:                e rdx: fffffe007c53ed20
2019-03-11T22:19:16.139871+00:00 p unix: [ID 592667 kern.notice] #011rcx:             2005  r8:               25  r9:                0
2019-03-11T22:19:16.139880+00:00 p unix: [ID 592667 kern.notice] #011rax: fffffe59a8bbf710 rbx:                0 rbp: fffffe007c53eb50
2019-03-11T22:19:16.139884+00:00 p unix: [ID 592667 kern.notice] #011r10: fffffe592f9eeac0 r11: fffffe59ff01b600 r12:                e
2019-03-11T22:19:16.139889+00:00 p unix: [ID 592667 kern.notice] #011r13:          8044af0 r14: fffffe5a2bce6c48 r15:         7fffffff
2019-03-11T22:19:16.139892+00:00 p unix: [ID 592667 kern.notice] #011fsb:                0 gsb: fffffe594281e000  ds:               4b
2019-03-11T22:19:16.139896+00:00 p unix: [ID 592667 kern.notice] #011 es:               4b  fs:                0  gs:                0
2019-03-11T22:19:16.139899+00:00 p unix: [ID 592667 kern.notice] #011trp:                e err:                0 rip: fffffffffbb611f2
2019-03-11T22:19:16.139903+00:00 p unix: [ID 592667 kern.notice] #011 cs:               30 rfl:            10286 rsp: fffffe007c53eac0
2019-03-11T22:19:16.139906+00:00 p unix: [ID 266532 kern.notice] #011 ss:                0
2019-03-11T22:19:16.139909+00:00 p unix: [ID 100000 kern.notice] 
2019-03-11T22:19:16.139913+00:00 p genunix: [ID 655072 kern.notice] fffffe007c53e8c0 unix:real_mode_stop_cpu_stage2_end+bc1c ()
2019-03-11T22:19:16.139916+00:00 p genunix: [ID 655072 kern.notice] fffffe007c53e9c0 unix:trap+108e ()
2019-03-11T22:19:16.139920+00:00 p genunix: [ID 655072 kern.notice] fffffe007c53e9d0 unix:_cmntrap+e6 ()
2019-03-11T22:19:16.139931+00:00 p genunix: [ID 655072 kern.notice] fffffe007c53eb50 genunix:fop_frlock+32 ()
2019-03-11T22:19:16.139936+00:00 p genunix: [ID 655072 kern.notice] fffffe007c53eba0 dev:sdev_frlock+3e ()
2019-03-11T22:19:16.139940+00:00 p genunix: [ID 655072 kern.notice] fffffe007c53ec40 genunix:fop_frlock+65 ()
2019-03-11T22:19:16.139943+00:00 p genunix: [ID 655072 kern.notice] fffffe007c53eeb0 genunix:fcntl+be9 ()
2019-03-11T22:19:16.139947+00:00 p genunix: [ID 655072 kern.notice] fffffe007c53ef10 unix:brand_sys_syscall32+1aa ()

Sadly the dump does not seem usable:

# savecore -f vmdump.0
savecore: incomplete dump on dump device
savecore: System dump time: Mon Mar 11 21:40:54 2019

savecore: saving system crash dump in /var/crash/volatile/{unix,vmcore}.0
Constructing namelist /var/crash/volatile/unix.0
Constructing corefile /var/crash/volatile/vmcore.0
pfn 16266240 not found for as=fffffffffbc4b080, va=fffffe0000000000
pfn 16266241 not found for as=fffffffffbc4b080, va=fffffe0000001000
pfn 16266242 not found for as=fffffffffbc4b080, va=fffffe0000002000
pfn 16266243 not found for as=fffffffffbc4b080, va=fffffe0000003000
pfn 16266244 not found for as=fffffffffbc4b080, va=fffffe0000004000
pfn 16266245 not found for as=fffffffffbc4b080, va=fffffe0000005000
pfn 16266246 not found for as=fffffffffbc4b080, va=fffffe0000006000
pfn 16266247 not found for as=fffffffffbc4b080, va=fffffe0000007000
pfn 16266248 not found for as=fffffffffbc4b080, va=fffffe0000008000
pfn 16266249 not found for as=fffffffffbc4b080, va=fffffe0000009000
 0:18  99% donesavecore: dsize 0xf0b != pagesize 0x1000

wiedi avatar Mar 11 '19 22:03 wiedi

On 3/11/19 15:38 , Sebastian Wiedenroth wrote:

Had the following panic on a system running joyent_20180524T002819Z. The last thing I entered was pfiles on a vmstat process, but the system was also otherwise busy.

2019-03-11T22:19:16.139806+00:00 p unix: [ID 836849 kern.notice] #012#015panic[cpu5]/thread=fffffe59432a5060: 
2019-03-11T22:19:16.139814+00:00 p genunix: [ID 335743 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=fffffe007c53e9d0 addr=18 occurred in module "genunix" due to a NULL pointer dereference
2019-03-11T22:19:16.139818+00:00 p unix: [ID 100000 kern.notice] #012
2019-03-11T22:19:16.139821+00:00 p unix: [ID 839527 kern.notice] vmstat: 
2019-03-11T22:19:16.139825+00:00 p unix: [ID 753105 kern.notice] #pf Page fault
2019-03-11T22:19:16.139828+00:00 p unix: [ID 532287 kern.notice] Bad kernel fault at addr=0x18
2019-03-11T22:19:16.139832+00:00 p unix: [ID 243837 kern.notice] pid=72777, pc=0xfffffffffbb611f2, sp=0xfffffe007c53eac0, eflags=0x10286
2019-03-11T22:19:16.139846+00:00 p unix: [ID 619397 kern.notice] cr0: 8005003b<pg,wp,ne,et,ts,mp,pe>  cr4: 1606f8<smep,osxsav,pcide,xmme,fxsr,pge,mce,pae,pse,de>
2019-03-11T22:19:16.139851+00:00 p unix: [ID 152204 kern.notice] cr2: 18  
2019-03-11T22:19:16.139857+00:00 p unix: [ID 634440 kern.notice] cr3: 106be8e000  
2019-03-11T22:19:16.139861+00:00 p unix: [ID 625715 kern.notice] cr8: 0
2019-03-11T22:19:16.139865+00:00 p unix: [ID 100000 kern.notice] 
2019-03-11T22:19:16.139868+00:00 p unix: [ID 592667 kern.notice] #011rdi:                0 rsi:                e rdx: fffffe007c53ed20
2019-03-11T22:19:16.139871+00:00 p unix: [ID 592667 kern.notice] #011rcx:             2005  r8:               25  r9:                0
2019-03-11T22:19:16.139880+00:00 p unix: [ID 592667 kern.notice] #011rax: fffffe59a8bbf710 rbx:                0 rbp: fffffe007c53eb50
2019-03-11T22:19:16.139884+00:00 p unix: [ID 592667 kern.notice] #011r10: fffffe592f9eeac0 r11: fffffe59ff01b600 r12:                e
2019-03-11T22:19:16.139889+00:00 p unix: [ID 592667 kern.notice] #011r13:          8044af0 r14: fffffe5a2bce6c48 r15:         7fffffff
2019-03-11T22:19:16.139892+00:00 p unix: [ID 592667 kern.notice] #011fsb:                0 gsb: fffffe594281e000  ds:               4b
2019-03-11T22:19:16.139896+00:00 p unix: [ID 592667 kern.notice] #011 es:               4b  fs:                0  gs:                0
2019-03-11T22:19:16.139899+00:00 p unix: [ID 592667 kern.notice] #011trp:                e err:                0 rip: fffffffffbb611f2
2019-03-11T22:19:16.139903+00:00 p unix: [ID 592667 kern.notice] #011 cs:               30 rfl:            10286 rsp: fffffe007c53eac0
2019-03-11T22:19:16.139906+00:00 p unix: [ID 266532 kern.notice] #011 ss:                0
2019-03-11T22:19:16.139909+00:00 p unix: [ID 100000 kern.notice] 
2019-03-11T22:19:16.139913+00:00 p genunix: [ID 655072 kern.notice] fffffe007c53e8c0 unix:real_mode_stop_cpu_stage2_end+bc1c ()
2019-03-11T22:19:16.139916+00:00 p genunix: [ID 655072 kern.notice] fffffe007c53e9c0 unix:trap+108e ()
2019-03-11T22:19:16.139920+00:00 p genunix: [ID 655072 kern.notice] fffffe007c53e9d0 unix:_cmntrap+e6 ()
2019-03-11T22:19:16.139931+00:00 p genunix: [ID 655072 kern.notice] fffffe007c53eb50 genunix:fop_frlock+32 ()
2019-03-11T22:19:16.139936+00:00 p genunix: [ID 655072 kern.notice] fffffe007c53eba0 dev:sdev_frlock+3e ()
2019-03-11T22:19:16.139940+00:00 p genunix: [ID 655072 kern.notice] fffffe007c53ec40 genunix:fop_frlock+65 ()
2019-03-11T22:19:16.139943+00:00 p genunix: [ID 655072 kern.notice] fffffe007c53eeb0 genunix:fcntl+be9 ()
2019-03-11T22:19:16.139947+00:00 p genunix: [ID 655072 kern.notice] fffffe007c53ef10 unix:brand_sys_syscall32+1aa ()

Sadly the dump does not seem usable:

# savecore -f vmdump.0
savecore: incomplete dump on dump device
savecore: System dump time: Mon Mar 11 21:40:54 2019

savecore: saving system crash dump in /var/crash/volatile/{unix,vmcore}.0
Constructing namelist /var/crash/volatile/unix.0
Constructing corefile /var/crash/volatile/vmcore.0
pfn 16266240 not found for as=fffffffffbc4b080, va=fffffe0000000000
pfn 16266241 not found for as=fffffffffbc4b080, va=fffffe0000001000
pfn 16266242 not found for as=fffffffffbc4b080, va=fffffe0000002000
pfn 16266243 not found for as=fffffffffbc4b080, va=fffffe0000003000
pfn 16266244 not found for as=fffffffffbc4b080, va=fffffe0000004000
pfn 16266245 not found for as=fffffffffbc4b080, va=fffffe0000005000
pfn 16266246 not found for as=fffffffffbc4b080, va=fffffe0000006000
pfn 16266247 not found for as=fffffffffbc4b080, va=fffffe0000007000
pfn 16266248 not found for as=fffffffffbc4b080, va=fffffe0000008000
pfn 16266249 not found for as=fffffffffbc4b080, va=fffffe0000009000
 0:18  99% donesavecore: dsize 0xf0b != pagesize 0x1000

That's unfortunate. From looking here, I don't immediately see anything that would be related that would have fixed this between now and then. Can you check the size of the dump device? At least, a naive pfiles of a vmstat process doesn't immediately replicate this, unfortunately.

The fact that sdev is on the scene and we're trying to do something on the backend attribute vnode is a bit suspicious though. Based on the stack there, it feels like either the vnode_t or vfs_t pointer is bad.

Robert

rmustacc avatar Mar 11 '19 22:03 rmustacc

Dump device size was at 2.52G, probably created by the old installer a long time ago. I've set it to 30G now as dumpadm -e suggested ~15G which should reflect the current installers logic.

So if it happens again we'll hopefully get more out of it.

wiedi avatar Mar 11 '19 23:03 wiedi

On 3/11/19 16:09 , Sebastian Wiedenroth wrote:

Dump device size was at 2.52G, probably created by the old installer a long time ago. I've set it to 30G now as dumpadm -e suggested ~15G which should reflect the current installers logic.

So if it happens again we'll hopefully get more out of it.

If you have the disk space, I might size it to RAM. Sorry I don't have anything more immediately actionable here to suggest that we should look at.

rmustacc avatar Mar 11 '19 23:03 rmustacc