riscv-pk
How to get BBL to trap FP instructions on rv64imac hardware?
I have compiled busybox using "-march=rv64imafdc -mabi=lp64d", and used it as the basis for an initramfs.cpio filesystem for a Linux kernel also built with CONFIG_FPU=y. This kernel is then used as the payload for BBL, configured with "--with-arch=rv64imac". I see libsoftfloat.a being built, and the final riscv64-unknown-linux-gnu-gcc invocation has "-lsoftfloat" as one of the arguments.
Linux boots on this system all the way to the point where it tries to start init (i.e., busybox), then errors out with an illegal instruction (signal 4, scause 2):
[ 26.391786] Run /init as init process
[ 26.553902] init[1]: unhandled signal 4 code 0x1 at 0x00000000000103c8 in busybox[10000+121000]
[ 26.572140] CPU: 0 PID: 1 Comm: init Not tainted 5.2.0-rc6-00017-g6e58d8172a8c-dirty #86
[ 26.585592] sepc: 00000000000103c8 ra : 000000000006c3a2 sp : 0000003fffe46d30
[ 26.597564] gp : 00000000001341d8 tp : 0000000000166700 t0 : 0000000000135000
[ 26.609566] t1 : 000000000000009f t2 : 0000000000000000 s0 : 000000000006c704
[ 26.622896] s1 : 000000000006c794 a0 : 0000003fffe46d58 a1 : 0000000000000000
[ 26.635032] a2 : 0000003fffe46e88 a3 : 0000000000000001 a4 : 00000000000b2eea
[ 26.647028] a5 : 0000000000165510 a6 : 0000000000143230 a7 : 00000000001326f8
[ 26.659056] s2 : 0000000000000000 s3 : 0000000000000000 s4 : 0000000000000000
[ 26.672018] s5 : 0000000000000000 s6 : 0000000000000000 s7 : 0000000000000000
[ 26.684108] s8 : 0000000000000000 s9 : 0000000000000000 s10: 0000000000000000
[ 26.696288] s11: 0000000000000000 t3 : 0000000000167200 t4 : 000000000000000c
[ 26.708436] t5 : 0000000000000047 t6 : 0000000000000000
[ 26.717782] sstatus: 0000000200000020 sbadaddr: 000000000000b920 scause: 0000000000000002
[ 26.827572] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004
[ 26.838846] CPU: 0 PID: 1 Comm: init Not tainted 5.2.0-rc6-00017-g6e58d8172a8c-dirty #86
[ 26.851380] Call Trace:
[ 26.856036] [<ffffffe0000e703e>] walk_stackframe+0x0/0xa0
[ 26.864566] [<ffffffe0000e719e>] show_stack+0x2a/0x34
[ 26.872838] [<ffffffe0004509c8>] dump_stack+0x20/0x28
[ 26.880822] [<ffffffe0000ea630>] panic+0xe2/0x246
[ 26.888296] [<ffffffe0000ec54e>] do_exit+0x766/0x784
[ 26.896234] [<ffffffe0000ec5be>] do_group_exit+0x22/0x6e
[ 26.904826] [<ffffffe0000f4092>] get_signal+0x132/0x5f4
[ 26.912932] [<ffffffe0000e690a>] do_notify_resume+0x64/0x334
[ 26.921894] [<ffffffe0000e5e84>] ret_from_exception+0x0/0xc
[ 26.930542] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004 ]---
The b920 opcode corresponds to "fsd fs0,112(a0)" at 0x103c8 in busybox:
0000000000010394 <__sigsetjmp>:
10394: 00153023 sd ra,0(a0)
10398: e500 sd s0,8(a0)
1039a: e904 sd s1,16(a0)
1039c: 01253c23 sd s2,24(a0)
103a0: 03353023 sd s3,32(a0)
103a4: 03453423 sd s4,40(a0)
103a8: 03553823 sd s5,48(a0)
103ac: 03653c23 sd s6,56(a0)
103b0: 05753023 sd s7,64(a0)
103b4: 05853423 sd s8,72(a0)
103b8: 05953823 sd s9,80(a0)
103bc: 05a53c23 sd s10,88(a0)
103c0: 07b53023 sd s11,96(a0)
103c4: 06253423 sd sp,104(a0)
103c8: b920 fsd fs0,112(a0)
103ca: bd24 fsd fs1,120(a0)
103cc: 09253027 fsd fs2,128(a0)
103d0: 09353427 fsd fs3,136(a0)
103d4: 09453827 fsd fs4,144(a0)
103d8: 09553c27 fsd fs5,152(a0)
103dc: 0b653027 fsd fs6,160(a0)
103e0: 0b753427 fsd fs7,168(a0)
103e4: 0b853827 fsd fs8,176(a0)
103e8: 0b953c27 fsd fs9,184(a0)
103ec: 0da53027 fsd fs10,192(a0)
103f0: 0db53427 fsd fs11,200(a0)
103f4: 1df5f06f j 6fdd2 <__sigjmp_save>
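As a cross-check on the disassembly above, the 16-bit value reported in sbadaddr can be decoded by hand. The following is a minimal sketch, assuming the standard RVC "CS" field layout for C.FSD; the helper name `decode_c_fsd` and the two lookup tables are made up for illustration and are not part of any toolchain.

```python
# Minimal sketch: decode a 16-bit compressed C.FSD instruction.
# Assumed field layout (RVC "CS" format):
#   [15:13] funct3 = 0b101   [12:10] uimm[5:3]   [9:7] rs1'
#   [6:5]   uimm[7:6]        [4:2]   rs2'        [1:0] op = 0b00
# The 3-bit register fields select x8..x15 and f8..f15 respectively.

INT_ABI = ["s0", "s1", "a0", "a1", "a2", "a3", "a4", "a5"]         # x8..x15
FP_ABI = ["fs0", "fs1", "fa0", "fa1", "fa2", "fa3", "fa4", "fa5"]  # f8..f15


def decode_c_fsd(insn: int) -> str:
    """Return the assembly text for a C.FSD encoding (hypothetical helper)."""
    if (insn & 0x3) != 0b00 or ((insn >> 13) & 0x7) != 0b101:
        raise ValueError(f"{insn:#06x} is not a C.FSD encoding")
    uimm_5_3 = (insn >> 10) & 0x7
    rs1 = INT_ABI[(insn >> 7) & 0x7]
    uimm_7_6 = (insn >> 5) & 0x3
    rs2 = FP_ABI[(insn >> 2) & 0x7]
    offset = (uimm_7_6 << 6) | (uimm_5_3 << 3)  # byte offset, multiple of 8
    return f"fsd {rs2},{offset}({rs1})"


print(decode_c_fsd(0xB920))  # fsd fs0,112(a0)
print(decode_c_fsd(0xBD24))  # fsd fs1,120(a0)
```

Both results match the objdump lines at 0x103c8 and 0x103ca, so the faulting instruction is the first floating-point store in __sigsetjmp.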
I'm wondering why sstatus doesn't have FS set, and what I'd have to do to get BBL to tell Linux (running in S-mode) that floating-point support is available in machine (M) mode?
Edit: Oh, and if I build my kernel without "CONFIG_FPU", and build busybox with only "-march=rv64imac -mabi=lp64", it boots all the way into a busybox shell (ash), so this is strictly a question of getting BBL to actually advertise FP capabilities in M mode, as far as I can tell.
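The sstatus value from the register dump can be picked apart to confirm that FS really is off. This is a rough sketch, assuming the RV64 sstatus layout from the privileged spec (SPIE at bit 5, FS at bits 14:13, UXL at bits 33:32); the function name `decode_sstatus` is hypothetical and only three fields are extracted.

```python
# Minimal sketch: extract a few sstatus fields from the value in the log.
# Assumed bit positions (RV64 privileged spec): SPIE = bit 5,
# FS = bits 14:13, UXL = bits 33:32. Other fields are ignored here.

FS_STATES = ["Off", "Initial", "Clean", "Dirty"]


def decode_sstatus(value: int) -> dict:
    """Return selected sstatus fields (hypothetical helper)."""
    return {
        "SPIE": (value >> 5) & 0x1,
        "FS": FS_STATES[(value >> 13) & 0x3],
        "UXL": (value >> 32) & 0x3,
    }


fields = decode_sstatus(0x0000000200000020)
print(fields)  # {'SPIE': 1, 'FS': 'Off', 'UXL': 2}
```

With FS in the Off state, any floating-point instruction raises an illegal-instruction exception, which is consistent with scause 2 in the dump.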
As it turns out, BBL appears to pass the unmodified "rv64imac" CPU ISA string through to Linux, which then assumes no FP is available and kills any process that attempts an FP instruction without ever punting to M-mode. Lying to Linux by claiming "rv64imafdc" in the DTB gets it working. I'm looking at where BBL should do the "s/rv64imac/rv64imafdc/" edit to the DTB before starting its payload, and why it doesn't do so in my situation. I might have a patch soon if nobody beats me to it :)
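For anyone wanting to try the same workaround by hand, the substitution can be sketched on device tree source text. This is only an illustration under assumptions: it assumes the ISA string lives in a `riscv,isa` property of the CPU node, the sample node below is invented, and the function name `advertise_fp` is hypothetical. It works on DTS text rather than a compiled DTB, because the replacement string is longer than the original and a binary blob cannot simply be patched in place.

```python
# Minimal sketch: rewrite the ISA string in device tree *source* text.
# Assumption: the string is stored in a `riscv,isa` property; the sample
# node below is invented for illustration and is not from the thread.
import re

ISA_PROP = re.compile(r'(riscv,isa\s*=\s*")rv64imac(")')


def advertise_fp(dts_text: str) -> str:
    """Replace rv64imac with rv64imafdc in riscv,isa (hypothetical helper)."""
    return ISA_PROP.sub(r"\1rv64imafdc\2", dts_text)


sample = '''
cpu@0 {
    device_type = "cpu";
    riscv,isa = "rv64imac";
};
'''

patched = advertise_fp(sample)
print(patched)
```

The patched text would still need to be recompiled into a DTB with the usual device tree tooling before BBL or the kernel can consume it.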
@gsomlo I ran into the same error as you:
[ 20.880859] init[1]: unhandled signal 4 code 0x1 at 0x0000003fbb501bb0 in ld-2.30.so[3fbb4f1000+17000]
[ 20.902984] CPU: 0 PID: 1 Comm: init Not tainted 5.7.0+ #1
[ 20.915893] epc: 0000003fbb501bb0 ra : 0000003fbb500cde sp : 0000003fffd8a480
[ 20.932342] gp : ffffffe000a2cf50 tp : 0000000000000000 t0 : 0000000000000000
[ 20.948211] t1 : 0000003fbb4f1e7c t2 : 000000006fffffff s0 : 0000000000000000
[ 20.964782] s1 : 0000003fffd8a620 a0 : 0000003fffd8a4b8 a1 : 0000000000000000
[ 20.981353] a2 : 0000003fffd8a6d8 a3 : 0000003fffd8a6c0 a4 : 0000003fffd8a4a8
[ 20.997222] a5 : 0000003fbb50a0e0 a6 : 7efefefefefefeff a7 : 24160a4b570a5248
[ 21.013793] s2 : 00000000000138c9 s3 : 00000000000bae10 s4 : 0000003fbb50a160
[ 21.029724] s5 : 0000000000000000 s6 : 0000003fbb50a160 s7 : 0000000000000000
[ 21.046295] s8 : 0000000000000000 s9 : 0000003fbb509ff8 s10: 0000003fffd8a6c0
[ 21.062927] s11: 0000003fffd8a6d8 t3 : 0000003fbb500cb0 t4 : 0000000000000004
[ 21.078735] t5 : 0000000000000004 t6 : 0000000000000004
[ 21.091796] status: 0000000200000020 badaddr: 000000000000b920 cause: 0000000000000002
Do you know how to view the disassembly at 0x0000003fbb501bb0 in ld-2.30.so[3fbb4f1000+17000]? I tried using objdump to disassemble ld-2.30.so, but I can't find the code at address 0x0000003fbb501bb0.
Thanks for any input you can provide.
On Tue, Nov 17, 2020 at 12:49:27AM -0800, Huaqi Fang wrote:
Do you know how to view the disassembly at 0x0000003fbb501bb0 in ld-2.30.so[3fbb4f1000+17000]? I tried using objdump to disassemble ld-2.30.so, but I can't find the code at address 0x0000003fbb501bb0.
I can't tell whether this really is the same problem; that depends on the faulting instruction you're trying to disassemble. Off the top of my head, one way for objdump to fail is using the native x86 objdump on code cross-compiled for a different architecture. Make sure you're using the objdump that belongs to the toolchain for your target CPU architecture.
My problem was specific to floating-point opcodes on RV64GC, and the solution is outlined in comment https://github.com/riscv/riscv-pk/issues/166#issuecomment-508222340
HTH, --G
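A further possible reason the address cannot be found: 0x0000003fbb501bb0 is a runtime address inside a shared library, whereas objdump lists addresses relative to the file. The sketch below subtracts the mapping base that the kernel prints in brackets. It assumes the reported mapping starts at file offset 0 and that the library is linked at base address 0, which may not hold for every build; the helper names are hypothetical.

```python
# Minimal sketch: turn a runtime fault address into a library-relative one.
# Assumptions: the "[base+size]" mapping printed by the kernel starts at
# file offset 0 and the shared object is linked at base address 0.


def parse_mapping(tag: str):
    """Split 'name[base+size]' into (name, base, size) (hypothetical helper)."""
    name, rest = tag.split("[", 1)
    base_hex, size_hex = rest.rstrip("]").split("+")
    return name, int(base_hex, 16), int(size_hex, 16)


def file_relative(fault_addr: int, tag: str):
    """Return (name, offset of fault_addr from the mapping base)."""
    name, base, size = parse_mapping(tag)
    if not base <= fault_addr < base + size:
        raise ValueError("address is outside the reported mapping")
    return name, fault_addr - base


name, offset = file_relative(0x0000003FBB501BB0, "ld-2.30.so[3fbb4f1000+17000]")
print(f"{name}: {offset:#x}")  # ld-2.30.so: 0x10bb0
```

Under those assumptions, the instruction to look for in the cross-toolchain objdump output of ld-2.30.so would be near 0x10bb0. The badaddr value 0xb920 in the second log is the same encoding as in the first report, which suggests (but does not prove) the same floating-point store. The busybox binary in the first report needed no such adjustment because its mapping base 0x10000 matches the addresses shown in its disassembly.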
OK, thank you @gsomlo. I will try the RISC-V objdump to check the instruction at that offset.