How to get BBL to trap FP instructions on rv64imac hardware?

Open gsomlo opened this issue 5 years ago • 4 comments

I have compiled busybox using "-march=rv64imafdc -mabi=lp64d", and used it as the basis for an initramfs.cpio filesystem for a Linux kernel also built with CONFIG_FPU=y. This kernel is then used as the payload for BBL, configured with "--with-arch=rv64imac". I see libsoftfloat.a being built, and the final riscv64-unknown-linux-gnu-gcc invocation has "-lsoftfloat" as one of the arguments.

Linux boots on this system all the way to the point where it tries to start init (i.e., busybox), at which point it fails with an illegal-instruction exception (signal 4):

[   26.391786] Run /init as init process
[   26.553902] init[1]: unhandled signal 4 code 0x1 at 0x00000000000103c8 in busybox[10000+121000]
[   26.572140] CPU: 0 PID: 1 Comm: init Not tainted 5.2.0-rc6-00017-g6e58d8172a8c-dirty #86
[   26.585592] sepc: 00000000000103c8 ra : 000000000006c3a2 sp : 0000003fffe46d30
[   26.597564]  gp : 00000000001341d8 tp : 0000000000166700 t0 : 0000000000135000
[   26.609566]  t1 : 000000000000009f t2 : 0000000000000000 s0 : 000000000006c704
[   26.622896]  s1 : 000000000006c794 a0 : 0000003fffe46d58 a1 : 0000000000000000
[   26.635032]  a2 : 0000003fffe46e88 a3 : 0000000000000001 a4 : 00000000000b2eea
[   26.647028]  a5 : 0000000000165510 a6 : 0000000000143230 a7 : 00000000001326f8
[   26.659056]  s2 : 0000000000000000 s3 : 0000000000000000 s4 : 0000000000000000
[   26.672018]  s5 : 0000000000000000 s6 : 0000000000000000 s7 : 0000000000000000
[   26.684108]  s8 : 0000000000000000 s9 : 0000000000000000 s10: 0000000000000000
[   26.696288]  s11: 0000000000000000 t3 : 0000000000167200 t4 : 000000000000000c
[   26.708436]  t5 : 0000000000000047 t6 : 0000000000000000
[   26.717782] sstatus: 0000000200000020 sbadaddr: 000000000000b920 scause: 0000000000000002
[   26.827572] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004
[   26.838846] CPU: 0 PID: 1 Comm: init Not tainted 5.2.0-rc6-00017-g6e58d8172a8c-dirty #86
[   26.851380] Call Trace:
[   26.856036] [<ffffffe0000e703e>] walk_stackframe+0x0/0xa0
[   26.864566] [<ffffffe0000e719e>] show_stack+0x2a/0x34
[   26.872838] [<ffffffe0004509c8>] dump_stack+0x20/0x28
[   26.880822] [<ffffffe0000ea630>] panic+0xe2/0x246
[   26.888296] [<ffffffe0000ec54e>] do_exit+0x766/0x784
[   26.896234] [<ffffffe0000ec5be>] do_group_exit+0x22/0x6e
[   26.904826] [<ffffffe0000f4092>] get_signal+0x132/0x5f4
[   26.912932] [<ffffffe0000e690a>] do_notify_resume+0x64/0x334
[   26.921894] [<ffffffe0000e5e84>] ret_from_exception+0x0/0xc
[   26.930542] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004 ]---

The b920 encoding reported in sbadaddr corresponds to "fsd fs0,112(a0)" at 0x103c8 in busybox:

0000000000010394 <__sigsetjmp>:
   10394:       00153023                sd      ra,0(a0)
   10398:       e500                    sd      s0,8(a0)
   1039a:       e904                    sd      s1,16(a0)
   1039c:       01253c23                sd      s2,24(a0)
   103a0:       03353023                sd      s3,32(a0)
   103a4:       03453423                sd      s4,40(a0)
   103a8:       03553823                sd      s5,48(a0)
   103ac:       03653c23                sd      s6,56(a0)
   103b0:       05753023                sd      s7,64(a0)
   103b4:       05853423                sd      s8,72(a0)
   103b8:       05953823                sd      s9,80(a0)
   103bc:       05a53c23                sd      s10,88(a0)
   103c0:       07b53023                sd      s11,96(a0)
   103c4:       06253423                sd      sp,104(a0)
   103c8:       b920                    fsd     fs0,112(a0)
   103ca:       bd24                    fsd     fs1,120(a0)
   103cc:       09253027                fsd     fs2,128(a0)
   103d0:       09353427                fsd     fs3,136(a0)
   103d4:       09453827                fsd     fs4,144(a0)
   103d8:       09553c27                fsd     fs5,152(a0)
   103dc:       0b653027                fsd     fs6,160(a0)
   103e0:       0b753427                fsd     fs7,168(a0)
   103e4:       0b853827                fsd     fs8,176(a0)
   103e8:       0b953c27                fsd     fs9,184(a0)
   103ec:       0da53027                fsd     fs10,192(a0)
   103f0:       0db53427                fsd     fs11,200(a0)
   103f4:       1df5f06f                j       6fdd2 <__sigjmp_save>
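As a cross-check of the disassembly above, here is a minimal Python sketch that decodes a 16-bit compressed instruction as C.FSD. It assumes RV64 (where quadrant 0, funct3 101 is C.FSD) and the standard CS-format field layout; the helper name `decode_c_fsd` is made up for illustration and is not part of any toolchain.

```python
# ABI names for the eight registers addressable from compressed formats
# (x8..x15 and f8..f15).
INT_ABI = ["s0", "s1", "a0", "a1", "a2", "a3", "a4", "a5"]
FP_ABI = ["fs0", "fs1", "fa0", "fa1", "fa2", "fa3", "fa4", "fa5"]


def decode_c_fsd(insn: int):
    """Decode a 16-bit RVC word as C.FSD (RV64 assumed).

    Returns (fp_source_reg, byte_offset, base_reg), or None if the word
    is not a C.FSD encoding.
    """
    if insn & 0x3 != 0b00:            # bits 1:0 select quadrant 0
        return None
    if (insn >> 13) & 0x7 != 0b101:   # funct3 = 101 selects C.FSD
        return None
    uimm_5_3 = (insn >> 10) & 0x7     # bits 12:10 hold offset[5:3]
    rs1 = (insn >> 7) & 0x7           # bits 9:7 hold the base register
    uimm_7_6 = (insn >> 5) & 0x3      # bits 6:5 hold offset[7:6]
    rs2 = (insn >> 2) & 0x7           # bits 4:2 hold the FP source register
    offset = (uimm_7_6 << 6) | (uimm_5_3 << 3)
    return FP_ABI[rs2], offset, INT_ABI[rs1]


if __name__ == "__main__":
    print(decode_c_fsd(0xB920))
```

Feeding it 0xb920 and 0xbd24 should reproduce the first two fsd lines of the listing, which supports reading the sbadaddr value as the faulting instruction's encoding rather than a memory address.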

I'm wondering why sstatus doesn't have FS set, and what I'd have to do to get BBL to tell Linux (running in S mode) that floating point is supported via machine (M) mode?
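To make the register dump easier to read, the following Python sketch pulls the relevant fields out of the sstatus and scause values printed above. The bit positions (FS at bits 14:13, SPIE at bit 5, UXL at bits 33:32) follow the RISC-V privileged specification for RV64; the function name `decode_trap` and the small cause table are illustrative only.

```python
FS_NAMES = {0: "Off", 1: "Initial", 2: "Clean", 3: "Dirty"}

# Only a few synchronous exception codes, enough for this log.
CAUSE_NAMES = {
    0: "instruction address misaligned",
    1: "instruction access fault",
    2: "illegal instruction",
    3: "breakpoint",
}


def decode_trap(sstatus: int, scause: int) -> dict:
    """Extract a few RV64 sstatus/scause fields from a trap dump."""
    is_interrupt = bool(scause >> 63)        # top bit distinguishes interrupts
    code = scause & ((1 << 63) - 1)
    return {
        "fs": FS_NAMES[(sstatus >> 13) & 0x3],   # FP unit state
        "spie": (sstatus >> 5) & 0x1,
        "uxl": (sstatus >> 32) & 0x3,            # 2 means 64-bit U-mode
        "interrupt": is_interrupt,
        "cause": "interrupt" if is_interrupt else CAUSE_NAMES.get(code, "other"),
    }


if __name__ == "__main__":
    print(decode_trap(0x0000000200000020, 0x2))
```

For the values in the log this reports FS as Off and the cause as an illegal instruction, which is consistent with the FP store trapping instead of executing.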

Edit: Oh, and if I build my kernel without "CONFIG_FPU", and build busybox with only "-march=rv64imac -mabi=lp64", it boots all the way into a busybox shell (ash), so this is strictly a question of getting BBL to actually advertise FP capabilities in M mode, as far as I can tell.

gsomlo, Jul 03 '19 18:07

As it turns out, BBL appears to pass the unmodified "rv64imac" CPU ISA string through to Linux, which then assumes no FP is available and kills any process that attempts an FP instruction, without ever punting to M-mode. Lying to Linux by claiming "rv64imafdc" in the DTB gets it working. I'm looking at where BBL should do the "s/rv64imac/rv64imafdc/" edit to the DTB before starting its payload, and why it doesn't do so in my situation. I might have a patch soon if nobody beats me to it :)
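The "s/rv64imac/rv64imafdc/" rewrite can be sketched in Python as a string transformation on the ISA string alone. This is not BBL's code and it does not touch a DTB; it is a hypothetical helper (`add_fp`) that assumes a simple ISA string made only of single-letter extensions, and inserts "f" and "d" in the canonical extension order.

```python
import re

# Canonical ordering of single-letter extensions, used to keep the
# result well-formed.
CANONICAL = "iemafdqlcbjtpvn"


def add_fp(isa: str) -> str:
    """Return the ISA string with 'f' and 'd' added in canonical order.

    Assumes a string such as "rv64imac": an rv32/rv64/rv128 prefix
    followed only by single-letter extensions.
    """
    m = re.fullmatch(r"(rv(?:32|64|128))([a-z]+)", isa)
    if m is None:
        raise ValueError(f"unsupported ISA string: {isa!r}")
    prefix, letters = m.groups()
    unknown = set(letters) - set(CANONICAL)
    if unknown:
        raise ValueError(f"unhandled extension letters: {sorted(unknown)}")
    wanted = set(letters) | {"f", "d"}
    return prefix + "".join(c for c in CANONICAL if c in wanted)


if __name__ == "__main__":
    print(add_fp("rv64imac"))
```

Doing the same edit inside a real device tree blob is more involved, since lengthening a property value changes the layout of the blob; that part is left out here.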

gsomlo, Jul 03 '19 19:07

@gsomlo I hit the same error as you:

[   20.880859] init[1]: unhandled signal 4 code 0x1 at 0x0000003fbb501bb0 in ld-2.30.so[3fbb4f1000+17000]
[   20.902984] CPU: 0 PID: 1 Comm: init Not tainted 5.7.0+ #1
[   20.915893] epc: 0000003fbb501bb0 ra : 0000003fbb500cde sp : 0000003fffd8a480
[   20.932342]  gp : ffffffe000a2cf50 tp : 0000000000000000 t0 : 0000000000000000
[   20.948211]  t1 : 0000003fbb4f1e7c t2 : 000000006fffffff s0 : 0000000000000000
[   20.964782]  s1 : 0000003fffd8a620 a0 : 0000003fffd8a4b8 a1 : 0000000000000000
[   20.981353]  a2 : 0000003fffd8a6d8 a3 : 0000003fffd8a6c0 a4 : 0000003fffd8a4a8
[   20.997222]  a5 : 0000003fbb50a0e0 a6 : 7efefefefefefeff a7 : 24160a4b570a5248
[   21.013793]  s2 : 00000000000138c9 s3 : 00000000000bae10 s4 : 0000003fbb50a160
[   21.029724]  s5 : 0000000000000000 s6 : 0000003fbb50a160 s7 : 0000000000000000
[   21.046295]  s8 : 0000000000000000 s9 : 0000003fbb509ff8 s10: 0000003fffd8a6c0
[   21.062927]  s11: 0000003fffd8a6d8 t3 : 0000003fbb500cb0 t4 : 0000000000000004
[   21.078735]  t5 : 0000000000000004 t6 : 0000000000000004
[   21.091796] status: 0000000200000020 badaddr: 000000000000b920 cause: 0000000000000002

Do you know how to view the disassembly at 0x0000003fbb501bb0 in ld-2.30.so[3fbb4f1000+17000]? I tried using objdump to disassemble ld-2.30.so, but I can't find the code at address 0x0000003fbb501bb0.

Thanks for any input you can provide.

fanghuaqi, Nov 17 '20 08:11

On Tue, Nov 17, 2020 at 12:49:27AM -0800, Huaqi Fang wrote:

> Do you know how to view the disassembly at 0x0000003fbb501bb0 in ld-2.30.so[3fbb4f1000+17000]? I tried using objdump to disassemble ld-2.30.so, but I can't find the code at address 0x0000003fbb501bb0.

I'm not sure whether this really is the same problem; that depends on the faulting instruction you're trying to disassemble. I don't know your specifics, but off the top of my head, one way for objdump to fail is using the native x86 objdump on code cross-compiled for a different architecture. Make sure you're using the objdump that belongs to your toolchain for the target CPU architecture.
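A second possible reason the address cannot be found is that 0x0000003fbb501bb0 is a runtime address inside the mapping, while objdump on a shared object normally shows addresses relative to its link base. A rough Python sketch for converting one into the other from the kernel's "unhandled signal" line follows. It assumes the bracketed pair is the mapping start and size, that the mapping begins at file offset 0, and that the object is linked at base 0; the name `fault_offset` is illustrative.

```python
import re

# Matches e.g. "at 0x0000003fbb501bb0 in ld-2.30.so[3fbb4f1000+17000]"
FAULT_RE = re.compile(r"at 0x([0-9a-f]+) in (\S+)\[([0-9a-f]+)\+([0-9a-f]+)\]")


def fault_offset(line: str):
    """Return (object_name, offset_into_mapping) for a kernel fault line."""
    m = FAULT_RE.search(line)
    if m is None:
        raise ValueError("no fault location found in line")
    addr = int(m.group(1), 16)
    name = m.group(2)
    base = int(m.group(3), 16)
    size = int(m.group(4), 16)
    if not base <= addr < base + size:
        raise ValueError("fault address lies outside the reported mapping")
    return name, addr - base


if __name__ == "__main__":
    log = ("init[1]: unhandled signal 4 code 0x1 at 0x0000003fbb501bb0 "
           "in ld-2.30.so[3fbb4f1000+17000]")
    name, off = fault_offset(log)
    print(name, hex(off))
```

Under those assumptions, the resulting offset is what to look for in the output of the cross toolchain's objdump (for example `riscv64-unknown-linux-gnu-objdump -d ld-2.30.so`, if that is the toolchain prefix in use). Note that badaddr in this log is also b920, the same encoding as the fsd instruction in the original report.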

My problem was specific to floating-point opcodes on RV64GC, and the solution is outlined in comment https://github.com/riscv/riscv-pk/issues/166#issuecomment-508222340

HTH, --G

gsomlo, Nov 17 '20 14:11

OK, thank you @gsomlo. I will try the RISC-V objdump to check the instruction at that offset.

fanghuaqi, Nov 18 '20 01:11