nuttx icon indicating copy to clipboard operation
nuttx copied to clipboard

RISC-V:the FPU case in ostest failed due to "lazy" FPU save/restore?

Open hotislandn opened this issue 3 years ago • 12 comments

We tried to port NuttX on a new SoC, and found an issue related to the FPU test.

In the source files: https://github.com/apache/incubator-nuttx/blob/master/arch/risc-v/src/common/riscv_exception_common.S https://github.com/apache/incubator-nuttx/blob/master/arch/risc-v/src/common/riscv_macros.S

We can see that when the FPU context needs to be saved, it checks the FS field, and stores the FPU regs ONLY when it is dirty, then mark it as clean and updates "mstatus" in the full context.

In the FPU test: https://github.com/apache/incubator-nuttx-apps/blob/master/testing/ostest/fpu.c

  1. when up_saveusercontext is called for the first time, the FS=3 after float OPs in this thread, so we got the FPU context stored in sp, and copied to "fpu->save1", when done, FS=2 in this thread
  2. when up_saveusercontext is called for the second time, just after the first call, the "LAZY" FPU save logic will skip FPU context since FS = 2(CLEAN) != 3(DIRTY). But since the FPU context is still in the stack of the current thread, so we still get the right float regs in fpu->save2, and pass the first up_fpucmp
  3. when up_saveusercontext is called for the third time, the FS is still 2 since we did not touch the FPU during sched_unlock and usleep. However, these two did mess up the stack --- the original FPU context is gone. But due to the "LAZY" FPU save logic, it still skip the FPU context since FS=2. So we will get the un-expected data in fpu->save2 this time, and leads to fail in the second up_fpucmp.

Did we missed something in our port? Thanks in advance.

hotislandn avatar Apr 28 '22 08:04 hotislandn

We tried to port NuttX on a new SoC, and found an issue related to the FPU test.

In the source files: https://github.com/apache/incubator-nuttx/blob/master/arch/risc-v/src/common/riscv_exception_common.S https://github.com/apache/incubator-nuttx/blob/master/arch/risc-v/src/common/riscv_macros.S

We can see that when the FPU context needs to be saved, it checks the FS field, and stores the FPU regs ONLY when it is dirty, then mark it as clean and updates "mstatus" in the full context.

In the FPU test: https://github.com/apache/incubator-nuttx-apps/blob/master/testing/ostest/fpu.c

  1. when up_saveusercontext is called for the first time, the FS=3 after float OPs in this thread, so we got the FPU context stored in sp, and copied to "fpu->save1", when done, FS=2 in this thread
  2. when up_saveusercontext is called for the second time, just after the first call, the "LAZY" FPU save logic will skip FPU context since FS = 2(CLEAN) != 3(DIRTY). But since the FPU context is still in the stack of the current thread, so we still get the right float regs in fpu->save2, and pass the first up_fpucmp
  3. when up_saveusercontext is called for the third time, the FS is still 2 since we did not touch the FPU during sched_unlock and usleep. However, these two did mess up the stack --- the original FPU context is gone. But due to the "LAZY" FPU save logic, it still skip the FPU context since FS=2. So we will get the un-expected data in fpu->save2 this time, and leads to fail in the second up_fpucmp.

Did we missed something in our port? Thanks in advance.

One method is always save all FPU register in up_saveusercontext, another method is do some read only float operation between each test to overcome the lazy fpu saving optimization. The later approach looks simple and less invasive.

xiaoxiang781216 avatar Apr 28 '22 09:04 xiaoxiang781216

Yes, the 2nd way seems enough for this case. e.g., do a float to int conversion, or store a float element to mem, after checking the code generation and evaluating the impaction to all platforms. Another option is to add a new API say up_savefullcontext?

hotislandn avatar Apr 28 '22 11:04 hotislandn

we will give it a try to move things like "fpu->sp1 = sp1" to the new positon: after usleep().

hotislandn avatar Apr 28 '22 11:04 hotislandn

@hotislandn

I'm not sure if your issue relates to the following issue. https://github.com/apache/incubator-nuttx/pull/6134#issuecomment-1109173828

masayuki2009 avatar Apr 30 '22 21:04 masayuki2009

Since the FPU state is stored in the interrupt stack frame, isn't the issue more generic with lazy context switching?

Ouss4 avatar May 04 '22 13:05 Ouss4

Since the FPU state is stored in the interrupt stack frame, isn't the issue more generic with lazy context switching?

In fact, the context switch during the usleep() call does NOT save the FPU regs in TCB since FS is 2 at that time.

hotislandn avatar May 05 '22 10:05 hotislandn

@hotislandn

I'm not sure if your issue relates to the following issue. #6134 (comment)

The root cause is different. However, the suggestion of a new API to get context should work for this. BTW, on RISC-V, we may meet a similar issue after we introduce the RVV into Nuttx.

hotislandn avatar May 05 '22 10:05 hotislandn

In fact, the context switch during the usleep() call does NOT save the FPU regs in TCB since FS is 2 at that time.

Now we only have a reference in the TCB for the save area in the stack.
When a task resumes, SP will point to its original pre-interrupt location making the whole interrupt stack frame available.

Ouss4 avatar May 05 '22 12:05 Ouss4

In fact, the context switch during the usleep() call does NOT save the FPU regs in TCB since FS is 2 at that time.

Now we only have a reference in the TCB for the save area in the stack. When a task resumes, SP will point to its original pre-interrupt location making the whole interrupt stack frame available.

Yes, that's the normal case, tcb -> xcp -> regs points to the right context.

However, in this FPU test, on RISC-V, the lazy FPU optimization causes FS to change to 2 in up_saveusercontext(). When the real context switch happens in usleep(), FPU regs save will be skipped.

hotislandn avatar May 06 '22 02:05 hotislandn

I was talking about the current layout in general and not particularly about the FPU test. I believe that since we are lazily saving/restoring FPU state, it should not be part of the interrupt stack frame and rather have a different, fixed, area, like in the TCB or at the bottom of the stack along side the TLS region and arguments.

Ouss4 avatar May 06 '22 13:05 Ouss4

I was talking about the current layout in general and not particularly about the FPU test. I believe that since we are lazily saving/restoring FPU state, it should not be part of the interrupt stack frame

FPU state isn't saved to the interrupt stack, but the current thread stack.

and rather have a different, fixed, area, like in the TCB or at the bottom of the stack along side the TLS region and arguments.

It is no real different to save FPU state in TCB, bottom of the stack or the near of the integer register context. But, the current layout could be more:

  1. Simple to resolve the context space in case of signal dispatch
  2. Generate more efficient save/restore code(hardware normally has better support for SP related load/save)
  3. It's hard to access TCB field or the bottom of the stack in assembly code without the help from some c utility function

xiaoxiang781216 avatar May 06 '22 15:05 xiaoxiang781216

I was talking about the current layout in general and not particularly about the FPU test. I believe that since we are lazily saving/restoring FPU state, it should not be part of the interrupt stack frame

FPU state isn't saved to the interrupt stack, but the current thread stack.

Yes. I didn't mean interrupt stack, I meant interrupt saving area (or registers saving area) in the current thread's stack.

Ouss4 avatar May 06 '22 15:05 Ouss4