Bug report: Under some microarchitectural circumstances, NaN conversion from single to double precision gives a wrong result

Open flaviens opened this issue 1 year ago • 0 comments

Hi there!

I've detected a bug in CVA6, probably in CVFPU, but I'm not certain. I initially posted the issue in the CVA6 repo before moving it here. I used commit 17743bc7120f1eb24974e5d7eb7f519ef53c4bdc of CVA6.

Brief bug description

Under specific microarchitectural conditions, converting a NaN-boxed single-precision value to double precision with fcvt.d.s unexpectedly sets every mantissa bit of the result.

Example instance 1

In this instance, we convert the NaN-boxed single-precision value 0xffffffffff800000 (negative infinity) using fcvt.d.s. We expect to get 0xfff0000000000000 as a result, but under the circumstances induced by the very simple ELF, we get 0xffefffffffffffff. I confirmed the expected result with Spike and, using the snippet further below, on CVA6.
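To make the difference between the two results easier to see, the bit patterns can be decoded on the host. The following is a minimal sketch, assuming plain IEEE 754 encodings; the helper names are hypothetical and not part of CVA6 or CVFPU. It suggests that the input decodes to negative infinity, and that the observed result is the largest-magnitude finite negative double, one integer step below the expected encoding.

```python
import struct

def f32_from_bits(bits):
    """Interpret the low 32 bits as an IEEE 754 single (hypothetical helper)."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]

def f64_from_bits(bits):
    """Interpret a 64-bit pattern as an IEEE 754 double (hypothetical helper)."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def is_nan_boxed(reg):
    """True if the upper 32 bits of the 64-bit register value are all ones."""
    return (reg >> 32) == 0xFFFFFFFF

source = 0xFFFFFFFFFF800000    # value loaded into fa5 in this instance
expected = 0xFFF0000000000000  # result reported by Spike
observed = 0xFFEFFFFFFFFFFFFF  # result reported on CVA6

print(is_nan_boxed(source))        # True
print(f32_from_bits(source))       # -inf
print(f64_from_bits(expected))     # -inf
print(f64_from_bits(observed))     # -1.7976931348623157e+308
print(hex(expected ^ observed))    # 0x1fffffffffffff
print(expected - observed)         # 1
```

The XOR shows that the two results differ in all 52 mantissa bits plus the lowest exponent bit.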

Here are the ELF and the waveforms of the bug. A symptom of the bug is the first write to address 0x18, which essentially writes the result of the conversion.

Here you can observe that the value dumped is erroneous.

waveform

Example instance 2

In this instance, we convert the NaN-boxed single-precision value 0xffffffff7f800000 (positive infinity) using fcvt.d.s. We expect to get 0x7ff0000000000000 as a result, but under the circumstances induced by the very simple ELF, we get 0x7fefffffffffffff. I confirmed the expected result with Spike and, using the snippet further below, on CVA6.
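The expected values for both instances can also be cross-checked with a small host-side model of fcvt.d.s. This is only a sketch under stated assumptions: it follows my reading of the RISC-V rules (an input that is not NaN-boxed is treated as the canonical NaN, and a NaN input yields the canonical double-precision NaN), the function name `fcvt_d_s_model` is hypothetical, and it does not model exception flags.

```python
import math
import struct

CANONICAL_NAN_F64 = 0x7FF8000000000000  # canonical double-precision quiet NaN

def fcvt_d_s_model(reg):
    """Hypothetical software model of fcvt.d.s on a 64-bit FP register value."""
    # Assumption: a single that is not NaN-boxed is treated as canonical NaN.
    if (reg >> 32) != 0xFFFFFFFF:
        return CANONICAL_NAN_F64
    single = struct.unpack("<f", struct.pack("<I", reg & 0xFFFFFFFF))[0]
    # Assumption: any NaN input produces the canonical NaN.
    if math.isnan(single):
        return CANONICAL_NAN_F64
    # Widening a non-NaN single to double is exact, so no rounding is involved.
    return struct.unpack("<Q", struct.pack("<d", single))[0]

print(hex(fcvt_d_s_model(0xFFFFFFFF7F800000)))  # 0x7ff0000000000000
print(hex(fcvt_d_s_model(0xFFFFFFFFFF800000)))  # 0xfff0000000000000
```

Because widening from single to double is exact for every non-NaN input, the rounding mode should not be able to change these results in a correct implementation.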

Here are the ELF and the waveforms of the bug. A symptom of the bug is the sixth write to address 0x18, which essentially writes the result of the conversion.

waveform

Example instance that runs correctly

Interestingly, performing the conversion out of context works: for example, the following snippet appears to execute correctly. The same holds if we replace the value with that of the second instance.

  .section ".text.init","ax",@progbits
  .globl _start
  .align 2
_start:

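  # Enable the FPU: set mstatus.FS (bit 13) and clear fcsr
  li t0, 0x2000
  csrs mstatus, t0
  csrw fcsr, x0

  # Load the NaN-boxed single-precision operand (8 bytes) into fa5
  la t0, .fdata0
  fld fa5, (t0)

  # Convert from single to double precision
  fcvt.d.s ft0, fa5

  # Store the conversion result to address 0x18
  li t0, 0x18
  fsd ft0, (t0)

  sw x0, 0(x0)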

infinite_loop:
  j infinite_loop

.section ".fdata0","ax",@progbits
  .8byte 0xffffffffff800000

Therefore, I deduce that the bug depends on some microarchitectural effect (maybe branch prediction, but I am not certain).

Thanks! Flavien

flaviens, Mar 06 '23 14:03