Potential race conditions in both C testfloat3 and Go random number generators

Open nadime15 opened this issue 5 months ago • 5 comments

I am seeing inconsistent .quad values between compilations, even though I use fixed seeds. The issue is a race condition: multiple goroutines access the shared random number generators in different orders, and the generators are reseeded in between.

What I think is happening: multiple goroutines call testfloat3.InitF16() (indirectly via main), and that function calls srand(2024). At the same time, Go’s genRandomData() calls rand.Seed(n).

Because the loop is parallelized with goroutines, these generators are reset and consumed concurrently. The seeds are deterministic, but the access pattern is not, which leads to non-deterministic sequences.
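
To illustrate the point about access patterns, here is a minimal sketch in Go. It does not use the project's code: `drawInOrder` is a hypothetical helper that replays two possible scheduling orders sequentially against one shared, fixed-seed generator. The generator emits the same stream both times, but which task receives which value depends entirely on the order of access.

```go
package main

import (
	"fmt"
	"math/rand"
)

// drawInOrder simulates tasks that each pull one value from a single shared,
// fixed-seed generator. `order` lists task IDs in the order they happen to
// run. The result is indexed by task ID. (Hypothetical helper, for
// illustration only.)
func drawInOrder(seed int64, order []int) []uint64 {
	shared := rand.New(rand.NewSource(seed))
	seen := make([]uint64, len(order))
	for _, task := range order {
		seen[task] = shared.Uint64()
	}
	return seen
}

func main() {
	a := drawInOrder(2024, []int{0, 1, 2}) // one possible schedule
	b := drawInOrder(2024, []int{2, 0, 1}) // another possible schedule

	// The underlying stream is identical: the first draw goes to task 0 in
	// schedule a and to task 2 in schedule b.
	fmt.Println("same stream:", a[0] == b[2])

	// But a given task is not guaranteed the same value across schedules.
	fmt.Println("task 0 value in a:", a[0])
	fmt.Println("task 0 value in b:", b[0])
}
```

With real goroutines the order is chosen by the scheduler, so a mutex around the generator would prevent data races but would not pin down which task gets which value.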

For context: I have customized the script to only test the "vd, vs2, vm" format, and I have added a few new instructions that do not have default test case values, so the source register data is completely random.

nadime15 avatar Jul 30 '25 20:07 nadime15

Interesting, but I guess we can add some locks to resolve this.

ksco avatar Aug 06 '25 06:08 ksco

I am not very familiar with Go, but I do not think locking the goroutines is enough, since they can still start in an unpredictable order (?) which is the root cause.

What might work is giving each instruction its own independent random number generator (switching from rand() to rand_r() or another alternative), or else switching entirely to sequential execution and dropping goroutines. The latter would probably hurt performance, though maybe we could offer parallel execution as an optional feature.

nadime15 avatar Aug 06 '25 14:08 nadime15

I’ll come up with something tomorrow!

ksco avatar Aug 06 '25 15:08 ksco

Hi @ksco, is your commit https://github.com/chipsalliance/riscv-vector-tests/commit/b39a70e8c86e45d65ce6ad96fa8f71a11157932f related to this issue?

nadime15 avatar Aug 18 '25 19:08 nadime15

Yes, but it's still not enough for Testfloat cases to be stable.

ksco avatar Aug 19 '25 06:08 ksco