Suggestion - Fast floats
Currently, floating point numbers are handled by Berkeley SoftFloat; as stated in the readme, this is precise but slow.
Could there be an optional setting in libv86 for faster floats, using 64-bit JS or WASM floats in place of 80-bit x87 floats (with the main downside being reduced precision)? A hack like what box86 does for floats may help performance.
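For context, the kind of shortcut box86 takes is roughly: execute x87 arithmetic with the host's native 64-bit doubles instead of emulating the 80-bit format. A purely hypothetical sketch of what such a fast FADD handler could look like (the names and layout are illustrative, not v86's actual code):

/* Hypothetical FPU state: the x87 stack held as 64-bit doubles
   instead of 80-bit extended values (illustrative only, not v86's). */
typedef struct {
    double st[8];   /* x87 register stack, truncated to double precision */
    int    top;     /* index of ST(0) */
} FpuState;

/* "Fast" FADD ST(0), ST(i): one native double add instead of a call
   into the softfloat library. Faster, but drops the extra precision
   of the 80-bit x87 format. */
static void fadd_st0_sti(FpuState *fpu, int i) {
    int st0 = fpu->top;
    int sti = (fpu->top + i) & 7;
    fpu->st[st0] += fpu->st[sti];
}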
Yes, that would be reasonable. That said:
- Back in the day v86 used JS floats. The switch to softfloats happened after I found that some code breaks without 80-bit floats (iirc, the printf implementation of haiku's libc). So this would need to stay opt-in.
- I suspect modern compilers generate SSE instructions (which use 64-bit floats). Those aren't optimised in v86 yet (the jit generates a call for each instruction, except for some memory moves and integer arithmetic).
- For softfloats the jit generates calls into the Berkeley library. Afaik some operations have fast paths that the jit could inline into the generated code to recover some performance.
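To illustrate the precision gap (not necessarily the exact failure in haiku's libc, just the general problem), here is a value that fits in the x87 80-bit format but not in a 64-bit double, on targets where long double maps to the 80-bit x87 type (x86 Linux with gcc/clang):

#include <stdio.h>

int main(void) {
    /* 2^63 + 1 needs 64 significand bits: it fits in the x87 80-bit
       extended format (64-bit significand), but a 64-bit double
       (53-bit significand) rounds it to 2^63. */
    long double x87_value = 9223372036854775809.0L; /* 2^63 + 1 */
    double ieee64_value = (double)x87_value;

    printf("80-bit: %.0Lf\n", x87_value);    /* 9223372036854775809 */
    printf("64-bit: %.0f\n", ieee64_value);  /* 9223372036854775808 */
    return 0;
}

Any guest code that relies on those extra bits (for example printf-style conversion routines that use long double internally) can misbehave when 80-bit floats are silently replaced by 64-bit ones, which is why the setting would have to stay opt-in.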
I decided to see what compilers do with the following code
float addTen(float num) {
return 10 + num;
}
on https://godbolt.org/
For 64-bit targets (compiled with -O3), gcc seems to use SSE addition instructions (specifically addss), while for 32-bit targets (compiled with -m32 and -O3) gcc seems to just use x87 instructions like fadd.
In contrast, clang uses addss for both 32- and 64-bit targets.
Not sure if this information is helpful, but I hope it provides some insight.
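For anyone who wants to reproduce the comparison, a slightly extended snippet (the double and long double variants are my addition) can be pasted into godbolt next to addTen; the comments note what gcc/clang typically emit, which is worth verifying per target and flags:

/* float: addss with SSE math, fadd with x87 math */
float addTenF(float num) {
    return 10.0f + num;
}

/* double: addsd with SSE math, fadd with x87 math */
double addTenD(double num) {
    return 10.0 + num;
}

/* long double: x87 either way on x86, since SSE has no 80-bit type */
long double addTenLD(long double num) {
    return 10.0L + num;
}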
while for 32-bit targets (compiled with -m32 and -O3) gcc seems to just use x87 instructions like fadd.
Can you try adding -mfpmath=sse to the compiler parameters (see https://gcc.gnu.org/onlinedocs/gcc-6.3.0/gcc/x86-Options.html)? With this option, gcc uses movss and addss instead of fadd: https://godbolt.org/z/3M6YW7cYK
Yeah, it uses SSE for floating point addition for me now. Is SSE math currently slower than x87 math in v86, or has this not been benchmarked?
Also, compiling with clang and -march=i686 yields normal x87 instructions. I believe most 32-bit distros target i686, so SSE shouldn't be present in distro packages, since i686/Pentium Pro didn't have SSE anyway.
My rather unscientific test, run on my laptop (Intel Core i7-1165G7) with a compiled version of https://www.netlib.org/benchmark/linpackc.new, shows that SSE seems to be faster inside of v86.
I believe most 32-bit distros target i686, so SSE shouldn't be present in distro packages, since i686/Pentium Pro didn't have SSE anyway.
In ArchLinux32 only i486 doesn't use SSE: https://archlinux32.org/architecture/.
Additionally, some Linux distros and software classify i386/i686 as covering all 32-bit CPUs up to the P4, without distinguishing by supported instructions such as CMOV and SSE1-3: https://gitlab.alpinelinux.org/alpine/tsc/-/issues/20, https://lists.debian.org/debian-devel/2015/09/msg00595.html
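As a quick sanity check inside a guest, a small program using the GCC/Clang __builtin_cpu_supports builtin shows which feature bits the emulated CPU actually reports, independent of what the distro's packages were compiled for:

#include <stdio.h>

int main(void) {
    /* __builtin_cpu_supports queries CPUID at runtime (GCC/Clang builtin),
       so it reflects what the emulated CPU exposes to the guest. */
    printf("cmov: %s\n", __builtin_cpu_supports("cmov") ? "yes" : "no");
    printf("sse:  %s\n", __builtin_cpu_supports("sse") ? "yes" : "no");
    printf("sse2: %s\n", __builtin_cpu_supports("sse2") ? "yes" : "no");
    return 0;
}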