Suggestion - Fast floats
Currently, floating point numbers are handled by Berkeley SoftFloat; as stated in the readme, this is precise but slow.
Could there be an optional setting in libv86 for faster floats, using 64-bit JS or WASM floats in place of 80-bit x87 floats (with the main downside being reduced precision)? A hack like what box86 does for floats may help performance.
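For context, the kind of shortcut box86 takes is roughly: execute x87 arithmetic with the host's native 64-bit doubles instead of emulating the 80-bit format. A purely hypothetical sketch of what such a fast FADD handler could look like (the names and layout are illustrative, not v86's actual code):

/* Hypothetical FPU state: the x87 stack held as 64-bit doubles
   instead of 80-bit extended values (illustrative only, not v86's). */
typedef struct {
    double st[8];   /* x87 register stack, truncated to double precision */
    int    top;     /* index of ST(0) */
} FpuState;

/* "Fast" FADD ST(0), ST(i): one native double add instead of a call
   into the softfloat library. Faster, but drops the extra precision
   of the 80-bit x87 format. */
static void fadd_st0_sti(FpuState *fpu, int i) {
    int st0 = fpu->top;
    int sti = (fpu->top + i) & 7;
    fpu->st[st0] += fpu->st[sti];
}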
Yes, that would be reasonable. That said:
- Back in the day v86 used JS floats. The switch to softfloats happened after I found that some code breaks without 80-bit floats (iirc, the printf implementation of haiku's libc). So this would need to stay opt-in.
- I suspect modern compilers generate SSE instructions (which use 64-bit floats). Those aren't optimised in v86 yet (the jit generates a call for each instruction, except for some memory moves and integer arithmetic).
- For softfloats the jit generates calls into the Berkeley library. Afaik some operations have fast paths that the jit could inline into the generated code to recover some performance.
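To illustrate the precision gap (not necessarily the exact failure in haiku's libc, just the general problem), here is a value that fits in the x87 80-bit format but not in a 64-bit double, on targets where long double maps to the 80-bit x87 type (x86 Linux with gcc/clang):

#include <stdio.h>

int main(void) {
    /* 2^63 + 1 needs 64 significand bits: it fits in the x87 80-bit
       extended format (64-bit significand), but a 64-bit double
       (53-bit significand) rounds it to 2^63. */
    long double x87_value = 9223372036854775809.0L; /* 2^63 + 1 */
    double ieee64_value = (double)x87_value;

    printf("80-bit: %.0Lf\n", x87_value);    /* 9223372036854775809 */
    printf("64-bit: %.0f\n", ieee64_value);  /* 9223372036854775808 */
    return 0;
}

Any guest code that relies on those extra bits (for example printf-style conversion routines that use long double internally) can misbehave when 80-bit floats are silently replaced by 64-bit ones, which is why the setting would have to stay opt-in.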
I decided to see what compilers do with the following code
float addTen(float num) {
return 10 + num;
}
on https://godbolt.org/
For 64-bit targets (compiled with -O3), gcc seems to use SSE addition instructions (specifically addss), while for 32-bit targets (compiled with -m32 and -O3) gcc seems to just use x87 instructions like fadd.
In contrast, clang uses addss for both 32- and 64-bit targets.
Not sure if this information is helpful, but I hope it provides some insight.
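For anyone who wants to reproduce the comparison, a slightly extended snippet (the double and long double variants are my addition) can be pasted into godbolt next to addTen; the comments note what gcc/clang typically emit, which is worth verifying per target and flags:

/* float: addss with SSE math, fadd with x87 math */
float addTenF(float num) {
    return 10.0f + num;
}

/* double: addsd with SSE math, fadd with x87 math */
double addTenD(double num) {
    return 10.0 + num;
}

/* long double: x87 either way on x86, since SSE has no 80-bit type */
long double addTenLD(long double num) {
    return 10.0L + num;
}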
while for 32-bit targets (compiled with -m32 and -O3) gcc seems to just use x87 instructions like fadd.
Can you try adding -mfpmath=sse to the compiler parameters (see https://gcc.gnu.org/onlinedocs/gcc-6.3.0/gcc/x86-Options.html)? With this option, gcc uses movss and addss instead of fadd: https://godbolt.org/z/3M6YW7cYK
Yeah, it uses SSE for floating point addition for me now. Is SSE math currently slower than x87 math in v86, or has this not been benchmarked?
Also, compiling with clang and -march=i686 yields normal x87 instructions. I believe most 32-bit distros target i686, so SSE shouldn't be present in distro packages, since i686/Pentium Pro didn't have SSE anyway.
My rather unscientific test, run on my laptop (Intel Core i7-1165G7) with a compiled version of https://www.netlib.org/benchmark/linpackc.new, shows that SSE seems to be faster inside of v86.
I believe most 32-bit distros target i686, so SSE shouldn't be present in distro packages, since i686/Pentium Pro didn't have SSE anyway.
In ArchLinux32 only i486 doesn't use SSE: https://archlinux32.org/architecture/.
Additionally, some Linux distros and software classify i386/i686 as covering all 32-bit CPUs up to the P4, without distinguishing by supported instructions such as CMOV and SSE1-3: https://gitlab.alpinelinux.org/alpine/tsc/-/issues/20, https://lists.debian.org/debian-devel/2015/09/msg00595.html
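As a quick sanity check inside a guest, a small program using the GCC/Clang __builtin_cpu_supports builtin shows which feature bits the emulated CPU actually reports, independent of what the distro's packages were compiled for:

#include <stdio.h>

int main(void) {
    /* __builtin_cpu_supports queries CPUID at runtime (GCC/Clang builtin),
       so it reflects what the emulated CPU exposes to the guest. */
    printf("cmov: %s\n", __builtin_cpu_supports("cmov") ? "yes" : "no");
    printf("sse:  %s\n", __builtin_cpu_supports("sse") ? "yes" : "no");
    printf("sse2: %s\n", __builtin_cpu_supports("sse2") ? "yes" : "no");
    return 0;
}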