
Can apfloat emulate the x87 pow operation?

halibobo1205 opened this issue 10 months ago · 12 comments

    System.out.println(Math.pow(1.0061363892207218, 0.0005));       // x86_64 JDK8 = 1.0000030588238054
    System.out.println(Math.pow(1.0000046943914231, 2000));         // x86_64 JDK8 = 1.0094329734828904
    System.out.println(StrictMath.pow(1.0061363892207218, 0.0005)); // 1.0000030588238051
    System.out.println(StrictMath.pow(1.0000046943914231, 2000));   // 1.0094329734828902
    System.out.println(ApfloatMath.pow(new Apfloat(1.0061363892207218, 80), new Apfloat(0.0005, 80)).doubleValue()); // 1.0000030588238051
    System.out.println(ApfloatMath.pow(new Apfloat(1.0000046943914231, 80), new Apfloat(2000, 80)).doubleValue());   // 1.0094329734828902

ApfloatMath.pow is consistent with StrictMath.pow but not with Math.pow. Is there any way to make ApfloatMath.pow consistent with Math.pow?

halibobo1205 avatar Feb 06 '25 04:02 halibobo1205

First of all, the double constructor has this caveat: http://www.apfloat.org/apfloat_java/tutorial.html#caveats

To avoid it, the code should probably use the Aprational constructor and then get an approximation to e.g. 80 digits. For example:

System.out.println(ApfloatMath.pow(new Aprational(1.0061363892207218).precision(80), new Aprational(0.0005).precision(80)).doubleValue());
// 1.0000030588238051

System.out.println(ApfloatMath.pow(new Aprational(1.0000046943914231).precision(80), new Aprational(2000).precision(80)).doubleValue());
// 1.0094329734828902

But it still gives the same result.

Just out of curiosity, to calculate the correct value, the exact values of these floating-point numbers are:

1.0061363892207217940466534855659119784832000732421875
0.0005000000000000000104083408558608425664715468883514404296875
1.0000046943914231434291650657542049884796142578125

(2000 is exactly 2000; it is the only one of these that is actually the value it appears to be in decimal.)
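For reference, these exact decimal expansions can be checked with BigDecimal's double constructor, which preserves the exact binary value of a double; a minimal sketch:

    // The BigDecimal(double) constructor converts the exact binary value of the double to decimal.
    System.out.println(new java.math.BigDecimal(1.0061363892207218).toPlainString());
    System.out.println(new java.math.BigDecimal(0.0005).toPlainString());
    System.out.println(new java.math.BigDecimal(1.0000046943914231).toPlainString());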

If we check e.g. what Wolfram Alpha gives:

https://www.wolframalpha.com/input?i=1.0061363892207217940466534855659119784832000732421875%5E0.0005000000000000000104083408558608425664715468883514404296875
1.0000030588238052464490290006211942542706256820307636249...

https://www.wolframalpha.com/input?i=1.0000046943914231434291650657542049884796142578125%5E2000
1.00943297348289029462369268058198837821369992650778055771630627825428689610...

But rounded to the nearest double, these seem to be 1.0000030588238051 and 1.0094329734828904, respectively.

So I'm a bit unsure what the correct answer should even be. Is a difference of one ulp allowed?
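One way to check how far apart two double results actually are, in ulps, is to compare their raw bit patterns; a minimal sketch:

    // For positive finite doubles, the difference of the raw bit patterns is the distance in ulps.
    double a = Math.pow(1.0061363892207218, 0.0005);        // platform-dependent result
    double b = StrictMath.pow(1.0061363892207218, 0.0005);  // reproducible fdlibm result
    long ulps = Math.abs(Double.doubleToLongBits(a) - Double.doubleToLongBits(b));
    System.out.println(a + " vs " + b + " -> " + ulps + " ulp(s) apart");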

Apfloat doesn't have a rounding mode when performing calculations (other than the explicit rounding functions), so in general, apfloat isn't consistent with anything. If a particular result happens to be exactly the same you get with either Math.pow() or StrictMath.pow(), that's purely by coincidence. Apfloat in general only gives you the result so that the last decimal digit may be incorrect. See: http://www.apfloat.org/apfloat_java/docs/api-index.html

Of course, when you perform calculations with 80 decimal digits of precision in apfloat, you should get enough precision for a double in the result. However, when starting from a double, you must use the Aprational constructor to get the value exactly correct.

Can you elaborate a bit what your use case is? Why do you want to get results as a double but rounded correctly towards the closest ulp, not just either ulp?

Short answer to your question is "no", though.

mtommila avatar Feb 06 '25 22:02 mtommila

@mtommila, in JDK8 on x86_64, fast_pow (Math.pow) uses the fyl2x and f2xm1 instructions. I would like to know if it is possible to emulate them on the ARM platform. We have some historical data calculations that previously ran under x86_64 JDK8, and now we want to migrate to an ARM platform.
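For reference, the identity those two instructions implement is pow(x, y) = 2^(y * log2(x)): FYL2X computes y * log2(x), and F2XM1 (together with FSCALE) reconstructs the power of two. A minimal Java sketch of that identity follows; it works in 64-bit doubles, so it will not reproduce the x87's 80-bit extended-precision intermediates bit-for-bit, and powViaLog2 is just a hypothetical helper name.

    // Sketch of the mathematical identity behind the x87 pow sequence (not bit-for-bit accurate).
    static double powViaLog2(double x, double y) {
        double t = y * (Math.log(x) / Math.log(2.0)); // FYL2X: t = y * log2(x)
        int i = (int) Math.floor(t);                  // split t = i + f with f in [0, 1)
        double f = t - i;
        double twoToF = Math.exp(f * Math.log(2.0));  // stands in for F2XM1: 2^f
        return Math.scalb(twoToF, i);                 // FSCALE: scale the result by 2^i
    }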

halibobo1205 avatar Feb 07 '25 06:02 halibobo1205

But do you want to emulate the results bit-by-bit exactly? Or do you want to e.g. just have any algorithm to calculate pow() to an accuracy of ± one ulp?

mtommila avatar Feb 07 '25 22:02 mtommila

I want to emulate the results bit-by-bit exactly for pow between ARM and x86. Do you have any good ideas?

halibobo1205 avatar Feb 08 '25 06:02 halibobo1205

I see. Well I'm quite sure that the algorithm for computing logarithms is completely different in the apfloat library and in the x87 processors, so I don't think the apfloat library is of any use in emulating the x87 results bit-by-bit.

Based on some googling, this document (R. Nave, "Implementation of Transcendental Functions on a Numerics Processor", Microprocessing and Microprogramming, Vol. 11, No. 3-4, March-April 1983, pp. 221-225) apparently describes how the x87 processor computes those functions (or at least how early versions of the processor do):

https://ieeemilestones.ethw.org/w/images/8/8b/Nave_implenent_transcendental_algos_1983.pdf

The apfloat library on the other hand doesn't use the same algorithms, so results wouldn't be the same bit-by-bit. (Apfloat uses the arithmetic-geometric-mean based algorithm for the logarithm, and Newton's iteration to compute its inverse, for the exponential.)

(Note that I don't know if all later x87 processors use the same CORDIC algorithms as the 8087 from the year 1980, and if they all give the same results bit-by-bit. If the hardware supports floating-point multiplication, addition and subtraction, then perhaps e.g. a rational polynomial approximation is faster than CORDIC, I don't know.)

Since old computers didn't necessarily have a hardware floating-point processor (such as the 8087, 80287 or 80387) they had to emulate the x87 instructions in software, using integer arithmetic only. I wonder if the source code of any such emulation library is available somewhere? That would perhaps be my best bet, to emulate the x87 results bit-by-bit.

Also if there was the source code available for an x87 software emulator (such as the EM87 emulator by Ron Kimball, if you google it) then that could be helpful, but I couldn't easily find any.

mtommila avatar Feb 08 '25 11:02 mtommila

The GNU compiler, GCC, is open source and should also have a 387 software emulation library, but I don't know if it produces bit-by-bit the same results as an Intel-provided library.

mtommila avatar Feb 08 '25 11:02 mtommila

E.g. QEMU has an emulation of fyl2x and f2xm1 but there's no knowing if it's bit-by-bit identical with a physical x87 processor. Also the source is in C:

https://github.com/qemu/qemu/blob/master/target/i386/tcg/fpu_helper.c

I found this pure Java implementation of the java.lang.Math class, but it seems to produce results that are consistent with StrictMath.pow and not with Math.pow (at least for the above numbers), and it obviously would not really have anything to do with x87:

https://www.dclausen.net/projects/microfloat/

mtommila avatar Feb 08 '25 16:02 mtommila

Hi,

The algorithm I wrote produced the results shown below. The aim is to calculate a result accurate to a user-defined order of magnitude. In this case I tried producing results to 100 and then 20 decimal places. The results only seem to agree with those from WolframAlpha to 19 and 13 decimal places, respectively. I've not looked into this discrepancy yet, and although this may be a distraction, it might be helpful, so I thought to chime in.

Best wishes,

Andy

1.0000030588238052464519238536471846532785721466743258859944119238248609030448432949291002123328671814 1.00000305882380524645

1.009432973482802617372827611793722757595124623951244427585000127306614841616195637206003619771723216 1.00943297348280261737

A test class that produces these results: https://github.com/agdturner/ccg-math/blob/master/src/test/java/uk/ac/leeds/ccg/math/arithmetic/test/Math_BigDecimalTest.java

agdturner avatar Feb 09 '25 15:02 agdturner

If you have access to an ARM computer, then can you run QEMU on the ARM machine to emulate x86_64 and then run the x86_64 JDK8 on QEMU to emulate the behavior? Would this meet your requirements?

Presumably you would just need qemu-system-x86_64 that is built to run on aarch64, and then the x86_64 JDK8 running on top of it.

(I'm assuming you asked the same question on Stackoverflow https://stackoverflow.com/questions/79384612/emulate-x87-fpu-pow-operations-based-on-arm which clarifies the situation a bit.)

mtommila avatar Feb 09 '25 15:02 mtommila

QEMU might be an option, but I don't want to run the whole program in an emulator.

halibobo1205 avatar Feb 10 '25 06:02 halibobo1205

I have a feeling that, first of all, the physical x87 processors don't always guarantee the exact mathematically correct result, to the ulp, for all transcendental and trigonometric functions and every possible input. Results for some functions and some input values might differ from the correctly rounded result by one ulp.

Also, I have a feeling that x87 processors of different generations do not guarantee that they will give exactly the same result, bit-by-bit, for every possible input and every function, as x87 processors of other generations. So some results might differ by one ulp if you just use different x87 hardware.

So, emulating the exact bit-by-bit results produced by x87 would not only require x87 hardware, but would be specific to the exact x87 processor model, at least in some cases.

mtommila avatar Feb 11 '25 20:02 mtommila

I noticed another difference in JDK8: Intel(R) Xeon(R) Platinum 8124M (Linux) vs. Intel i7 (macOS).

System.out.println(Math.pow(0.95, 359)); // 3e459cca0dcd9844 (macOS), 3e459cca0dcd9845 (Linux)
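Presumably those hex values are the raw IEEE 754 bit patterns of the results; they can be printed like this:

    // Print the raw bit pattern of the result as hex (an assumption about how the values above were obtained).
    System.out.println(Long.toHexString(Double.doubleToLongBits(Math.pow(0.95, 359))));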

halibobo1205 avatar Feb 12 '25 06:02 halibobo1205

I think this is (almost) impossible to solve. x87 does not have a "pow" instruction; it only has FYL2X and F2XM1, which can be used to calculate pow. And x87's implementation is known to produce quite inaccurate results when the argument is outside of its comfort zone. Java is famous for doing much stricter "range reduction" of the arguments to reduce the error (compared to a typical C libm, Java is slower for some reason).

Since there's no "pow" instruction at all, and Intel (and other vendors like AMD or VIA) makes no promise to always return the same results (across different generations of CPU), there are no "bit-for-bit identical results" at all.

See https://community.intel.com/t5/Processors/FPU-Not-Always-Rounding-Inexact-Results-Correctly-For-Double/td-p/1495070 for an example where Intel and AMD disagree with some args.

In fact, different versions of Java make no promise to return bit-by-bit identical results for these functions. https://stackoverflow.com/questions/64586096/math-pow-in-java-8-and-java-13-returns-different-result

(And that's the reason there is StrictMath at all. StrictMath is about consistency.)

ntysdd avatar May 09 '25 15:05 ntysdd

@halibobo1205 That is to say, if you want consistency, use StrictMath.pow().

Math.pow() doesn't always return the same results, even on x86.

Quote https://stackoverflow.com/questions/4232231/whats-the-difference-between-java-lang-math-and-java-lang-strictmath

The Javadoc for the Math class provides some information on the differences between the two classes:

Unlike some of the numeric methods of class StrictMath, all implementations of the equivalent functions of class Math are not defined to return the bit-for-bit same results. This relaxation permits better-performing implementations where strict reproducibility is not required.

By default many of the Math methods simply call the equivalent method in StrictMath for their implementation. Code generators are encouraged to use platform-specific native libraries or microprocessor instructions, where available, to provide higher-performance implementations of Math methods. Such higher-performance implementations still must conform to the specification for Math.

ntysdd avatar May 09 '25 15:05 ntysdd

Closing this issue as "no".

mtommila avatar May 12 '25 19:05 mtommila