gmpy gmpy2: numerical error in simple addition?

bug_report.txt A simple floating-point addition x+y in with precision 4 (i.e. IEEE mantissa width 3), with 3 bits for exponent (emax=3, emin=-4) for x = mpfr('0.75'), y = mpfr('0.03125') incorrectly gives mpfr('0.75') as result when it should be mpfr('0.8125'). Note that 0.3125 is a subnormal number for this reduced precision format. Terminal interaction attached, am I doing something wrong?

Jan 19 '16 15:01 jaideepr

I haven't had sufficient time to understand the problem. It may be a couple of days.

Jan 20 '16 06:01 casevh

The problem is not related to creating a subnormal from a string. In this case, the subnormal value is created properly. In gmpy2 2.0.x, there is a rare bug when converted a string to a subnormal. The simplest work-around is to convert the input to an mpq type first; i.e. mpfr(mpq('0.03125')).

The actual problem is the default rounding mode. The intermediate sum is exactly halfway between two 4 bit values. The default rounding mode of RoundToNearest selects the rounded value with final bit of 0. If you change the rounding mode to RoundUp, you get the expected result.

from gmpy2 import * ctx=context(emax=4, emin=-4, precision=4) set_context(ctx) a=mpfr('0.75') b=mpfr('0.03125') "{0:.10Df}".format(a+b) '0.7500000000' get_context().round=RoundUp "{0:.10Df}".format(a+b) '0.8125000000'

One last comment: the values of precision, emax and emin are slight different between the IEEE standards and the MPFR library. If e is the exponent size and p is the precision (in IEEE terms), then precision should be p+1,emax should be 2**(e-1) and emin should be 4-emax-precision. This doesn't impact your question since it only changes emax.

Jan 24 '16 18:01 casevh

Thanks for looking into this so promptly. I am required to use RoundToNearest, so a different rounding mode is not an option for me. I tried the mentioned fix with mpq but it did n't work: gave 0.75 yet again. I'm using gmpy2-2.0.3. Thanks for pointing out the error in emax: can you mention the general formula to get emin for a general e, p (4 seems to be specific to my example?)?

Jan 27 '16 22:01 jaideepr

The formulas are contained in my answer. For a general e,p:

precision=p+1 emax=2**(e-1) emin=4-emax-precision

Regarding the choice in rouding mode versus the correct answer. I think gmpy2 is returning the correct answer but I could easily be wrong.

ctx=context(precision=4, emax=4, emin=-4, subnormalize=True) set_context(ctx) a=mpfr("0.75") b=mpfr("0.03125") a.digits(2) ('1100', 0, 4) b.digits(2) ('1000', -4, 4)

Let's temporarily change the working precision:

get_context().precision=16 c=a+b c.digits(2) ('1100100000000000', 0, 16)

Then use round2 to round the result to 4 bits of precision:

round2(c,4) mpfr('0.75',4) round2(c,4).digits(2) ('1100', 0, 4)

Since RoundToNearest (equivalent to IEEE roundTiesToEven) requires the last bit of result is 0 when the exact value is halfway between two representable values, we get the value.

If the value for a is changed to end with a 1 bit, then rounding changes and the result is rounded up.

get_context().precision=4 d=a+mpq(1)/16 d.digits(2) ('1101', 0, 4) (d+b).digits(2) ('1110', 0, 4)

Jan 31 '16 19:01 casevh

@casevh I am trying to understand the formula for emin. I think understand but I just want to make sure I am correct.

precision=p+1 makes sense as IEEE754 has an implicit 1 but MPFR does not

emax=2**(e-1) makes sense as IEEE754 emax would be 2**(e-1) - 1 and have mantissa in [1, 2) where MPFR has mantissa in [1/2, 1) so the exponent must be one larger.

emin=4-emax-precision In IEEE754 the minimum subnormal number would be 2**(1-bias) * 2**(-p) where bias is 2**(e - 1) - 1. Hence we need to represent numbers with exponents of: 1-(2**(e-1) - 1) - p now as MPFR is shifted we really want 2-(2**(e-1) - 1) - p == 3-emax-p == 4-emax-precision

May 16 '19 00:05 cdonovick

gmpy gmpy copied to clipboard

gmpy2: numerical error in simple addition?

gmpy
gmpy copied to clipboard