python-flint
More efficient conversion of fmpz to float
I propose to skip the costly conversion via a Python int and use FLINT's existing conversion functions instead.
Currently, large inputs raise OverflowError (rather than returning infinity); this behaviour is preserved by using math.ldexp.
After the patch there is a slight behaviour change: conversion of 2**1024 - 1 now succeeds and returns sys.float_info.max instead of raising OverflowError. Previously the largest convertible integer was 2**1024 - 2**(1024-54) - 1.
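That boundary can be sanity-checked directly against CPython's round-to-nearest conversion (a quick check, not part of the patch):

```python
import sys

# 2**1024 - 2**(1024-54) is the midpoint between the largest double
# (2**1024 - 2**971) and 2**1024; round-half-to-even rounds it up,
# so float() overflows there. One below the midpoint still rounds
# down to sys.float_info.max.
assert float(2**1024 - 2**(1024 - 54) - 1) == sys.float_info.max

try:
    float(2**1024 - 2**(1024 - 54))
except OverflowError:
    pass  # int too large to convert to float
else:
    raise AssertionError("expected OverflowError")
```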
Update: if the number is known to be small, using fmpz_get_d directly is even faster (even faster than converting a Python int).
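As a pure-Python model of the ldexp approach (the helper name here is hypothetical, not the actual Cython code; note that FLINT's fmpz_get_d_2exp truncates, whereas the big-int true division used below rounds to nearest):

```python
import math

def fmpz_to_float(n):
    # Hypothetical pure-Python sketch: split n into mantissa * 2**exp
    # with mantissa in [0.5, 1), then scale with math.ldexp, which
    # raises OverflowError exactly when the result exceeds the double
    # range -- matching float(int) rather than returning infinity.
    if n == 0:
        return 0.0
    exp = abs(n).bit_length()
    mantissa = n / 2**exp  # correctly rounded big-int true division
    return math.ldexp(mantissa, exp)
```

Because the true division is correctly rounded and the ldexp scaling is exact, this model matches float(int) bit-for-bit, including at the overflow boundary.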
Performance
In [5]: x = 2**1024-2**(1024-53)
In [6]: n = flint.fmpz(x)
In [7]: %timeit float(x)
39.6 ns ± 0.0977 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
# python-flint 0.8.0
In [8]: %timeit float(n)
557 ns ± 2.71 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
# this PR
In [22]: %timeit float(n)
81.3 ns ± 0.563 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
In [12]: x = random.getrandbits(1020)
In [13]: n = flint.fmpz(x)
In [14]: %timeit float(x)
31.4 ns ± 0.137 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
# this PR
In [15]: %timeit float(n)
25.7 ns ± 0.0268 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
This changes the returned value:
$ cat float.py
import random
from flint import fmpz
for _ in range(100):
x = random.getrandbits(1020)
assert float(fmpz(x)) == float(x), x
$ spin run python float.py
$ meson compile -C build
$ meson install --only-changed -C build --destdir ../build-install
Traceback (most recent call last):
File "/stuff/current/active/python-flint/float.py", line 6, in <module>
assert float(fmpz(x)) == float(x), x
^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 7032762630615997075984776923333935086304805039016677690569260912478663353622423395407083820822744199783856237891740932258878466924940393236854755908411876363285950823905383993344868600629929050010379573658259044905557178310577704148548523391522191512473418439519460923094265091876624476180133575388198464166
I think this is because fmpz_get_d rounds downwards whereas float(int) rounds to nearest.
I don't see that clearly documented though. The float() documentation only says:
Otherwise, if the argument is an integer or a floating-point number, a floating-point number with the same value (within Python’s floating-point precision) is returned. If the argument is outside the range of a Python float, an OverflowError will be raised.
The CPython implementation is PyLong_AsDouble, with a comment saying that it rounds half-to-even:
/* Get a C double from an int object. Rounds to the nearest double,
   using the round-half-to-even rule in the case of a tie. */
double
PyLong_AsDouble(PyObject *v)
{
    int64_t exponent;
    double x;

    if (v == NULL) {
        PyErr_BadInternalCall();
        return -1.0;
    }
    if (!PyLong_Check(v)) {
        PyErr_SetString(PyExc_TypeError, "an integer is required");
        return -1.0;
    }
    if (_PyLong_IsCompact((PyLongObject *)v)) {
        /* Fast path; single digit long (31 bits) will cast safely
           to double. This improves performance of FP/long operations
           by 20%.
        */
        return (double)medium_value((PyLongObject *)v);
    }
    x = _PyLong_Frexp((PyLongObject *)v, &exponent);
    assert(exponent >= 0);
    assert(!PyErr_Occurred());
    if (exponent > DBL_MAX_EXP) {
        PyErr_SetString(PyExc_OverflowError,
                        "int too large to convert to float");
        return -1.0;
    }
    return ldexp(x, (int)exponent);
}
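The round-half-to-even behaviour is easy to observe with an exact tie, where the integer lies exactly halfway between two doubles:

```python
# Above 2**53 consecutive doubles are 2 apart, so odd integers are
# exact ties. Round-half-to-even picks the neighbour whose last
# significand bit is 0:
assert float(2**53 + 1) == 2**53      # tie, rounds down to even
assert float(2**53 + 3) == 2**53 + 4  # tie, rounds up to even
```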
The function _PyLong_Frexp above looks quite complicated. A simple option could be to figure out for what range of values fmpz_get_d gives the correct result and delegate to float(int(x)) otherwise.
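A sketch of that delegation idea (the helper is hypothetical; in the real code the fast branch would call fmpz_get_d rather than float): any integer with |n| < 2**53 is exactly representable as a double, so even a truncating conversion is already correct there.

```python
def fmpz_to_float_hybrid(n):
    # |n| < 2**53 fits exactly in a double, so a truncating
    # conversion like fmpz_get_d cannot round at all; larger values
    # delegate to CPython's correctly rounded int -> float.
    if -2**53 < n < 2**53:
        return float(n)   # stand-in for the exact fmpz_get_d fast path
    return float(int(n))  # round to nearest, half-even
```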
Oh, this is sad. Round to nearest with the half-even convention is definitely a must; I will check what can be done.
It looks like gmpy2 does not use round-half-even:
$ cat float.py
import random
from gmpy2 import mpz
for _ in range(100):
x = random.getrandbits(54)
assert float(mpz(x)) == float(x), x
$ python float.py
Traceback (most recent call last):
File "/stuff/current/active/python-flint/float.py", line 6, in <module>
assert float(mpz(x)) == float(x), x
^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 10741429198758067
Commit 9382a42dc47320e7642f9965b46b9f75059c1b16 implements correct rounding (round to nearest + round-half-to-even).
It is enough to test an interval of consecutive numbers to check the convention:
for x in range(2**64, 2**64 + 1000000):
    assert float(x) == float(flint.fmpz(x))
    assert float(-x) == float(flint.fmpz(-x))
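The interval works because the spacing of doubles near 2**64 is only 2**12, so a million consecutive integers hit every rounding pattern, including exact ties:

```python
import math

# gap between 2**64 and the next double up: 2**(64 - 52) = 4096
assert math.ulp(2.0**64) == 4096.0
```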
There was a test failure under PyPy:
File "/Users/runner/hostedtoolcache/PyPy/3.11.13/x64/lib/pypy3.11/site-packages/flint/types/fq_default.pypy311-pp73-darwin.so", line 270, in flint.types.fq_default.fq_default_ctx.gen
Failed example:
gf.gen()
Expected:
w
Got:
<BLANKLINE>
The change here does not look like it could cause that, so maybe there is something undefined here.
It is enough to test an interval of consecutive numbers to check the convention
for x in range(2**64, 2**64 + 1000000):
    assert float(x) == float(flint.fmpz(x))
    assert float(-x) == float(flint.fmpz(-x))
Could you add this to the tests (with fewer iterations)?
It would be good to ensure that the tests cover every possible branch of the __float__ method.
Timing with main I get:
In [2]: x = 2**64 + 1
In [3]: n = flint.fmpz(x)
In [4]: %timeit float(x)
88.4 ns ± 0.241 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
In [5]: %timeit float(n)
938 ns ± 12 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [6]: x = 123
In [7]: n = flint.fmpz(x)
In [8]: %timeit float(x)
55.9 ns ± 0.38 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
In [9]: %timeit float(n)
66.5 ns ± 0.369 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
So for larger integers converting to int is definitely slow. However for smaller integers it is fast and we benefit from CPython's fast int -> float conversion. Repeating the same with the PR I get:
In [6]: x = 2**64 + 1
In [7]: n = flint.fmpz(x)
In [8]: %timeit float(x)
88.7 ns ± 0.505 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
In [9]: %timeit float(n)
94.5 ns ± 0.383 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
In [10]: x = 123
In [11]: n = flint.fmpz(x)
In [12]: %timeit float(x)
55.8 ns ± 0.425 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
In [13]: %timeit float(n)
71.8 ns ± 0.527 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
So for larger integers this is a lot faster, i.e. 10x for an integer slightly larger than 2**64. I imagine that most of the cost is due to the fmpz to int conversion itself being slow (I think this uses the hex-string path).
For the smaller integer though it seems possibly slower on this measurement, although the difference is close to the margin of error. I wonder if it is somehow possible to have an even faster path using COEFF_IS_MPZ and then just directly casting from slong to double. It looks like that is what fmpz_get_d already does, but maybe the other things like fmpz_val2 etc. are adding overhead in that case.
Also this should have a release note.
In the case of small integers I believe the largest overhead comes from the Python int allocated in fmpz_get_intlong. Casting directly to slong and double avoids this extra Python object.
I refactored slightly, also to avoid the performance hit for negative numbers.
Python int (baseline)
In [3]: for e in (3**20, 3**100, 2**63+1, 2**64+1):
...: x = e
...: %timeit float(x)
...: x = -e
...: %timeit float(x)
29.6 ns ± 0.209 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
29.5 ns ± 0.259 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
31.9 ns ± 0.539 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
32.2 ns ± 0.377 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
31.7 ns ± 0.332 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
31.9 ns ± 0.414 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
31.6 ns ± 0.186 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
32.2 ns ± 0.299 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
python-flint 0.8.0
In [2]: for e in (3**20, 3**100, 2**63+1, 2**64+1):
...: x = flint.fmpz(e)
...: %timeit float(x)
...: x = flint.fmpz(-e)
...: %timeit float(x)
42.8 ns ± 0.596 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
42.9 ns ± 0.44 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
202 ns ± 1.45 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
220 ns ± 3.72 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
172 ns ± 0.348 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
188 ns ± 1.66 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
175 ns ± 0.843 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
190 ns ± 0.878 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
commit 91dc079c69ecccb2ad50e35c236e0e8da7abd218
In [2]: for e in (3**20, 3**100, 2**63+1, 2**64+1):
...: x = flint.fmpz(e)
...: %timeit float(x)
...: x = flint.fmpz(-e)
...: %timeit float(x)
...: assert float(x) == float(-e)
23.6 ns ± 0.264 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
25.7 ns ± 0.254 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
28.7 ns ± 0.617 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
48.3 ns ± 0.477 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
28.8 ns ± 0.215 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
47.1 ns ± 0.441 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
28.9 ns ± 0.357 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
48.9 ns ± 0.165 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
commit 6a5370c99b8d905d1522b163c584978c35f4fb76 (new)
In [2]: for e in (3**20, 3**100, 2**63+1, 2**64+1):
...: x = flint.fmpz(e)
...: %timeit float(x)
...: x = flint.fmpz(-e)
...: %timeit float(x)
...: assert float(x) == float(-e)
21.6 ns ± 0.425 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
21.2 ns ± 0.338 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
28 ns ± 0.3 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
29.1 ns ± 0.591 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
27.9 ns ± 0.403 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
29 ns ± 0.288 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
28 ns ± 0.324 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
28.6 ns ± 0.435 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
Performance in latest version of the PR is more consistently faster than Python ints.
Very nice.