symengine.py slower than sympy for creating integers and rationals
I did some tests in python to assess the difference of speed for creating integers and rationals in symengine in comparison to sympy. This is what I run:
In [1]: import symengine
In [2]: import sympy
In [3]: %timeit symengine.Integer(2)
265 ns ± 1.38 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [4]: %timeit sympy.Integer(2)
236 ns ± 1.15 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [6]: %timeit sympy.Rational(2, 3)
246 ns ± 1.45 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [7]: %timeit symengine.Rational(2, 3)
1.61 µs ± 10.8 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
So creation of a small integer is slightly faster with sympy and creation of a small rational is one order of magnitude faster with sympy. Note that the symengine wrappers for python add some overhead, but this much? What could be the reason for this? (I also tested some more complex operations and indeed symengine is faster overall)
SymPy caches the results of those methods. Without the cache, here's what the timings look like,
In [1]: import random, sympy, symengine
In [2]: %timeit symengine.Integer(random.randint(1, 1e6))
1.95 µs ± 265 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [3]: %timeit sympy.Integer(random.randint(1, 1e6))
2.92 µs ± 61 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [4]: %timeit symengine.Rational(random.randint(1, 1e6), random.randint(1, 1e6))
6.33 µs ± 301 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [5]: %timeit sympy.Rational(random.randint(1, 1e6), random.randint(1, 1e6))
9.56 µs ± 1.32 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
With setup code removed from timings there's still a slowdown
In [1]: import random, sympy, symengine
In [2]: %%timeit a=random.randint(1, 1e12)
...: symengine.Integer(a)
...:
...:
1.51 µs ± 164 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [3]: %%timeit a=random.randint(1, 1e12)
...: sympy.Integer(a)
...:
...:
430 ns ± 12.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [4]: %%timeit a=random.randint(1, 1e12);b=random.randint(1, 1e12)
...: symengine.Rational(a, b)
...:
...:
4.67 µs ± 145 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [5]: %%timeit a=random.randint(1, 1e12);b=random.randint(1, 1e12)
...: sympy.Rational(a, b)
...:
...:
487 ns ± 37.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
One bottleneck is that we convert Python integers to string and pass them to the C++ library. This can be fixed though.
Integers that fits in an int will not be converted to strings so I tested with smaller numbers. For Integers we get close, but Rationals are much slower to create with symengine.
In [1]: %%timeit a=random.randint(1, 1e6)
...: symengine.Integer(a)
...:
286 ns ± 7.38 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [2]: %%timeit a=random.randint(1, 1e6)
...: sympy.Integer(a)
...:
245 ns ± 2.49 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [3]: %%timeit a=random.randint(1, 1e6); b=random.randint(1,1e6)
...: symengine.Rational(a, b)
...:
1.57 µs ± 34 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [4]: %%timeit a=random.randint(1, 1e6); b=random.randint(1,1e6)
...: sympy.Rational(a, b)
...:
242 ns ± 1.47 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
This is the constructor for rationals:
class Rational(Number):
def __new__(cls, p, q):
return Integer(p)/q
Could it be that the division here is actually a div in symengine? That would be quite a bit slower than simply constructing a Rational. Also going via Integer seems unnecessary.
This is the constructor for integers:
def __new__(cls, i):
i = int(i)
cdef int i_
cdef symengine.integer_class i__
cdef string tmp
try:
# Try to convert "i" to int
i_ = i
int_ok = True
except OverflowError:
# Too big, need to use mpz
int_ok = False
tmp = str(i).encode("utf-8")
i__ = symengine.integer_class(tmp)
# Note: all other exceptions are left intact
if int_ok:
return c2py(<rcp_const_basic>symengine.integer(i_))
else:
return c2py(<rcp_const_basic>symengine.integer(i__))
Perhaps the try-except could be avoided by checking the bounds explicitly instead. Any ideas for how we could avoid the string encoding?
Apart from #391 I don't think there are any things that can be done in cython to speed this up. symengine is at a disadvantage here since we need to cross the language boundary with the python integer whereas sympy can use it directly. Even if going via a string for large numbers comes with a penalty I don't see any options since there are no other constructors for mpz.
Perhaps a specialized pure c wrapper for these constructors could be made faster, but it might not be worth it.
Turns out there are other constructors for mpz from an array in memory. This could be used to speed up the integer constructor by constructing directly from the python integer to mpz.