gmpy icon indicating copy to clipboard operation
gmpy copied to clipboard

Use low-level mpn_* routines to implement integer arithmetics?

Open skirpichev opened this issue 1 year ago • 3 comments

POC pr for CPython: https://github.com/python/cpython/issues/66121

I think we can adopt similar strategy to implement arithmetic operations for mpz/xmpz's. No need to change MPZ_Object structure. All we need: appropriately allocate/reallocate/free memory by hand and use low-level mpn_* functions to operate on arrays of limbs. I don't expect any speed loss.

On pros: arithmetic on mpz's will never crash the interpreter. (That might be the case for some special function, but not for the basic stuff, working on CPython int's.) New mpz's could be used as a drop-in replacement for int's.

skirpichev avatar Nov 18 '24 06:11 skirpichev

I like this idea. I've thought about it but don't have the time to implement it. I have a system with lots of memory that I can use for stress testing.

casevh avatar Nov 20 '24 04:11 casevh

I'm thinking about a separate package (something like gmp-python, as flint-python; better name?), which will provide such mpz type and few integer-related functions (like isqrt). This type will have int-compatible interface. (Revival of https://github.com/python/cpython/issues/66121-like patch is also in my short-term plans.)

Later we can include this to gmpy2's mpz. I don't think that testing memory handling will require a machine with a lots of memory (e.g. in Linux you can emulate more tight memory bounds with ulimit). By I would appreciate more testing anyway!

A separate package does make sense for me also because other projects (e.g. mpmath, sympy or diofant) use only integer-related stuff from the gmpy2 (well, maybe mpq's too). Now python ecosystem has much better handling of dependencies. So, eventually it might be useful to split the gmpy2 package to two or three. This is a different issue, however. Let me know if it does make sense for you.

skirpichev avatar Nov 20 '24 05:11 skirpichev

FYI, @casevh, early draft published on https://github.com/diofant/python-gmp

It has a large TODO, but basically this approach does work:

$ ulimit -v $((1024*256))
$ python
Python 3.13.0rc1+ (heads/3.13:bd29ce8509, Aug 29 2024, 15:30:15) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from gmpy2 import mpz
>>> mpz(-1) ^ mpz(1)
KeyboardInterrupt
>>> mpz(2222222222222211111111122222222222222)**mpz(33322222)
GNU MP: Cannot allocate memory (size=503998640)
Aborted
$ python
Python 3.13.0rc1+ (heads/3.13:bd29ce8509, Aug 29 2024, 15:30:15) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from gmp import mpz
>>> mpz(2222222222222211111111122222222222222)**mpz(33322222)
Traceback (most recent call last):
  File "<python-input-1>", line 1, in <module>
    mpz(2222222222222211111111122222222222222)**mpz(33322222)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^~~~~~~~~~~~~~
MemoryError

I run primitive benchmarks. It seems this module is slightly faster than the gmpy2 without cache (e.g. I got ~20% boost for multiplication of two 2-digit integers). Probably, it's a temporary speedup and performance will match gmpy2's when project matures (it lacks support for floats, for example).

Edit: It was less easy than I expected. Just using mpn_*() functions isn't enough, some functions still do memory allocation for temporary buffers (e.g. inside of mpn_mul()). (See TMP_ALLOC/TMP_BALLOC macros in the gmp-impl.h.) But I think that using setjmp/longjmp should be safe to free such memory: warning from the chapter 13 of the GMP docs seems to be unapplicable in given circumstances.

So, my toy python module now uses custom allocation functions (realloc isn't required) for the GMP. I can't crash it on the CPython so far, but can on PyPy (maybe it's a PyPy issue).

The package is not feature-complete yet, but I still would appreciate testing.

skirpichev avatar Dec 02 '24 10:12 skirpichev

@casevh, I'm going to close this.

It seems that this proposal is doable, if we wrap every call to GMP/MPFR/MPC function with something like this: https://github.com/diofant/python-gmp/blob/4dd3802b14a8c62b9d06961b3ed1d69e8f372fc7/main.c#L2382-L2396 Though, I'm not sure it worth complications in code.

We could add a reference to the python-gmp project in https://gmpy2.readthedocs.io/en/latest/overview.html (discussing alternatives to crash), if you wish.

skirpichev avatar Jun 21 '25 14:06 skirpichev