gh-102471: Add PyLong import and export API
Add PyLong_Export() and PyLong_Import() functions and the PyLongLayout structure.
- Issue: gh-102471
📚 Documentation preview 📚: https://cpython-previews--121339.org.readthedocs.build/
cc @skirpichev @casevh
See also issue https://github.com/python/cpython/issues/111415
CC @tornaria, as Sage people might be interested in this feature.
CC @oscarbenjamin, you may want this for python-flint
I updated my PR:
- Use -1 for little endian and +1 for big endian.
- Rename PyUnstable_LongExport to PyUnstable_Long_DigitArray.
- Add "always succeed" mention in the doc.
CC @oscarbenjamin, you may want this for python-flint
Absolutely. Currently python-flint uses a hex string as an intermediate format when converting between large int and fmpz, so anything more direct is an improvement. I expect that python-flint would use mpz_import/mpz_export just like gmpy2.
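For illustration, the direct path would then be a single GMP call; a minimal sketch, assuming the export yields a little-endian array of CPython's default 30-bit digits stored in 4-byte words (ndigits, digits and negative are hypothetical locals):

/* Import an exported digit array into a GMP integer:
   order = -1 (least significant word first), size = 4 bytes per word,
   endian = 0 (native), nails = 32 - 30 unused high bits per word. */
mpz_import(z, ndigits, -1, 4, 0, 32 - 30, digits);
if (negative) {
    mpz_neg(z, z);
}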
@skirpichev: I added a PyLongWriter API similar to what @encukou proposed.
Example:
PyLongObject *
_PyLong_FromDigits(int negative, Py_ssize_t digit_count, digit *digits)
{
    PyLongWriter *writer = PyLongWriter_Create();
    if (writer == NULL) {
        return NULL;
    }
    if (negative) {
        PyLongWriter_SetSign(writer, -1);
    }
    Py_digit *writer_digits = PyLongWriter_AllocDigits(writer, digit_count);
    if (writer_digits == NULL) {
        goto error;
    }
    memcpy(writer_digits, digits, digit_count * sizeof(digit));
    return (PyLongObject*)PyLongWriter_Finish(writer);

error:
    PyLongWriter_Discard(writer);
    return NULL;
}
The PyLongWriter_Finish() function normalizes the number and returns a cached small integer object if applicable. Example:
>>> import _testcapi; _testcapi.pylong_import(0, [100, 0, 0]) is 100
True
I'm marking the PR as a draft until we agree on the API.
I added a PyLongWriter API similar to what @encukou proposed.
Yes, that looks better and should fix the speed regression. I'll try to benchmark that, perhaps tomorrow.
But the cost is 5 (!) public functions and one new struct, in addition to PyUnstable_Long_Import(), which will be a slower API. Correct? Cf. just one function in the above proposal.
I updated the PR to remove the PyUnstable_ prefix and replace it with the PyLong prefix.
But the cost is 5 (!) public functions and one new struct, in addition to PyUnstable_Long_Import(), which will be a slower API. Correct? Cf. just one function in the https://github.com/python/cpython/pull/121339#discussion_r1665029240 proposal.
My concern is to avoid the problem described in https://github.com/capi-workgroup/api-evolution/issues/36: exposing the object from _PyLong_New() before it's fully initialized. The "writer" API hides the implementation details, but also makes sure that the object is not "leaked" to Python before it's fully initialized and valid. By the way, the implementation uses functions which are only safe if the object cannot be seen in Python: if Py_REFCNT(obj) is 1.
@skirpichev: Would it be useful to add a PyLongWriter_SetValue(PyLongWriter *writer, long value) function? It would be similar to PyLong_FromLong(long value) (but may be less efficient), so I'm not sure if it's relevant.
This all seems surprisingly complicated for what is needed here. What about having three functions:
PyLong_FromLimbs(plimbs, sign, num_limbs, limb_bits, bits_per_limb)
PyLong_ToLimbsGetSize(pylong, num_limbs, limb_bits, bits_per_limb)
PyLong_ToLimbs(plimbs, &sign, pylong, num_limbs, limb_bits, bits_per_limb)
Then PyLong_FromLimbs reads the limbs from plimbs and returns a PyLong. The PyLong_ToLimbsGetSize function calculates how many limbs are needed so that you can allocate plimbs before calling PyLong_ToLimbs.
On a 64-bit machine, for CPython's format, limb_bits would be 32 and bits_per_limb would be 30. For GMP, both would be 64.
I can't imagine why CPython and e.g. GMP would use different endianness.
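To make that concrete, a hedged sketch of the GMP side under this proposal (PyLong_FromLimbs is the proposed name above, not an existing function; mpz_limbs_read requires GMP >= 6.0):

/* Build a PyLong from a GMP integer on a 64-bit build. */
size_t num_limbs = mpz_size(z);
const mp_limb_t *plimbs = mpz_limbs_read(z);
PyObject *pylong = PyLong_FromLimbs(plimbs, mpz_sgn(z), num_limbs, 64, 64);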
@skirpichev: I rewrote PyLongWriter API to make it simpler and avoid PyMem_Malloc/PyMem_Free():
PyLongObject *
_PyLong_FromDigits(int negative, Py_ssize_t digit_count, digit *digits)
{
    Py_digit *writer_digits;
    PyLongWriter *writer = PyLongWriter_Create(digit_count, &writer_digits);
    if (writer == NULL) {
        return NULL;
    }
    memcpy(writer_digits, digits, digit_count * sizeof(digit));
    if (negative) {
        PyLongWriter_SetNegative(writer);
    }
    return (PyLongObject*)PyLongWriter_Finish(writer);
}
PyLong_FromLimbs(plimbs, sign, num_limbs, limb_bits, bits_per_limb)
PyLong_ToLimbs(plimbs, &sign, pylong, num_limbs, limb_bits, bits_per_limb)
I have no idea how to implement an import/export function which takes arbitrary limb_bits and bits_per_limb arguments. Python's native format is a 30-bit digit stored in 4 bytes (or a 15-bit digit stored in 2 bytes).
My concern is to avoid the problem https://github.com/capi-workgroup/api-evolution/issues/36 : avoid exposing _PyLong_New() object until it's fully initialized.
But _PyLong_New() doesn't create an incompletely initialized object. Yes, it holds garbage - but it's a correct integer object. And in my proposal we don't expose this helper directly; it will be wrapped by PyLong_Import().
I worry that the writer API is overkill for solving a slight performance regression. It happens for 2-10 digits, with a speed regression of up to 20%. We have a quick path for small integers (long-sized), and for big ones (>~10 digits) the creation of a temporary array is not a bottleneck.
Would it be useful to add a PyLongWriter_SetValue(PyLongWriter *writer, long value) function?
I doubt this will be more efficient than the current solution:
if (mpz_fits_slong_p(obj->z)) {
    return PyLong_FromLong(mpz_get_si(obj->z));
}
I rewrote PyLongWriter API to make it simpler and avoid PyMem_Malloc/PyMem_Free():
Thanks, that's much closer to my proposal!
I doubt that PyLongWriter_SetNegative() is useful: you can include the sign in PyLongWriter_Create(). This way your proposal will match mine (up to function names and a new struct :))
I have no idea how to implement an import/export function which takes arbitrary limb_bits and bits_per_limb arguments.
The limb_bits argument only needs to be 32 or 64. The bits_per_limb is almost always the same as limb_bits unless it is like CPython's format, where limb_bits is 32 and bits_per_limb is 30.
I don't know of cases that actually need to have bits_per_limb different from limb_bits (apart from CPython) so maybe bits_per_limb is actually not needed and can be presumed equal to limb_bits. I think gmpy2 and python-flint only need to be able to do:
/* 64 bit build: */
pylong = PyLong_FromLimbs(plimbs, sign, num_limbs, 64, 64)
/* 32 bit build: */
pylong = PyLong_FromLimbs(plimbs, sign, num_limbs, 32, 32)
Maybe then all that is needed is 32 and 64 bit versions of the three functions and limb_bits and bits_per_limb can be dropped.
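That is, something like this (names hypothetical), selected at build time:

/* Pick the variant matching the build's limb width. */
#if GMP_LIMB_BITS == 64
pylong = PyLong_FromLimbs64(plimbs, sign, num_limbs);
#else
pylong = PyLong_FromLimbs32(plimbs, sign, num_limbs);
#endif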
I doubt that PyLongWriter_SetNegative() is useful: you can include the sign in PyLongWriter_Create(). This way your proposal will match mine (up to function names and a new struct :))
Ok, done.
The bits_per_limb is almost always the same as limb_bits unless it is like CPython's format
...or libtommath. It's like CPython's format, but... not the same.
I have no idea how to implement an import/export function which takes arbitrary limb_bits and bits_per_limb arguments. Python's native format is a 30-bit digit stored in 4 bytes (or a 15-bit digit stored in 2 bytes).
Yeah, we don't want to be writing conversions besides the one that we need for whatever our native format happens to be.
I like Victor's current proposal to provide a non-PyObject pointer (the "writer") and the direct memory to write into. If we also return the specification of our native format, then the caller can do the conversion to fill out our raw memory. (If you want faster code then start by assuming we won't change the format, hard code it, and use the returned format to assert that assumption is still correct.)
We need to balance the ability to efficiently do unusual things like this with our ability to make common things more efficient. This kind of API flies right in the face of that, so we want to be cautious about locking ourselves into a design that hurts everyone else.
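As a sketch of that pattern (the layout-query function and its field names here are hypothetical, not part of this PR):

/* Hard-code the expected native format for the fast path, and use the
   returned layout only to verify that the assumption still holds. */
const PyLongLayout *layout = PyLong_GetNativeLayout();
assert(layout->bits_per_digit == 30);
assert(layout->digit_size == 4);     /* bytes per digit */
assert(layout->digits_order == -1);  /* least significant digit first */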
I did some testing of the current version (an update to this benchmarking):
$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=mpz(10**2)' 'int(x)'
2000000 loops, best of 20: 111 nsec per loop
$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=mpz(10**100)' 'int(x)'
500000 loops, best of 20: 519 nsec per loop
$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=mpz(10**1000)' 'int(x)'
100000 loops, best of 20: 2.44 usec per loop
Tests with the same branch: https://github.com/skirpichev/gmpy/tree/trying-py-import-export
Apparently, speed is not an issue anymore: the numbers are close to my version (or equal). But the price is a bigger API. My proposal could be reduced to:
typedef struct PyLongLayout {
[...]
typedef struct PyLong_DigitArray {
[...]
/* Export as an array of digits */
PyAPI_FUNC(int) PyLong_Export(PyObject *obj, PyLong_DigitArray *array);
/* Prepare an array of digits with given sign and length */
PyAPI_FUNC(int) PyLong_Import(int negative, size_t ndigits, PyLong_DigitArray *array);
/* Cleanup export structure; common for reading & writing */
PyAPI_FUNC(void) PyLong_DigitArrayRelease(PyLong_DigitArray *array);
1 structure and 3 functions vs 2 and 5.
Naming is not perfect, as usual :) But I think this covers all practical cases for possible future changes in the PyLongObject structure related to "small" integers.
Reading:
PyLong_DigitArray array;
PyLong_Export(obj, &array);
mpz_import(z, ..., array.digits);
PyLong_DigitArrayRelease(&array);
Writing:
PyLong_DigitArray array;
PyLong_Import(mpz_sgn(z) < 0, ndigits, &array);
mpz_export(array.digits, ..., z);
PyObject *result = array.obj;
Py_INCREF(result);
PyLong_DigitArrayRelease(&array);
Writing:
PyLong_DigitArray array;
PyLong_Import(mpz_sgn(z) < 0, ndigits, &array);
mpz_export(array.digits, ..., z);
PyObject *result = array.obj;
Py_INCREF(result);
PyLong_DigitArrayRelease(&array);
This assumes that the internal structure of the PyLong is complete after the call to mpz_export and does not allow for the possibility to canonicalise the result into some different PyObject after seeing the output of mpz_export (maybe_small_long etc.). There needs to be some function that produces result from array and can do this.
@oscarbenjamin, good point. I don't see a simple way to work around this with 3 functions.
But we still need only one common structure like PyLong_DigitArray and 4 functions. Say:
/* Export */
PyAPI_FUNC(PyLong_DigitArray*) PyLong_AsDigits(PyObject *obj);
PyAPI_FUNC(void) PyLong_ReleaseDigits(PyLong_DigitArray *array);
/* Import */
PyAPI_FUNC(PyLong_DigitArray*) PyLong_CreateDigits(int negative, size_t ndigits);
PyAPI_FUNC(PyObject*) PyLong_FromDigits(PyLong_DigitArray *array);
BTW, the negative field of PyLong_DigitArray is merely a convenience; we have API to access that.
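For concreteness, a hedged sketch of the import flow under this 4-function proposal, assuming CPython's default 30-bit digit layout (the function names are the proposals above, not an existing API):

/* Fill a fresh digit array from a GMP integer, then let CPython turn
   it into a (possibly canonicalized) int object.
   Note: z == 0 would need special-casing, since mpz_export writes
   nothing for zero. */
size_t ndigits = (mpz_sizeinbase(z, 2) + 29) / 30;
PyLong_DigitArray *array = PyLong_CreateDigits(mpz_sgn(z) < 0, ndigits);
if (array == NULL) {
    return NULL;
}
mpz_export(array->digits, NULL, -1, 4, 0, 32 - 30, z);
PyObject *result = PyLong_FromDigits(array);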
The trick that would get you the zero-copy that was asked for is for PyLong_DigitArray (in this example) to be the actual PyLongObject instance, just cast to a different type, and then if there's no reason to replace it with something else, PyLong_FromDigits just casts it to PyObject * and off you go. If one day we need to create a new object, we can do it without breaking anything except the performance expectations.
But it means an API more like:
PyIncompleteLong *PyLong_CreateWritable(int sign, size_t bits_required, void **digits, int *bits_per_digit, int *used_bits_per_digit, size_t *ndigit);
PyObject *PyIncompleteLong_Complete(PyIncompleteLong *v);
void PyIncompleteLong_Abort(PyIncompleteLong *v);
Then you call the first one with your total number of bits, we divide it by however many per digit we have, give you back a raw buffer, the number of bits per digit, and the number of digits. You export into that buffer however you like and give the incomplete object back to us and we give you a complete one.
Much the same API can be used for exporting as well. We need to keep the arbitrary/opaque object, and we need to keep the "I'm finished using it" function call, or else we can't abstract away the current/future memory management involved.
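A hedged usage sketch of that shape (every name comes from the proposal above; none of this is an implemented API):

void *digits;
int bits_per_digit, used_bits_per_digit;
size_t ndigits;
PyIncompleteLong *v = PyLong_CreateWritable(+1, bits_required, &digits,
                                            &bits_per_digit,
                                            &used_bits_per_digit, &ndigits);
if (v == NULL) {
    return NULL;
}
/* ... export the magnitude into `digits` here; on failure, call
   PyIncompleteLong_Abort(v) instead ... */
return PyIncompleteLong_Complete(v);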
The trick that would get you the zero-copy that was asked for is for PyLong_DigitArray (in this example) to be the actual PyLongObject instance, just cast to a different type
That's exactly what I did with PyLongWriter API. The PyLongWriter structure doesn't exist, it's just the PyLongObject with a fake name.
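In other words, something like this (a sketch of the idea, not the actual CPython source):

/* The opaque writer handle is the PyLongObject itself under another
   name, so finishing costs no allocation or copy. */
typedef struct PyLongWriter PyLongWriter;  /* intentionally never defined */

PyObject *
PyLongWriter_Finish(PyLongWriter *writer)
{
    PyLongObject *obj = (PyLongObject *)writer;
    /* normalization / small-int canonicalization would happen here */
    return (PyObject *)obj;
}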
Apparently, speed is not an issue anymore: numbers are closer to my version (or equal).
Great!
I wrote documentation for the PyLongWriter API with a small example (creating an integer from a single digit).
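A sketch of what such a single-digit example might look like with the current API shape (the actual doc example may differ in details):

/* Create the Python int 123 from a single digit. */
Py_digit *digits;
PyLongWriter *writer = PyLongWriter_Create(0, 1, &digits);
if (writer == NULL) {
    return NULL;
}
digits[0] = 123;
return PyLongWriter_Finish(writer);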
The PyLongWriter structure doesn't exist, it's just the PyLongObject with a fake name.
The same is true for PyLong_DigitArray. It's essentially the same structure.
PR updated:
- Remove PyLong_Import(): I kept it as an example in the doc since it's short enough, and IMO it helps to understand the non-trivial API.
- Make PyLong_DigitArray.digits read-only (const).
@serhiy-storchaka:
Then please use different names than PyLong_Import()/PyLong_Export().
Would you prefer that I rename PyLong_Export() to PyLong_AsDigitArray(), and PyLong_ReleaseExport() to PyLong_FreeDigitArray()?
I already removed PyLong_Import(); there are now PyLongWriter_Create() + PyLongWriter_Finish() instead.