gh-102471: Add PyLong import and export API
Add PyLong_Export() and PyLong_Import() functions and the PyLongLayout structure.
- Issue: gh-102471
📚 Documentation preview 📚: https://cpython-previews--121339.org.readthedocs.build/
cc @skirpichev @casevh
See also issue https://github.com/python/cpython/issues/111415
CC @tornaria, as Sage people might be interested in this feature.
CC @oscarbenjamin, you may want this for python-flint
I updated my PR:
- Use -1 for little endian and +1 for big endian.
- Rename PyUnstable_LongExport to PyUnstable_Long_DigitArray.
- Add "always succeed" mention in the doc.
CC @oscarbenjamin, you may want this for python-flint
Absolutely. Currently python-flint uses a hex string as an intermediate format when converting between large int and fmpz, so anything more direct is an improvement. I expect that python-flint would use mpz_import/mpz_export just like gmpy2.
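For illustration, the direct path would then be a single GMP call; a minimal sketch, assuming the export yields a little-endian array of CPython's default 30-bit digits stored in 4-byte words (ndigits, digits and negative are hypothetical locals):

/* Import an exported digit array into a GMP integer:
   order = -1 (least significant word first), size = 4 bytes per word,
   endian = 0 (native), nails = 32 - 30 unused high bits per word. */
mpz_import(z, ndigits, -1, 4, 0, 32 - 30, digits);
if (negative) {
    mpz_neg(z, z);
}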
@skirpichev: I added a PyLongWriter API similar to what @encukou proposed.
Example:
PyLongObject *
_PyLong_FromDigits(int negative, Py_ssize_t digit_count, digit *digits)
{
    PyLongWriter *writer = PyLongWriter_Create();
    if (writer == NULL) {
        return NULL;
    }
    if (negative) {
        PyLongWriter_SetSign(writer, -1);
    }
    Py_digit *writer_digits = PyLongWriter_AllocDigits(writer, digit_count);
    if (writer_digits == NULL) {
        goto error;
    }
    memcpy(writer_digits, digits, digit_count * sizeof(digit));
    return (PyLongObject*)PyLongWriter_Finish(writer);

error:
    PyLongWriter_Discard(writer);
    return NULL;
}
The PyLongWriter_Finish() function normalizes the number and returns a cached small integer object if applicable. Example:
>>> import _testcapi; _testcapi.pylong_import(0, [100, 0, 0]) is 100
True
I'm marking the PR as a draft until we agree on the API.
I added a PyLongWriter API similar to what @encukou proposed.
Yes, that looks better and should fix the speed regression. I'll try to benchmark that, perhaps tomorrow.
But the cost is 5 (!) public functions and one new struct, in addition to PyUnstable_Long_Import(), which will be a slower API. Correct? Cf. just one function in the above proposal.
I updated the PR to remove the PyUnstable_ prefix and replace it with the PyLong prefix.
But the cost is 5 (!) public functions and one new struct, in addition to PyUnstable_Long_Import(), which will be a slower API. Correct? Cf. just one function in the https://github.com/python/cpython/pull/121339#discussion_r1665029240 proposal.
My concern is to avoid the problem described in https://github.com/capi-workgroup/api-evolution/issues/36: exposing the object from _PyLong_New() before it's fully initialized. The "writer" API hides the implementation details, but also makes sure that the object is not "leaked" to Python before it's fully initialized and valid. By the way, the implementation uses functions which are only safe if the object cannot be seen in Python: if Py_REFCNT(obj) is 1.
@skirpichev: Would it be useful to add a PyLongWriter_SetValue(PyLongWriter *writer, long value) function? It would be similar to PyLong_FromLong(long value) (but may be less efficient), so I'm not sure if it's relevant.
This all seems surprisingly complicated for what is needed here. What about having three functions:
PyLong_FromLimbs(plimbs, sign, num_limbs, limb_bits, bits_per_limb)
PyLong_ToLimbsGetSize(pylong, num_limbs, limb_bits, bits_per_limb)
PyLong_ToLimbs(plimbs, &sign, pylong, num_limbs, limb_bits, bits_per_limb)
Then PyLong_FromLimbs reads the limbs from plimbs and returns a PyLong. The PyLong_ToLimbsGetSize function calculates how many limbs are needed so that you can allocate plimbs before calling PyLong_ToLimbs.
On a 64-bit machine, for CPython's format, limb_bits would be 32 and bits_per_limb would be 30. For GMP, both would be 64.
I can't imagine why CPython and e.g. GMP would use different endianness.
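To make that concrete, a hedged sketch of the GMP side under this proposal (PyLong_FromLimbs is the proposed name above, not an existing function; mpz_limbs_read requires GMP >= 6.0):

/* Build a PyLong from a GMP integer on a 64-bit build. */
size_t num_limbs = mpz_size(z);
const mp_limb_t *plimbs = mpz_limbs_read(z);
PyObject *pylong = PyLong_FromLimbs(plimbs, mpz_sgn(z), num_limbs, 64, 64);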
@skirpichev: I rewrote PyLongWriter API to make it simpler and avoid PyMem_Malloc/PyMem_Free():
PyLongObject *
_PyLong_FromDigits(int negative, Py_ssize_t digit_count, digit *digits)
{
    Py_digit *writer_digits;
    PyLongWriter *writer = PyLongWriter_Create(digit_count, &writer_digits);
    if (writer == NULL) {
        return NULL;
    }
    memcpy(writer_digits, digits, digit_count * sizeof(digit));
    if (negative) {
        PyLongWriter_SetNegative(writer);
    }
    return (PyLongObject*)PyLongWriter_Finish(writer);
}
PyLong_FromLimbs(plimbs, sign, num_limbs, limb_bits, bits_per_limb)
PyLong_ToLimbs(plimbs, &sign, pylong, num_limbs, limb_bits, bits_per_limb)
I have no idea how to implement an import/export function which takes arbitrary limb_bits and bits_per_limb arguments. Python's native format is a 30-bit digit stored in 4 bytes (or a 15-bit digit stored in 2 bytes).
My concern is to avoid the problem https://github.com/capi-workgroup/api-evolution/issues/36 : avoid exposing _PyLong_New() object until it's fully initialized.
But _PyLong_New() doesn't create an incompletely initialized object. Yes, it holds garbage - but it's a correct integer object. And in my proposal we don't expose this helper directly; it will be wrapped by PyLong_Import().
I worry that the writer API is overkill for solving a slight performance regression. It happens for 2-10 digits, with a speed regression of up to 20%. We have a quick path for small integers (long-sized), and for big ones (>~10 digits) the creation of a temporary array is not a bottleneck.
Would it be useful to add a PyLongWriter_SetValue(PyLongWriter *writer, long value) function?
I doubt this will be more efficient than the current solution:
if (mpz_fits_slong_p(obj->z)) {
    return PyLong_FromLong(mpz_get_si(obj->z));
}
I rewrote PyLongWriter API to make it simpler and avoid PyMem_Malloc/PyMem_Free():
Thanks, that's much closer to my proposal!
I doubt that PyLongWriter_SetNegative() is useful: you can include the sign in PyLongWriter_Create(). This way your proposal will match mine (up to function names and a new struct :))
I have no idea how to implement an import/export function which takes arbitrary limb_bits and bits_per_limb arguments.
The limb_bits argument only needs to be 32 or 64. The bits_per_limb is almost always the same as limb_bits unless it is like CPython's format, where limb_bits is 32 and bits_per_limb is 30.
I don't know of cases that actually need to have bits_per_limb different from limb_bits (apart from CPython) so maybe bits_per_limb is actually not needed and can be presumed equal to limb_bits. I think gmpy2 and python-flint only need to be able to do:
/* 64 bit build: */
pylong = PyLong_FromLimbs(plimbs, sign, num_limbs, 64, 64)
/* 32 bit build: */
pylong = PyLong_FromLimbs(plimbs, sign, num_limbs, 32, 32)
Maybe then all that is needed is 32 and 64 bit versions of the three functions and limb_bits and bits_per_limb can be dropped.
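That is, something like this (names hypothetical), selected at build time:

/* Pick the variant matching the build's limb width. */
#if GMP_LIMB_BITS == 64
pylong = PyLong_FromLimbs64(plimbs, sign, num_limbs);
#else
pylong = PyLong_FromLimbs32(plimbs, sign, num_limbs);
#endif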
I doubt that PyLongWriter_SetNegative() is useful: you can include the sign in PyLongWriter_Create(). This way your proposal will match mine (up to function names and a new struct :))
Ok, done.
The bits_per_limb is almost always the same as limb_bits unless it is like CPython's format
...or libtommath. It's like CPython's format, but... not the same.
I have no idea how to implement an import/export function which takes arbitrary limb_bits and bits_per_limb arguments. Python's native format is a 30-bit digit stored in 4 bytes (or a 15-bit digit stored in 2 bytes).
Yeah, we don't want to be writing conversions besides the one that we need for whatever our native format happens to be.
I like Victor's current proposal to provide a non-PyObject pointer (the "writer") and the direct memory to write into. If we also return the specification of our native format, then the caller can do the conversion to fill out our raw memory. (If you want faster code then start by assuming we won't change the format, hard code it, and use the returned format to assert that assumption is still correct.)
We need to balance the ability to efficiently do unusual things like this with our ability to make common things more efficient. This kind of API flies right in the face of that, so we want to be cautious about locking ourselves into a design that hurts everyone else.
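As a sketch of that pattern (the layout-query function and its field names here are hypothetical, not part of this PR):

/* Hard-code the expected native format for the fast path, and use the
   returned layout only to verify that the assumption still holds. */
const PyLongLayout *layout = PyLong_GetNativeLayout();
assert(layout->bits_per_digit == 30);
assert(layout->digit_size == 4);     /* bytes per digit */
assert(layout->digits_order == -1);  /* least significant digit first */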
I did some testing of the current version (an update to this benchmarking):
$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=mpz(10**2)' 'int(x)'
2000000 loops, best of 20: 111 nsec per loop
$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=mpz(10**100)' 'int(x)'
500000 loops, best of 20: 519 nsec per loop
$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=mpz(10**1000)' 'int(x)'
100000 loops, best of 20: 2.44 usec per loop
Tests with the same branch: https://github.com/skirpichev/gmpy/tree/trying-py-import-export
Apparently, speed is not an issue anymore: the numbers are close to my version (or equal). But the price is a bigger API. My proposal could be reduced to:
typedef struct PyLongLayout {
[...]
typedef struct PyLong_DigitArray {
[...]
/* Export as an array of digits */
PyAPI_FUNC(int) PyLong_Export(PyObject *obj, PyLong_DigitArray *array);
/* Prepare an array of digits with given sign and length */
PyAPI_FUNC(int) PyLong_Import(int negative, size_t ndigits, PyLong_DigitArray *array);
/* Cleanup export structure; common for reading & writing */
PyAPI_FUNC(void) PyLong_DigitArrayRelease(PyLong_DigitArray *array);
1 structure and 3 functions vs 2 and 5.
Naming is not perfect, as usual :) But I think this covers all practical cases for possible future changes in the PyLongObject structure related to "small" integers.
Reading:
PyLong_DigitArray array;
PyLong_Export(obj, &array);
mpz_import(z, ..., array.digits);
PyLong_DigitArrayRelease(&array);
Writing:
PyLong_DigitArray array;
PyLong_Import(mpz_sgn(z) < 0, ndigits, &array);
mpz_export(array.digits, ..., z);
PyObject *result = array.obj;
Py_INCREF(result);
PyLong_DigitArrayRelease(&array);
Writing:
PyLong_DigitArray array;
PyLong_Import(mpz_sgn(z) < 0, ndigits, &array);
mpz_export(array.digits, ..., z);
PyObject *result = array.obj;
Py_INCREF(result);
PyLong_DigitArrayRelease(&array);
This assumes that the internal structure of the PyLong is complete after the call to mpz_export and does not allow for the possibility to canonicalise the result into some different PyObject after seeing the output of mpz_export (maybe_small_long etc.). There needs to be some function that produces result from array and can do this.
@oscarbenjamin, good point. I don't see a simple way to work around this with 3 functions.
But we still need only one common structure like PyLong_DigitArray and 4 functions. Say:
/* Export */
PyAPI_FUNC(PyLong_DigitArray*) PyLong_AsDigits(PyObject *obj);
PyAPI_FUNC(void) PyLong_ReleaseDigits(PyLong_DigitArray *array);
/* Import */
PyAPI_FUNC(PyLong_DigitArray*) PyLong_CreateDigits(int negative, size_t ndigits);
PyAPI_FUNC(PyObject*) PyLong_FromDigits(PyLong_DigitArray *array);
BTW, the negative field of PyLong_DigitArray is merely a convenience; we have API to access that.
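For concreteness, a hedged sketch of the import flow under this 4-function proposal, assuming CPython's default 30-bit digit layout (the function names are the proposals above, not an existing API):

/* Fill a fresh digit array from a GMP integer, then let CPython turn
   it into a (possibly canonicalized) int object.
   Note: z == 0 would need special-casing, since mpz_export writes
   nothing for zero. */
size_t ndigits = (mpz_sizeinbase(z, 2) + 29) / 30;
PyLong_DigitArray *array = PyLong_CreateDigits(mpz_sgn(z) < 0, ndigits);
if (array == NULL) {
    return NULL;
}
mpz_export(array->digits, NULL, -1, 4, 0, 32 - 30, z);
PyObject *result = PyLong_FromDigits(array);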
The trick that would get you the zero-copy that was asked for is for PyLong_DigitArray (in this example) to be the actual PyLongObject instance, just cast to a different type, and then if there's no reason to replace it with something else, PyLong_FromDigits just casts it to PyObject * and off you go. If one day we need to create a new object, we can do it without breaking anything except the performance expectations.
But it means an API more like:
PyIncompleteLong *PyLong_CreateWritable(int sign, size_t bits_required, void **digits, int *bits_per_digit, int *used_bits_per_digit, size_t *ndigit);
PyObject *PyIncompleteLong_Complete(PyIncompleteLong *v);
void PyIncompleteLong_Abort(PyIncompleteLong *v);
Then you call the first one with your total number of bits, we divide it by however many per digit we have, give you back a raw buffer, the number of bits per digit, and the number of digits. You export into that buffer however you like and give the incomplete object back to us and we give you a complete one.
Much the same API can be used for exporting as well. We need to keep the arbitrary/opaque object, and we need to keep the "I'm finished using it" function call, or else we can't abstract away the current/future memory management involved.
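A hedged usage sketch of that shape (every name comes from the proposal above; none of this is an implemented API):

void *digits;
int bits_per_digit, used_bits_per_digit;
size_t ndigits;
PyIncompleteLong *v = PyLong_CreateWritable(+1, bits_required, &digits,
                                            &bits_per_digit,
                                            &used_bits_per_digit, &ndigits);
if (v == NULL) {
    return NULL;
}
/* ... export the magnitude into `digits` here; on failure, call
   PyIncompleteLong_Abort(v) instead ... */
return PyIncompleteLong_Complete(v);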
The trick that would get you the zero-copy that was asked for is for PyLong_DigitArray (in this example) to be the actual PyLongObject instance, just cast to a different type
That's exactly what I did with PyLongWriter API. The PyLongWriter structure doesn't exist, it's just the PyLongObject with a fake name.
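In other words, something like this (a sketch of the idea, not the actual CPython source):

/* The opaque writer handle is the PyLongObject itself under another
   name, so finishing costs no allocation or copy. */
typedef struct PyLongWriter PyLongWriter;  /* intentionally never defined */

PyObject *
PyLongWriter_Finish(PyLongWriter *writer)
{
    PyLongObject *obj = (PyLongObject *)writer;
    /* normalization / small-int canonicalization would happen here */
    return (PyObject *)obj;
}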
Apparently, speed is not an issue anymore: numbers are closer to my version (or equal).
Great!
I wrote documentation for the PyLongWriter API with a small example (creating an integer from a single digit).
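A sketch of what such a single-digit example might look like with the current API shape (the actual doc example may differ in details):

/* Create the Python int 123 from a single digit. */
Py_digit *digits;
PyLongWriter *writer = PyLongWriter_Create(0, 1, &digits);
if (writer == NULL) {
    return NULL;
}
digits[0] = 123;
return PyLongWriter_Finish(writer);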
The PyLongWriter structure doesn't exist, it's just the PyLongObject with a fake name.
The same is true for PyLong_DigitArray. It's essentially the same structure.
PR updated:
- Remove PyLong_Import(): I kept it as an example in the doc since it's short enough, and IMO it helps to understand the non-trivial API.
- Make PyLong_DigitArray.digits read-only (const).
@serhiy-storchaka:
Then please use different names than PyLong_Import()/PyLong_Export().
Would you prefer that I rename PyLong_Export() to PyLong_AsDigitArray(), and PyLong_ReleaseExport() to PyLong_FreeDigitArray()?
I already removed PyLong_Import(); there are now PyLongWriter_Create() + PyLongWriter_Finish() instead.