pygame-ce Use PyObject_Vectorcall in rect

This speeds up Rect.collideobjects and Rect.collideobjectsall.

See https://docs.python.org/3/c-api/call.html#c.PyObject_Vectorcall

This function is new in Python 3.9, but the recently vendored pythoncapi-compat header provides it automatically on older versions.
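
Both methods call the user-supplied key once per object in the list, so the cost of that single C-level call is paid thousands of times per invocation. The following is a rough pure-Python model of that loop, not pygame-ce's actual implementation: the AABB class is a hypothetical, simplified stand-in for Rect (positive sizes only), and the key is assumed to be required.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Optional


@dataclass
class AABB:
    """Hypothetical, simplified stand-in for pygame's Rect."""
    x: int
    y: int
    w: int
    h: int

    def colliderect(self, other: "AABB") -> bool:
        # Strict overlap on both axes; rects that only touch edges do not collide.
        return (
            self.x < other.x + other.w
            and other.x < self.x + self.w
            and self.y < other.y + other.h
            and other.y < self.y + self.h
        )


def collideobjectsall(rect: AABB, objs: Iterable[Any],
                      key: Callable[[Any], AABB]) -> List[Any]:
    hits = []
    for obj in objs:
        # One key call per object: the kind of per-item C-level call
        # (assumed here) that a cheaper calling convention speeds up.
        if rect.colliderect(key(obj)):
            hits.append(obj)
    return hits


def collideobjects(rect: AABB, objs: Iterable[Any],
                   key: Callable[[Any], AABB]) -> Optional[Any]:
    for obj in objs:
        if rect.colliderect(key(obj)):
            return obj  # stop at the first colliding object
    return None
```

Because collideobjects stops at the first hit, its cost depends on where colliding objects sit in the list, which is why the script sorts them to the end before timing it.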

Performance testing script:

```python
from pygame import Rect
import random
import time

random.seed(36)


class Obj:
    def __init__(self, x, y, w, h):
        self.xa = Rect(x, y, w, h)


r = Rect(-20, -20, 100, 100)
objs = [
    Obj(
        random.randint(-100, -100),
        random.randint(-100, 100),
        random.randint(-100, 100),
        random.randint(-100, 100),
    )
    for _ in range(5000)
]

start = time.time()

for _ in range(1000):
    colliding_objs = r.collideobjectsall(objs, key=lambda e: e.xa)

print(time.time() - start)
print(len(colliding_objs))

# Sort list so rects that actually collide are at the end of the list,
# so collideobjects below takes significant time.
objs.sort(key=lambda e: int(r.colliderect(e.xa)))

start = time.time()

for _ in range(1000):
    colliding_obj = r.collideobjects(objs, key=lambda e: e.xa)

print(time.time() - start)
print(colliding_obj.xa)
```
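
The script times a single run with time.time(). A slightly more robust harness, offered only as a sketch and not what produced the numbers reported here, repeats the measurement with the standard timeit module and keeps the fastest run. The bench helper is a hypothetical name.

```python
import timeit
from typing import Callable


def bench(stmt: Callable[[], object], number: int = 1000, repeat: int = 5) -> float:
    """Return the best time in seconds for `number` calls of `stmt`.

    timeit uses time.perf_counter and disables garbage collection while
    timing; the minimum over several repeats filters out scheduler noise.
    """
    return min(timeit.repeat(stmt, number=number, repeat=repeat))
```

For example, bench(lambda: r.collideobjectsall(objs, key=lambda e: e.xa)) times the same 1000 calls as the first loop of the script.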

I see collideobjectsall going from 0.32 seconds to 0.25 seconds (22% improvement), and collideobjects going from 0.26 seconds to 0.22 seconds (15% improvement).
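
The percentages are reductions in runtime relative to the original timings. A quick check of the arithmetic:

```python
def improvement(before: float, after: float) -> float:
    """Runtime reduction as a percentage of the original runtime."""
    return (before - after) / before * 100


for name, before, after in [
    ("collideobjectsall", 0.32, 0.25),
    ("collideobjects", 0.26, 0.22),
]:
    print(f"{name}: {improvement(before, after):.0f}% less time")
# collideobjectsall: 22% less time
# collideobjects: 15% less time
```

Expressed as throughput instead, the same timings correspond to roughly 1.28x (0.32 / 0.25) and 1.18x (0.26 / 0.22) faster.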

That's a pretty good improvement given this PR only changes one C-API function call!

Aug 11 '24 07:08 Starbuck5

I believe #3023 already covers the performance gain this PR achieves, while also using a nicer API (IMO).

Even if we need to support the limited API in the future, it would be trivial to drop something like `#define PyObject_CallOneArg(self, obj) PyObject_Vectorcall((self), &(obj), 1, NULL)` into one of the headers.

Aug 12 '24 13:08 ankith26

I'm annoyed that the Python docs don't mention that. I checked the source, and yes, PyObject_CallOneArg uses vectorcall.

You mentioned performance only in passing in that PR. At least this PR gives you a benchmark of the performance gain your PR brings in one area.

PyObject_CallOneArg:

```c
PyObject *
PyObject_CallOneArg(PyObject *func, PyObject *arg)
{
    EVAL_CALL_STAT_INC_IF_FUNCTION(EVAL_CALL_API, func);
    assert(arg != NULL);
    PyObject *_args[2];
    PyObject **args = _args + 1;  // For PY_VECTORCALL_ARGUMENTS_OFFSET
    args[0] = arg;
    PyThreadState *tstate = _PyThreadState_GET();
    size_t nargsf = 1 | PY_VECTORCALL_ARGUMENTS_OFFSET;
    return _PyObject_VectorcallTstate(tstate, func, args, nargsf, NULL);
}
```

PyObject_Vectorcall:

```c
PyObject *
PyObject_Vectorcall(PyObject *callable, PyObject *const *args,
                     size_t nargsf, PyObject *kwnames)
{
    PyThreadState *tstate = _PyThreadState_GET();
    return _PyObject_VectorcallTstate(tstate, callable,
                                      args, nargsf, kwnames);
}
```

Calling PyObject_Vectorcall directly does a tiny bit less work, so it would likely be slightly faster, but probably not measurably. (I don't think PY_VECTORCALL_ARGUMENTS_OFFSET helps our use case here: we call plain functions, not bound methods, and bound methods are the example PEP 590 gives for this optimization.)
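
To make the function versus bound-method distinction concrete in Python terms: the key passed in the benchmark is a lambda, i.e. a plain function object, while looking up a method on an instance yields a bound method that must prepend the instance to its arguments. PY_VECTORCALL_ARGUMENTS_OFFSET exists so a callee can do that prepending in place instead of allocating a new argument array. The Obj class and get_xa method below are illustrative only.

```python
import types


class Obj:
    def __init__(self, xa):
        self.xa = xa

    def get_xa(self):
        return self.xa


obj = Obj(42)
key = lambda e: e.xa  # noqa: E731  (same shape as the benchmark's key)
bound = obj.get_xa    # bound method: wraps get_xa together with obj

assert isinstance(key, types.FunctionType)
assert isinstance(bound, types.MethodType)
# Calling a bound method means calling its underlying function with self
# prepended: the argument shuffling the offset flag is meant to make cheap.
assert bound() == bound.__func__(bound.__self__) == key(obj) == 42
```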

Aug 13 '24 06:08 Starbuck5
