pygame-ce
Use PyObject_Vectorcall in rect
This speeds up Rect.collideobjects and Rect.collideobjectsall.
See https://docs.python.org/3/c-api/call.html#c.PyObject_Vectorcall
This function is new in Python 3.9, but we get support for it on older versions automatically through the recently vendored pythoncapi-compat header.
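For readers unfamiliar with the `key` argument, here is a rough pure-Python model of where the callable is invoked. It is only a sketch: `overlaps` and `collide_objects_all` are hypothetical names, rects are plain `(x, y, w, h)` tuples with positive sizes, and none of this is pygame's actual C implementation.

```python
def overlaps(a, b):
    """Axis-aligned overlap test for (x, y, w, h) tuples with positive sizes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def collide_objects_all(rect, objects, key):
    """Hypothetical stand-in: return every object whose key(obj) rect overlaps rect."""
    hits = []
    for obj in objects:
        other = key(obj)  # one call into Python per object: the hot call site
        if overlaps(rect, other):
            hits.append(obj)
    return hits


calls = []  # records each key invocation


def counting_key(obj):
    calls.append(obj)
    return obj["box"]


objs = [
    {"box": (0, 0, 10, 10)},
    {"box": (200, 200, 5, 5)},
    {"box": (50, 50, 10, 10)},
]
hits = collide_objects_all((-20, -20, 100, 100), objs, key=counting_key)
print(len(hits), len(calls))
```

The point of the model is that the key callable runs once for every object in the sequence, so the cost of that single call path is multiplied by the length of the list.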
Performance testing script:
from pygame import Rect
import random
import time

random.seed(36)


class Obj:
    def __init__(self, x, y, w, h):
        self.xa = Rect(x, y, w, h)


r = Rect(-20, -20, 100, 100)

objs = [
    Obj(
        random.randint(-100, -100),
        random.randint(-100, 100),
        random.randint(-100, 100),
        random.randint(-100, 100),
    )
    for _ in range(5000)
]

start = time.time()
for _ in range(1000):
    colliding_objs = r.collideobjectsall(objs, key=lambda e: e.xa)
print(time.time() - start)
print(len(colliding_objs))

# Sort list so rects that actually collide are at the end of the list,
# so collideobjects below takes significant time.
objs.sort(key=lambda e: int(r.colliderect(e.xa)))

start = time.time()
for _ in range(1000):
    colliding_obj = r.collideobjects(objs, key=lambda e: e.xa)
print(time.time() - start)
print(colliding_obj.xa)
I see collideobjectsall going from 0.32 seconds to 0.25 seconds (22% improvement), and collideobjects going from 0.26 seconds to 0.22 seconds (15% improvement).
That's a pretty good improvement given this PR only changes one C-API function call!
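As a quick sanity check on those figures, the sketch below recomputes the percentages and estimates the time saved per `key` call. It assumes the key callable runs exactly once per object on every `collideobjectsall` call (5000 objects, 1000 iterations); `relative_improvement` is a helper name made up for this example.

```python
def relative_improvement(before, after):
    """Fraction of the original runtime that was saved."""
    return (before - after) / before


# Timings in seconds, as quoted above.
all_before, all_after = 0.32, 0.25
first_before, first_after = 0.26, 0.22

# Assumption: one key call per object per collideobjectsall call.
key_calls = 5000 * 1000

all_gain = relative_improvement(all_before, all_after)
first_gain = relative_improvement(first_before, first_after)
saved_ns_per_call = (all_before - all_after) / key_calls * 1e9

print(f"collideobjectsall: {all_gain:.0%}")
print(f"collideobjects:    {first_gain:.0%}")
print(f"about {saved_ns_per_call:.0f} ns saved per key call")
```

Under that assumption the saving works out to roughly 14 ns per call, which is small in isolation but adds up over five million calls.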
I believe #3023 already covers the performance gain that can be achieved with this PR, while also using a nicer API (imo).
Even if we need to support the limited API in the future, it is trivial to slap something like #define PyObject_CallOneArg(self, obj) PyObject_Vectorcall((self), &(obj), 1, NULL) into one of the headers.
I'm annoyed that the Python docs don't mention that. I checked the source, and yes, CallOneArg uses vectorcall.
You mentioned performance as an afterthought in that PR but didn't put much weight on it. This PR at least gives you a benchmark of performance gain from your PR in one area.
CallOneArg:
PyObject *
PyObject_CallOneArg(PyObject *func, PyObject *arg)
{
    EVAL_CALL_STAT_INC_IF_FUNCTION(EVAL_CALL_API, func);
    assert(arg != NULL);
    PyObject *_args[2];
    PyObject **args = _args + 1;  // For PY_VECTORCALL_ARGUMENTS_OFFSET
    args[0] = arg;
    PyThreadState *tstate = _PyThreadState_GET();
    size_t nargsf = 1 | PY_VECTORCALL_ARGUMENTS_OFFSET;
    return _PyObject_VectorcallTstate(tstate, func, args, nargsf, NULL);
}
Vectorcall:
PyObject *
PyObject_Vectorcall(PyObject *callable, PyObject *const *args,
                    size_t nargsf, PyObject *kwnames)
{
    PyThreadState *tstate = _PyThreadState_GET();
    return _PyObject_VectorcallTstate(tstate, callable,
                                      args, nargsf, kwnames);
}
Vectorcall is a tiny bit less indirect and so would likely be faster, but probably not measurably. (I don't think PY_VECTORCALL_ARGUMENTS_OFFSET will help our use case here: we call plain functions, not bound methods, and bound methods were the example given in the PEP for this optimization.)
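To illustrate the function versus bound method distinction from the Python side, here is a small sketch. The `get_box` method is hypothetical and only added for contrast; the benchmark above passes a lambda as `key`, which is a plain function object.

```python
import types


class Obj:
    def __init__(self, box):
        self.xa = box

    def get_box(self):  # hypothetical method, only here for comparison
        return self.xa


o = Obj((0, 0, 1, 1))

plain_key = lambda e: e.xa  # what the benchmark passes as key
bound_key = o.get_box       # a bound method: a function plus its self

is_plain_function = isinstance(plain_key, types.FunctionType)
is_bound_method = isinstance(bound_key, types.MethodType)

# A bound method has to put self in front of the caller's arguments
# before calling the underlying function.
same_result = bound_key() == bound_key.__func__(bound_key.__self__)

print(is_plain_function, is_bound_method, same_result)
```

As I understand the PEP, the spare slot reserved by PY_VECTORCALL_ARGUMENTS_OFFSET exists so that a callee such as a bound method can prepend `self` without allocating a new argument array. A plain function like the lambda has nothing to prepend, which fits the comment above that the flag is unlikely to matter here.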