pygame-ce icon indicating copy to clipboard operation
pygame-ce copied to clipboard

Change/Expand `Rect.collideobjectsall` for optimization

Open itzpr3d4t0r opened this issue 1 year ago • 9 comments
trafficstars

The current Rect.collideobjectsall() method is slow, like really slow, especially when compared with all the other multicollision methods: image

After analysis, it became evident that the bulk of the runtime is attributed to parsing and invoking the key argument, which only accepts a callable. Strikingly, this is also the slowest scenario:

from pygame import Rect

class Obj:
    def __init__(self, x, y, w, h):
        self.xa = Rect(x, y, w, h)

r = Rect(0, 0, 100, 100)
objs = [
   Obj(
            randint(-100, -100),
            randint(-100, 100),
            randint(-100, 100),
            randint(-100, 100),
        )
        for _ in range(100)
]

colliding_objs = r.collideobjectsall(objs, key=lambda e: e.xa)

Surprisingly, invoking a lambda function to access an object's attribute in C is markedly slower than passing the attribute's name as a string directly and retrieving the attribute.

Hence, I propose either of the following solutions:

  • Behaviour Change: Expand the key keyword argument to accept strings as well, enabling direct attribute retrieval in such instances.
  • Additional Behaviour: Introduce an extra keyword argument, from_attr, which exclusively accepts strings for direct attribute retrieval.

In practice, this would allow the following alternatives:

colliding_objs = r.collideobjectsall(objs, key='xa')
# OR
colliding_objs = r.collideobjectsall(objs, from_attr='xa')

Either of these would work and make it possible to massively optimize the algorithm. From my testing this change alone brings a 190% improvement on average: image

PS. this is also true for .collideobjects() since it shares most of this function's behaviour.

itzpr3d4t0r avatar Apr 02 '24 10:04 itzpr3d4t0r

Behaviour Change: Expand the key keyword argument to accept strings as well, enabling direct attribute retrieval in such instances.

I would be okay with this, as it wouldn't break existing code. Callables should continue to be handled the way they are, we are just adding an extra string handle path.

ankith26 avatar Apr 02 '24 13:04 ankith26

I'm skeptical of adding API surface like this for optimization. It's more branches, more code to test, more to maintain, and it provides no benefit to existing users. Especially since this is a niche function.

Plus, I've found a far easier solution--

Rewrite it in Python.

from pygame import Rect
import random
import time

random.seed(36)

use_python = False

class Obj:
    def __init__(self, x, y, w, h):
        self.xa = Rect(x, y, w, h)

r = Rect(-20, -20, 100, 100)
objs = [
   Obj(
            random.randint(-100, -100),
            random.randint(-100, 100),
            random.randint(-100, 100),
            random.randint(-100, 100),
        )
        for _ in range(5000)
]

start = time.time()

if use_python:
    for _ in range(1000):
        colliding_indices = r.collidelistall([obj.xa for obj in objs])
        colliding_objs = [objs[colliding_index] for colliding_index in colliding_indices]

else:
    for _ in range(1000):
        colliding_objs = r.collideobjectsall(objs, key=lambda e: e.xa)

print(time.time() - start)

use_python = True does the same thing as collideobjectsall but is more than twice as fast. 0.14 seconds versus 0.34 seconds, Python 3.12.

Starbuck5 avatar Apr 03 '24 04:04 Starbuck5

Some thoughts:

  • Now that we added it in the C api why leave it a half baked and honestly bad implementation
  • People like fast stuff
  • People shouldn't have to circumvent our C limitations (that we put in place by implementing it) with a version dependant python function
  • It would have benefit with existing users since I would also implement faster looping for both cases
  • There is almost no cost test wise, just a handful of more lines.
  • Python 3.12 is cool but we can't expect people to both use it and implement our functions in python just because they're bad, since it would also imply people benchmarking code that 99% of people don't do at this scale.
  • I've seen projects like Isometria use this function in many places so we can't say it's that nieche... at least one big project uses it.

itzpr3d4t0r avatar Apr 03 '24 09:04 itzpr3d4t0r

use_python = True does the same thing as collideobjectsall but is more than twice as fast. 0.14 seconds versus 0.34 seconds, Python 3.12.

Your program on my machine (Python 3.12):

  • collideobjectsall with lambda: 0.3343813419342041 s
  • python way: 0.1553936004638672 s
  • collideobjectsall with string: 0.0875248908996582 s

So my proposed change is still 77.5% faster than the python alternative while not requiring more python.

itzpr3d4t0r avatar Apr 03 '24 12:04 itzpr3d4t0r

You don’t get it? We should replace our implementation of this function with a Python one. There’s no reason our implementation has to be written in C. Might be a little tricky, but we’ve done a couple similar things before.

This improves our implementation’s speed for existing code by taking a complex C function and rewriting it in 2 lines of Python. This would be a huge win.

It’s faster on 3.9 too, but it’s more faster on 3.12. And maybe even better with future versions!

Starbuck5 avatar Apr 03 '24 15:04 Starbuck5

Alright, Hopefully someone will do this eventually. Actually your idea of using collidelistall will get even faster with my PR: https://github.com/pygame-community/pygame-ce/pull/2786 so the future is bright for a python implementation. Now that we're on this, should i remove the optimization for collideobjects/collideobjectsall in this PR or no? I guess a ltl speedup with no drawbacks isn't really an issue but i'm not rly attacched to it or anything so it's whatever.

itzpr3d4t0r avatar Apr 03 '24 16:04 itzpr3d4t0r

Hmm, my example was unrealistic because I didn't show how it could be made compatible with the existing arguments. It didn't use the lambda function.

This is more realistic, but doesn't show nearly as good of a speedup:

from pygame import Rect
import random
import time

random.seed(36)

use_python = True

class Obj:
    def __init__(self, x, y, w, h):
        self.xa = Rect(x, y, w, h)

r = Rect(-20, -20, 100, 100)
objs = [
   Obj(
            random.randint(-100, -100),
            random.randint(-100, 100),
            random.randint(-100, 100),
            random.randint(-100, 100),
        )
        for _ in range(5000)
]

start = time.time()

def collideobjectsall(objects, key):
    colliding_indices = r.collidelistall([key(obj) for obj in objects])
    return [objects[colliding_index] for colliding_index in colliding_indices]

if use_python:
    for _ in range(1000):
        collideobjectsall(objs, key=lambda e: e.xa)

else:
    for _ in range(1000):
        colliding_objs = r.collideobjectsall(objs, key=lambda e: e.xa)

print(time.time() - start)

It is still faster to do it in Python in Python 3.12, but not in Python 3.9.

Python 3.9 "use_python"            0.36 s
Python 3.9 C implementation        0.31 s
Python 3.12 "use_python"           0.26 s
Python 3.12 C implementation       0.34 s

Also good point these will all be sped up with your optimization PR, even the Python version.

I'm bummed I didn't test it correctly and it's not as good as I thought, but I still think it's a good opportunity to pursue.

Starbuck5 avatar Apr 04 '24 02:04 Starbuck5

I mean I guess we could drop the python 3.12 version down to about 0.15 with the optimization but if optimizing C still gives us 0.08s of runtime AND it's python version independent then i think it's better to bite the bullet and implement the C version. I still think you're making it look worse than it actually is on the C side but these are my 2 cents anyway.

itzpr3d4t0r avatar Apr 04 '24 16:04 itzpr3d4t0r

I've managed to get a good improvement inside the current status quo here by using vectorcall: https://github.com/pygame-community/pygame-ce/pull/3048

Starbuck5 avatar Aug 11 '24 08:08 Starbuck5