pygame icon indicating copy to clipboard operation
pygame copied to clipboard

Improved collidepoint() function

Open itzpr3d4t0r opened this issue 3 years ago • 6 comments
trafficstars

Info

Follows #3324. One of #3337 Since collidepoint() could already only take 1 or two non keyword params. it was a no brainer to use FASTCALL to accelerate the function even more. With fastcall i could get a more efficient argument management while respecting the back compat. and performance. UPDATE This function now uses two(three actually) new C api functions to further speed up the function. Once the C api functions will be merged this one will be safe to merge.

Performance

EDIT. this is outdated The total performance improvement stands at 44.254% but whenever 2 numbers are passed separately, this value raises closer to 78.5%. In the case of CPython 3.6 i saw a net 3% performance reduction. Using the code below i got:

(coll = colliding, 10000000 Calls per test)
=============================================
pygame 2.1.3.dev5 (SDL 2.0.22, Python 3.10.4) <-- NEW IMPLEMENTATION

coll point:  0.6928618000019924
not coll point:  0.6933755999998539
coll point two:  0.48872839999967255
not coll point two:  0.483226200005447

Total: 2.358192000006966 s

=============================================
pygame 2.1.3.dev5 (SDL 2.0.22, Python 3.6.7) <-- NEW IMPLEMENTATION WITH PYTHON 3.6

coll point:  0.9762328
not coll point:  0.8990473000000001
coll point two:  0.9623647000000002
not coll point two:  0.9450940999999999

Total: 3.7827389000000005 s

=============================================
pygame 2.1.3.dev4 (SDL 2.0.20, Python 3.10.4)

coll point:  0.8361166000031517
not coll point:  0.8339784999989206
coll point two:  0.872635599997011
not coll point two:  0.8590755999975954

Total: 3.4018062999966787 s

=============================================
pygame 2.1.3.dev4 (SDL 2.0.20, Python 3.6.7)

coll point:  0.9085874
not coll point:  0.9017777
coll point two:  0.9342617000000002
not coll point two:  0.9230961

Total: 3.6677229000000002 s

The code i used:

import pygame

from plotting import test_group

pygame.init()

tottime = 0
r1 = pygame.rect.Rect(0, 0, 10, 10)

tup = (5, 5)
lst = [5, 5]

ntup = (30, 30)
nlst = [30, 30]

vec1 = pygame.math.Vector2(5, 5)
vec2 = pygame.math.Vector2(30, 30)

NUMBER = 10000000
GLOB = {"r1": r1, "ntup": ntup, "nlst": nlst, "tup": tup, "lst": lst,
        "vec1": vec1, "vec2": vec2}

test_names = (
    "tup inside tup", "coll tup", "not coll tup", "coll list", "not coll list",
    "coll 2 args", "not coll 2 args", "coll unpack tuple 1->2",
    "not coll unpack tuple 1->2", "coll unpack list 1->2",
    "not coll unpack list 1->2")

test_group(
    "Sequences", [
        "r1.collidepoint(((5.0, 5.0), ))",
        "r1.collidepoint((5.0, 5.0))",
        "r1.collidepoint((30.0, 30.0))",
        "r1.collidepoint([5.0, 5.0])",
        "r1.collidepoint([30.0, 30.0])",
        "r1.collidepoint(5, 5)",
        "r1.collidepoint(30, 30)",
        "r1.collidepoint(*tup)",
        "r1.collidepoint(*ntup)",
        "r1.collidepoint(*lst)",
        "r1.collidepoint(*nlst)"
    ],
    GLOB,
    test_names
)

test_group(
    "Vectors", [
        "r1.collidepoint(vec1)",
        "r1.collidepoint(vec2)"
    ],
    GLOB,
    ("coll vector", "not coll vector")
)

Where test_group() is

def test(test_name: str, func: str, globs: dict, num: int,
         mode: str = "mean") -> float:
    value = 0
    if mode == "mean":
        value = fmean(timeit.repeat(func, globals=globs, number=num))
    elif mode == "cumulative":
        value = timeit.timeit(func, globals=globs, number=num)
    else:
        raise NotImplementedError("Incorrect test mode")
    print("-) " + test_name + f": {round(value, 8)} s")
    return value


def test_group(group_name: str, sequence: list[str],
               globs: dict,
               tests_names: tuple[str, ...] = None, num: int = 10_000_000,
               include_tot: bool = False, include_mean: bool = True,
               test_mode: str = "mean") -> float:
    print("\n====||" + group_name.upper() + "||====")
    
    test_data = [
        test(tests_names[i] if tests_names else str(i + 1), test_obj, globs,
             num, test_mode)
        for
        i, test_obj in
        enumerate(sequence)]
    
    print("______________")
    if include_tot:
        print(f"-) Total time: {round(sum(test_data), 8)}")
    
    if include_mean:
        print(f"-) Mean time: {round(fmean(test_data), 8)}")
    
    return fmean(test_data)

itzpr3d4t0r avatar Jul 19 '22 18:07 itzpr3d4t0r

Putting out the new test that shows improved performance with 2 arguments:

coll point:  0.7030235000001994
not coll point:  0.7181456999996954
coll point two:  0.4383071000002019
not coll point two:  0.4396271000005072

Total: 2.299103400000604 s

itzpr3d4t0r avatar Jul 23 '22 11:07 itzpr3d4t0r

I expanded my test suite for more cases and got these results:

NEW

coll point:  0.7149117999997543
not coll point:  0.7133573000000979
coll point lst:  1.0290474000003087
not coll point lst:  1.0237922999999682
coll point two:  0.46079980000013165
not coll point two:  0.44774779999988823
coll one->two tup:  0.6896802000001117
not coll one->two tup:  0.6725584999999228
coll one->two lst:  1.0300710000001345
not coll one->two lst:  0.9872267999999167
coll vector:  0.7621639999997569
not coll vector:  0.7608357000003707

Total: 9.292192600000362 s

OLD

coll point:  0.8415466999999808
not coll point:  0.8336511000002247
coll point lst:  1.1184192000000621
not coll point lst:  1.114143900000272
coll point two:  0.8624654999998711
not coll point two:  0.8546617000001788
coll one->two tup:  0.9430790000001252
not coll one->two tup:  0.9474781000003532
coll one->two lst:  1.315905500000099
not coll one->two lst:  1.3939691999999013
coll vector:  0.9540288999996847
not coll vector:  0.9230023000000074

Total: 12.10235110000076 s

itzpr3d4t0r avatar Jul 24 '22 11:07 itzpr3d4t0r

Latest test (latest commit improved performance of an additional 2.7% and cleaned up the code)

coll point:  0.6911872000000585
not coll point:  0.6898155000017141
coll point lst:  0.9968098999997892
not coll point lst:  0.9937802999993437
coll point two:  0.4360297000002902
not coll point two:  0.4360524000003352
coll one->two tup:  0.665919600000052
not coll one->two tup:  0.6680467999995017
coll one->two lst:  1.0285985999998957
not coll one->two lst:  0.9847864000003028
coll vector:  0.7158532000012201
not coll vector:  0.7405311999991682

Total: 9.047410800001671 s

itzpr3d4t0r avatar Jul 25 '22 11:07 itzpr3d4t0r

I recently added 2 new C API functions for getting an int from an object (pg_F_IntFromObj) and two ints from a sequence (pg_F_TwoIntsFromObj). I really hope these two function can help others build faster implementations as these functions do set errors, making code cleaner, as the user won't have to set errors themselves. These two are intended to be faster variants of their counterpart. I also added docs to base.rst. What i thought is instead of having to micro optimize already existing functions, that are a pain to edit as they're used everywhere, we should build newer API (specialized if you wish) that we can implement slowly and only where needed, instead of changing a letter in a function, and in the end surrender because there's noting you can do without breaking something else.

I changed the code i use to test so go up to the first comment and see the change. Performance appears to be massively better with the new functions, but the vector case still needs to be addressed. Right now there's no way to check if an object is a vector in rect.c(as pgVector2_Check instantly segfaults if used), and it appears that using PySequence_Fast on a vector makes getting its items slower, this needs addressing. You can notice from the tests how using the old pg_IntFromObj and pg_TwoIntsFromObj makes things worse (in some cases 40+% slower), but makes vector point collision faster.

OLD

====||SEQUENCES||====
-) tup inside tup: 0.79267568 s
-) coll tup: 0.78041336 s
-) not coll tup: 0.75268818 s
-) coll list: 1.00681966 s
-) not coll list: 1.0205569 s
-) coll 2 args: 0.8610839 s
-) not coll 2 args: 0.85777856 s
-) coll unpack tuple 1->2: 0.94632598 s
-) not coll unpack tuple 1->2: 0.95737914 s
-) coll unpack list 1->2: 1.3087718 s
-) not coll unpack list 1->2: 1.3027346 s
______________
-) Mean time: 0.96247525

====||VECTORS||====
-) coll vector: 0.94497824 s
-) not coll vector: 0.90481702 s
______________
-) Mean time: 0.92489763

NEW

====||SEQUENCES||====
-) tup inside tup: 0.45295524 s
-) coll tup: 0.42320544 s
-) not coll tup: 0.42256632 s
-) coll list: 0.63413222 s
-) not coll list: 0.6435943 s
-) coll 2 args: 0.48680894 s
-) not coll 2 args: 0.49668356 s
-) coll unpack tuple 1->2: 0.72882458 s
-) not coll unpack tuple 1->2: 0.72997988 s
-) coll unpack list 1->2: 1.05092198 s
-) not coll unpack list 1->2: 1.05519052 s
______________
-) Mean time: 0.64771482

====||VECTORS||====
-) coll vector: 1.74404342 s
-) not coll vector: 1.7378666 s
______________
-) Mean time: 1.74095501

WITH pg_TwoIntsFromObj() and pg_IntFromObj() following "slower" tags are compared to my new implementation

====||SEQUENCES||====
-) tup inside tup: 0.64325502 s -> 42% slower
-) coll tup: 0.62990432 s -> 48.8% slower
-) not coll tup: 0.6378526 s -> 50.9% slower
-) coll list: 0.89865978 s -> 41.7% slower
-) not coll list: 0.88662296 s -> 37.7% slower
-) coll 2 args: 0.48267244 s -> 0.8% faster
-) not coll 2 args: 0.48508974 s -> 2.3% faster
-) coll unpack tuple 1->2: 0.76765524 s -> 5.3% slower
-) not coll unpack tuple 1->2: 0.77036818 s -> 5.5% slower
-) coll unpack list 1->2: 1.14468948 s -> 8.9% slower
-) not coll unpack list 1->2: 1.14377396 s -> 8.4% slower
______________
-) Mean time: 0.77186761

====||VECTORS||====
-) coll vector: 0.72549384 s -> 140% faster
-) not coll vector: 0.72421628 s -> 139.9% faster
______________
-) Mean time: 0.72485506

itzpr3d4t0r avatar Jul 29 '22 12:07 itzpr3d4t0r

Latest results:

====||SEQUENCES||====
-) tup inside tup: 0.50052934 s
-) coll tup: 0.4147287 s
-) not coll tup: 0.41854068 s
-) coll list: 0.68230462 s
-) not coll list: 0.67322238 s
-) coll 2 args: 0.54443772 s
-) not coll 2 args: 0.5374269 s
-) coll unpack tuple 1->2: 0.75491 s
-) not coll unpack tuple 1->2: 0.7572202 s
-) coll unpack list 1->2: 1.0717686 s
-) not coll unpack list 1->2: 1.0533608 s
______________
-) Mean time: 0.67349545

====||VECTORS||====
-) coll vector: 0.63746644 s
-) not coll vector: 0.65859866 s
______________
-) Mean time: 0.64803255

itzpr3d4t0r avatar Jul 30 '22 22:07 itzpr3d4t0r

Latest test: NEW WITH FLOAT POSITIONS

====||SEQUENCES||====
-) tup inside tup: 0.38056136 s
-) coll tup: 0.36971348 s
-) not coll tup: 0.35821488 s
-) coll list: 0.57929936 s
-) not coll list: 0.58612918 s
-) coll 2 args: 0.3802126 s
-) not coll 2 args: 0.37733526 s
-) coll unpack tuple 1->2: 0.6522455 s
-) not coll unpack tuple 1->2: 0.64988396 s
-) coll unpack list 1->2: 0.95453122 s
-) not coll unpack list 1->2: 0.95682856 s
______________
-) Mean time: 0.56772321

====||VECTORS||====
-) coll vector: 0.57349746 s
-) not coll vector: 0.5546095 s
______________
-) Mean time: 0.56405348

NEW WITH INT POSITIONS

====||SEQUENCES||====
-) tup inside tup: 0.53279938 s
-) coll tup: 0.52770514 s
-) not coll tup: 0.51710542 s
-) coll list: 0.7346576 s
-) not coll list: 0.76284728 s
-) coll 2 args: 0.52968448 s
-) not coll 2 args: 0.5111041 s
-) coll unpack tuple 1->2: 0.78398026 s
-) not coll unpack tuple 1->2: 0.77946974 s
-) coll unpack list 1->2: 1.1492395 s
-) not coll unpack list 1->2: 1.11902564 s
______________
-) Mean time: 0.72251078

====||VECTORS||====
-) coll vector: 0.5659497 s
-) not coll vector: 0.56177544 s
______________
-) Mean time: 0.56386257

OLD WITH FLOAT POSITIONS

====||SEQUENCES||====
-) tup inside tup: 0.80344388 s
-) coll tup: 0.7880568 s
-) not coll tup: 0.78416768 s
-) coll list: 1.02144924 s
-) not coll list: 1.07143616 s
-) coll 2 args: 0.81639486 s
-) not coll 2 args: 0.86128652 s
-) coll unpack tuple 1->2: 0.91880574 s
-) not coll unpack tuple 1->2: 0.87669942 s
-) coll unpack list 1->2: 1.25596554 s
-) not coll unpack list 1->2: 1.25042126 s
______________
-) Mean time: 0.94982974

====||VECTORS||====
-) coll vector: 0.95305282 s
-) not coll vector: 0.92477894 s
______________
-) Mean time: 0.93891588

OLD WITH INT POSITIONS

====||SEQUENCES||====
-) tup inside tup: 0.87435766 s
-) coll tup: 0.8626543 s
-) not coll tup: 0.84990578 s
-) coll list: 1.12899522 s
-) not coll list: 1.1526588 s
-) coll 2 args: 0.88005104 s
-) not coll 2 args: 0.86773914 s
-) coll unpack tuple 1->2: 0.96889162 s
-) not coll unpack tuple 1->2: 0.96902186 s
-) coll unpack list 1->2: 1.34192146 s
-) not coll unpack list 1->2: 1.31817322 s
______________
-) Mean time: 1.01948819

====||VECTORS||====
-) coll vector: 0.94579784 s
-) not coll vector: 0.9231809 s
______________
-) Mean time: 0.93448937

itzpr3d4t0r avatar Jul 31 '22 20:07 itzpr3d4t0r

Latest test with floats:

====||SEQUENCES||====
-) tup inside tup: 0.64257536 s
-) coll tup: 0.6259122 s
-) not coll tup: 0.64952376 s
-) coll list: 0.92240678 s
-) not coll list: 0.92524146 s
-) coll 2 args: 0.43677754 s
-) not coll 2 args: 0.44115426 s
-) coll unpack tuple 1->2: 0.68411928 s
-) not coll unpack tuple 1->2: 0.6728108 s
-) coll unpack list 1->2: 1.07899142 s
-) not coll unpack list 1->2: 1.07165674 s
______________
-) Mean time: 0.74101542

====||VECTORS||====
-) coll vector: 0.7786252 s
-) not coll vector: 0.76347446 s
______________
-) Mean time: 0.77104983

itzpr3d4t0r avatar Dec 11 '22 12:12 itzpr3d4t0r