pyyaml icon indicating copy to clipboard operation
pyyaml copied to clipboard

Speed up creating Mark objects in the Cython version

Open bdraco opened this issue 10 months ago • 4 comments

main 192,792,792ns this PR 168,216,041ns

                            Benchmark Results                             
┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┓
┃             Benchmark ┃   Time (best) ┃ Rel. StdDev ┃ Run time ┃ Iters ┃
┡━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━┩
│ test_large_parse_yaml │ 192,792,792ns │        0.9% │    2.94s │    15 │
└───────────────────────┴───────────────┴─────────────┴──────────┴───────┘
                            Benchmark Results                             
┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┓
┃             Benchmark ┃   Time (best) ┃ Rel. StdDev ┃ Run time ┃ Iters ┃
┡━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━┩
│ test_large_parse_yaml │ 168,216,041ns │        1.3% │    2.93s │    17 │
└───────────────────────┴───────────────┴─────────────┴──────────┴───────┘

Benchmark https://github.com/bdraco/pyyaml/blob/bench/tests_bench/benchmarks/test_simple_load.py (execute with pytest --codspeed tests_bench/benchmarks/test_simple_load.py)

Cython analysis via cythonize -X language_level=3 -a -i yaml/_yaml.pyx

Before Screenshot 2025-02-21 at 12 07 34 PM

After Screenshot 2025-02-21 at 12 07 56 PM

bdraco avatar Feb 21 '25 18:02 bdraco

Thanks, this looks like some low-hanging fruit for a nice speed boost! I don't love changing all the Cython callsites to use the two-call pattern, though... I might try playing with a cdef factory method that could accomplish the same thing with a single callsite (or feel free to do so yourself if you're interested!)- IIRC we've got a lot of flexibility there since we're not worried about non-Cython subclasses of this Mark.

nitzmahone avatar May 16 '25 17:05 nitzmahone

I didn't make a helper because Cython would end up ref counting the Mark object being passed around in the helper.. It might not make much difference though. Will test

bdraco avatar May 16 '25 18:05 bdraco

Still a nice speed up even with the helper. Note that 3.1 got a bit faster already since I opened this PR

With Cython 3.1 main (new baseline)

                           Benchmark Results                             
┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┓
┃             Benchmark ┃   Time (best) ┃ Rel. StdDev ┃ Run time ┃ Iters ┃
┡━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━┩
│ test_large_parse_yaml │ 177,877,584ns │        0.7% │    2.88s │    16 │
└───────────────────────┴───────────────┴─────────────┴──────────┴───────┘

With Cython 3.1 64e40d89ddb54b192d84132e6540fcc6553a8050 (original commit)

                          Benchmark Results                             
┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┓
┃             Benchmark ┃   Time (best) ┃ Rel. StdDev ┃ Run time ┃ Iters ┃
┡━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━┩
│ test_large_parse_yaml │ 159,839,750ns │        1.4% │    2.94s │    18 │
└───────────────────────┴───────────────┴─────────────┴──────────┴───────┘

With Cython 3.1 (added _create_mark) 964c641aad18b5df3ef92b9202a7b82283939d9e (original commit)

                            Benchmark Results                             
┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┓
┃             Benchmark ┃   Time (best) ┃ Rel. StdDev ┃ Run time ┃ Iters ┃
┡━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━┩
│ test_large_parse_yaml │ 162,833,125ns │        0.8% │    2.97s │    18 │
└───────────────────────┴───────────────┴─────────────┴──────────┴───────┘

bdraco avatar May 16 '25 18:05 bdraco

Pushed another one that a bit different. Single call point but it does have a bit more maintenance burden. 688ebe766c4c9e4cb4e3d79aa9cfc5fbb9c0f097

                            Benchmark Results                             
┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┓
┃             Benchmark ┃   Time (best) ┃ Rel. StdDev ┃ Run time ┃ Iters ┃
┡━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━┩
│ test_large_parse_yaml │ 155,708,875ns │        0.7% │    2.84s │    18 │
└───────────────────────┴───────────────┴─────────────┴──────────┴───────┘

bdraco avatar May 16 '25 18:05 bdraco