Speed up creating Mark objects in the Cython version
main 192,792,792ns this PR 168,216,041ns
Benchmark Results
┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┓
┃ Benchmark ┃ Time (best) ┃ Rel. StdDev ┃ Run time ┃ Iters ┃
┡━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━┩
│ test_large_parse_yaml │ 192,792,792ns │ 0.9% │ 2.94s │ 15 │
└───────────────────────┴───────────────┴─────────────┴──────────┴───────┘
Benchmark Results
┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┓
┃ Benchmark ┃ Time (best) ┃ Rel. StdDev ┃ Run time ┃ Iters ┃
┡━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━┩
│ test_large_parse_yaml │ 168,216,041ns │ 1.3% │ 2.93s │ 17 │
└───────────────────────┴───────────────┴─────────────┴──────────┴───────┘
Benchmark https://github.com/bdraco/pyyaml/blob/bench/tests_bench/benchmarks/test_simple_load.py (execute with pytest --codspeed tests_bench/benchmarks/test_simple_load.py)
Cython analysis via cythonize -X language_level=3 -a -i yaml/_yaml.pyx
Before
After
Thanks, this looks like some low-hanging fruit for a nice speed boost! I don't love changing all the Cython callsites to use the two-call pattern, though... I might try playing with a cdef factory method that could accomplish the same thing with a single callsite (or feel free to do so yourself if you're interested!)- IIRC we've got a lot of flexibility there since we're not worried about non-Cython subclasses of this Mark.
I didn't make a helper because Cython would end up ref counting the Mark object being passed around in the helper.. It might not make much difference though. Will test
Still a nice speed up even with the helper. Note that 3.1 got a bit faster already since I opened this PR
With Cython 3.1 main (new baseline)
Benchmark Results
┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┓
┃ Benchmark ┃ Time (best) ┃ Rel. StdDev ┃ Run time ┃ Iters ┃
┡━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━┩
│ test_large_parse_yaml │ 177,877,584ns │ 0.7% │ 2.88s │ 16 │
└───────────────────────┴───────────────┴─────────────┴──────────┴───────┘
With Cython 3.1 64e40d89ddb54b192d84132e6540fcc6553a8050 (original commit)
Benchmark Results
┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┓
┃ Benchmark ┃ Time (best) ┃ Rel. StdDev ┃ Run time ┃ Iters ┃
┡━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━┩
│ test_large_parse_yaml │ 159,839,750ns │ 1.4% │ 2.94s │ 18 │
└───────────────────────┴───────────────┴─────────────┴──────────┴───────┘
With Cython 3.1 (added _create_mark) 964c641aad18b5df3ef92b9202a7b82283939d9e (original commit)
Benchmark Results
┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┓
┃ Benchmark ┃ Time (best) ┃ Rel. StdDev ┃ Run time ┃ Iters ┃
┡━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━┩
│ test_large_parse_yaml │ 162,833,125ns │ 0.8% │ 2.97s │ 18 │
└───────────────────────┴───────────────┴─────────────┴──────────┴───────┘
Pushed another one that a bit different. Single call point but it does have a bit more maintenance burden. 688ebe766c4c9e4cb4e3d79aa9cfc5fbb9c0f097
Benchmark Results
┏━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┓
┃ Benchmark ┃ Time (best) ┃ Rel. StdDev ┃ Run time ┃ Iters ┃
┡━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━┩
│ test_large_parse_yaml │ 155,708,875ns │ 0.7% │ 2.84s │ 18 │
└───────────────────────┴───────────────┴─────────────┴──────────┴───────┘