
Build with PGO

Open helpau opened this issue 2 months ago • 0 comments

Building with PGO speeds up the loda benchmark by roughly 0%-35%. However, this is quite difficult to set up: it needs at least 3 profiles (cl, gcc, clang, and maybe more because of x86_64/arm64?), and the profiles depend on the toolchain version.

Example of changes here: #547

Results from Apple M3, clang 17

Without PGO

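| Sequence | Terms | Reg Eval | Inc Eval | Vir Eval |
|----------|-------|----------|----------|----------|
| A000040 | 1000 | 3.48s | - | 0.43s |
| A000394 | 1000 | 1.72s | - | 0.22s |
| A000401 | 1000 | 2.98s | - | 0.22s |
| A000796 | 300 | 0.70s | - | - |
| A001041 | 300 | 0.73s | - | - |
| A001113 | 300 | 0.63s | - | - |
| A002110 | 300 | 0.77s | - | - |
| A002760 | 200 | 24.71s | - | 1.18s |
| A057552 | 300 | 2.81s | 0.03s | - |
| A079309 | 300 | 2.80s | 0.03s | - |
| A002193 | 400 | 0.56s | 0.23s | - |
| A035856 | 500 | 1.62s | - | - |
| A001609 | 1000 | 0.52s | 0.00s | - |
| A003411 | 1000 | 0.59s | 0.00s | - |
| A012866 | 1000 | 1.00s | 0.00s | - |
| A000045 | 2000 | 1.82s | 0.00s | - |
| A001304 | 3000 | 0.98s | 0.00s | - |
| A000005 | 5000 | 1.04s | - | - |
| A130487 | 5000 | 1.70s | 0.00s | - |
| A000030 | 500000 | 0.38s | - | - |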

With PGO (instrumented profile, generated from `loda mine -H 1`)

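| Sequence | Terms | Reg Eval | Inc Eval | Vir Eval |
|----------|-------|----------|----------|----------|
| A000040 | 1000 | 2.22s | - | 0.32s |
| A000394 | 1000 | 1.41s | - | 0.17s |
| A000401 | 1000 | 2.34s | - | 0.17s |
| A000796 | 300 | 0.67s | - | - |
| A001041 | 300 | 0.72s | - | - |
| A001113 | 300 | 0.60s | - | - |
| A002110 | 300 | 0.73s | - | - |
| A002760 | 200 | 19.51s | - | 0.98s |
| A057552 | 300 | 2.79s | 0.03s | - |
| A079309 | 300 | 2.77s | 0.03s | - |
| A002193 | 400 | 0.56s | 0.21s | - |
| A035856 | 500 | 1.55s | - | - |
| A001609 | 1000 | 0.50s | 0.00s | - |
| A003411 | 1000 | 0.58s | 0.00s | - |
| A012866 | 1000 | 0.99s | 0.00s | - |
| A000045 | 2000 | 1.82s | 0.00s | - |
| A001304 | 3000 | 0.84s | 0.00s | - |
| A000005 | 5000 | 0.74s | - | - |
| A130487 | 5000 | 1.12s | 0.00s | - |
| A000030 | 500000 | 0.28s | - | - |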
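
For anyone who wants to check the per-sequence gain, here is a minimal sketch that computes the relative reduction in Reg Eval time for a handful of rows copied from the two tables above. It assumes the headline percentage means time reduction, `(before - after) / before`; the helper name `time_reduction_percent` and the choice of rows are only for illustration and are not part of loda.

```python
# Reg Eval times in seconds, copied from a few rows of the two tables above.
# The selection of rows is arbitrary (an illustration, not the full benchmark).
without_pgo = {
    "A000040": 3.48,
    "A002760": 24.71,
    "A000045": 1.82,
    "A000005": 1.04,
    "A130487": 1.70,
}
with_pgo = {
    "A000040": 2.22,
    "A002760": 19.51,
    "A000045": 1.82,
    "A000005": 0.74,
    "A130487": 1.12,
}


def time_reduction_percent(before: float, after: float) -> float:
    """Relative reduction in run time, in percent (assumed metric)."""
    return 100.0 * (before - after) / before


# Per-sequence reduction for the selected rows.
reductions = {
    seq: time_reduction_percent(without_pgo[seq], with_pgo[seq])
    for seq in without_pgo
}

for seq, pct in reductions.items():
    print(f"{seq}: {pct:.1f}% less time with PGO")
```

Since the table values are rounded to 0.01s, the percentages are approximate, especially for the short runs.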

helpau, Oct 20 '25 06:10