QuantLib Performance regression going from 1.31 to 1.34

We have been running on 1.31 for a while. I am attempting to upgrade to 1.34. After making the necessary changes, I found that my application produced identical results but was considerably slower: measured at a very coarse granularity (roughly "build a particular curve and evaluate a lot of metrics"), things took 2x - 3x longer.

I have picked one fairly simple subsystem and extracted it into a standalone program which just needs QuantLib to build. The program runs in a loop building an ESTR curve from OIS quotes, and then pricing swaps, timing how long it takes to price the swaps. It does twenty warmup iterations, then twenty measurement iterations, and prints the time taken in milliseconds for each of the latter (along with the minimum and maximum calculated swap rates, as a sanity check). This shows a roughly 4x - 5x slowdown in pricing swaps.
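
The loop described above can be sketched roughly as follows. This is not the code from DiscountingCurveDemo.cpp.txt: `timeIterations` and the `work` callback are hypothetical names, and the QuantLib curve building and swap pricing are assumed to happen inside `work`, which returns the minimum and maximum swap rates.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper: runs `work` for `warmup` untimed iterations, then for
// `measured` timed iterations, and returns the elapsed milliseconds of each
// timed iteration. `work` returns a (min, max) pair printed as a sanity check.
std::vector<double> timeIterations(
    const std::function<std::pair<double, double>()>& work,
    int warmup, int measured) {
    assert(warmup >= 0 && measured >= 0);
    using Clock = std::chrono::steady_clock;  // monotonic, suited to intervals

    for (int i = 0; i < warmup; ++i) {
        work();  // warmup results are discarded
    }

    std::vector<double> elapsedMs;
    elapsedMs.reserve(static_cast<std::size_t>(measured));
    std::printf("iteration,minSwapRate,maxSwapRate,elapsed\n");
    for (int i = 1; i <= measured; ++i) {
        const auto start = Clock::now();
        const std::pair<double, double> range = work();
        const auto stop = Clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(stop - start).count();
        std::printf("%d,%f,%f,%f\n", i, range.first, range.second, ms);
        elapsedMs.push_back(ms);
    }
    return elapsedMs;
}
```

A caller would pass a callable that prices the swaps, for example `timeIterations(priceSwaps, 20, 20)`, where `priceSwaps` is likewise a hypothetical name.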

With 1.31 it prints:

iteration,minSwapRate,maxSwapRate,elapsed
1,0.015481,0.039528,10.503099
2,0.015481,0.039528,10.589348
3,0.015481,0.039528,10.340485
4,0.015481,0.039528,10.325391
5,0.015481,0.039528,10.381338
6,0.015481,0.039528,10.418790
7,0.015481,0.039528,10.305750
8,0.015481,0.039528,10.256093
9,0.015481,0.039528,10.858235
10,0.015481,0.039528,11.441151
11,0.015481,0.039528,11.270055
12,0.015481,0.039528,10.663258
13,0.015481,0.039528,10.538956
14,0.015481,0.039528,10.339440
15,0.015481,0.039528,10.377952
16,0.015481,0.039528,10.255195
17,0.015481,0.039528,10.317267
18,0.015481,0.039528,10.683488
19,0.015481,0.039528,10.702674
20,0.015481,0.039528,10.575539

With 1.34 it prints:

iteration,minSwapRate,maxSwapRate,elapsed
1,0.015481,0.039528,45.819755
2,0.015481,0.039528,47.309368
3,0.015481,0.039528,48.141875
4,0.015481,0.039528,47.666822
5,0.015481,0.039528,47.050929
6,0.015481,0.039528,47.007651
7,0.015481,0.039528,47.852757
8,0.015481,0.039528,47.241947
9,0.015481,0.039528,47.784260
10,0.015481,0.039528,47.959932
11,0.015481,0.039528,48.477983
12,0.015481,0.039528,48.494646
13,0.015481,0.039528,48.306312
14,0.015481,0.039528,47.953128
15,0.015481,0.039528,48.861151
16,0.015481,0.039528,48.116760
17,0.015481,0.039528,48.483408
18,0.015481,0.039528,47.896765
19,0.015481,0.039528,48.180332
20,0.015481,0.039528,47.863267
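
To compare the two runs with a single number per version, the `elapsed` column can be reduced to a median. A minimal sketch, assuming the output has been captured as a string in the CSV layout shown above; `parseElapsed` and `median` are hypothetical helpers, not part of the demo program.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: extracts the last comma-separated field ("elapsed")
// from each data row, skipping blank lines and the first non-blank line,
// which is assumed to be the header.
std::vector<double> parseElapsed(const std::string& csv) {
    std::vector<double> values;
    std::istringstream in(csv);
    std::string line;
    bool header = true;
    while (std::getline(in, line)) {
        if (line.empty()) continue;
        if (header) { header = false; continue; }
        const std::size_t comma = line.rfind(',');
        if (comma == std::string::npos) continue;  // not a data row
        values.push_back(std::stod(line.substr(comma + 1)));
    }
    return values;
}

// Median of a non-empty sample; averages the two middle values when the
// sample size is even.
double median(std::vector<double> v) {
    assert(!v.empty());
    std::sort(v.begin(), v.end());
    const std::size_t n = v.size();
    return n % 2 == 1 ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
}
```

Dividing the 1.34 median by the 1.31 median then gives the slowdown factor directly.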

See DiscountingCurveDemo.cpp.txt for the code.

My versions of QuantLib carry some small patches affecting the calculation of swap BPS, but that should not be relevant here.

I compiled QuantLib with GCC 7.2.0, and Boost 1.66.0. My build script contains:

export CXXFLAGS="-O2 -ggdb -Wall -Wno-unknown-pragmas -Werror -std=c++14 -fno-math-errno -fno-trapping-math -DBOOST_MATH_NO_LONG_DOUBLE_MATH_FUNCTIONS"

./configure --with-sysroot=${sysroot_dir} --enable-std-classes --enable-indexed-coupons --enable-error-lines

I compiled the app with GCC 13.1.0. My CMakeLists.txt includes:

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_BUILD_TYPE RelWithDebInfo)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -fno-math-errno -fno-trapping-math -ggdb")

The reason I'm compiling QuantLib with such an old GCC is that I've seen significant performance regressions using newer ones.

I'm on Ubuntu 22.04.4.

Do you have any thoughts on this? Can you reproduce this difference? If not, do you see any obvious differences between your setup and mine? I am happy to spend time changing things around at my end, but so far the difference has been unavoidable, so I would like to have some high-Sharpe-ratio ideas on what to change!

Apr 30 '24 19:04 tomwhoiscontrary

1.32 introduced lazy cashflows; it might be related to that change. I'll try to reproduce using your test code.

When you say newer compiler versions cause performance issues, do you mean longer build times or degraded performance during runtime?

May 01 '24 13:05 pcaspers

@pcaspers I mean degraded performance during runtime. It's been a while since I tried, though. I am trying to set up some infrastructure to explore this in a reproducible and shareable way. Hopefully I can update you on that at some point.

May 01 '24 16:05 tomwhoiscontrary

Ok interesting. Keep us updated on that topic.

May 01 '24 18:05 pcaspers

@tomwhoiscontrary The usual high Sharpe ratio way to find the source of slow code is to use a profiler like VTune Profiler. Maybe you can try that and let us know what you find?

May 01 '24 21:05 sweemer

I couldn't reproduce it on my Mac—if anything, 1.34 was slightly faster. Compiled with the configure and cxx flags you reported, but of course it's clang, not gcc, and I have the latest boost installed.

May 03 '24 15:05 lballabio

The same goes for an Ubuntu 22.04.4 machine, default gcc 11.4. No difference.

May 03 '24 15:05 lballabio

Interesting. That's encouraging, because it means there might be a problem with my build environment, but frustrating, because it means there might be a problem with my build environment.

I'm trying to set up a simple self-contained build in a Docker container, where I can vary the compiler, Boost version, and QuantLib version. This is proving a surprisingly rocky road so far, though. Will keep you posted.

May 09 '24 12:05 tomwhoiscontrary

Thanks!

May 09 '24 13:05 lballabio

I have written a script to run this demo in a Docker container with a defined version of GCC and QuantLib: https://github.com/tomwhoiscontrary/QuantLibDemo

It gets Boost from the distro package manager, so that depends on the version of Debian used by the GCC image. It doesn't seem to make much difference, though. I would like to get that under vcpkg at some point.

The results from this are interesting - all combinations of GCC and QuantLib give a result of about 21 - 26 ms per iteration. It mostly gets faster with later GCC versions, and mostly stays the same across QuantLib versions. There are deviations from that pattern which might be meaningful and might be noise, but are fairly minor.

So the good (?) news is that this does not reproduce my core worry, that QL has got significantly slower. On the strength of that, I'm happy to close this bug. I'll keep working on this, and let you know if I find anything.

One thing that's different between this and my real codebase is that here, the same compiler is used for QuantLib and the demo code, whereas in the real code, we build QuantLib with GCC 7 and the demo with GCC 13. I'd be surprised if that combination was faster in this setting.

Something that's quite odd here is that each iteration takes more than 20 ms, whereas in my local build outside a container, using QuantLib 1.31, it takes about 10 ms.

May 15 '24 17:05 tomwhoiscontrary

Well, I would say "good to hear" if it wasn't for your problem...

Thanks for the analysis, and do keep us informed!

May 16 '24 15:05 lballabio

I resolved this. There was no problem with QuantLib at all. There was an incidental change to our build scripts around the same time as QuantLib 1.33 came out which disabled optimisations!
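
For anyone hitting a similar surprise, one cheap guard is to have the benchmark report whether it was compiled with optimisation. A minimal sketch, assuming GCC or Clang, which predefine `__OPTIMIZE__` in optimising compilations; `builtWithOptimisation` and `buildBanner` are hypothetical names. It only describes the translation unit it is compiled in, so it says nothing about how a separately built library was compiled.

```cpp
#include <cassert>
#include <string>

// True when this translation unit was compiled with optimisation enabled.
// Relies on the __OPTIMIZE__ macro predefined by GCC and Clang; other
// compilers may not define it, in which case this reports false.
constexpr bool builtWithOptimisation() {
#ifdef __OPTIMIZE__
    return true;
#else
    return false;
#endif
}

// Builds a one-line banner that a benchmark could print before its timings,
// e.g. "demo: optimised=no" for an unoptimised build.
std::string buildBanner(const char* component) {
    assert(component != nullptr);
    return std::string(component) + ": optimised=" +
           (builtWithOptimisation() ? "yes" : "no");
}
```

Printing such a banner next to the timings makes an accidental unoptimised build visible in the benchmark output itself.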

Jun 12 '24 16:06 tomwhoiscontrary

Ok, this time I can say "good to hear" :)

Jun 12 '24 20:06 lballabio