yices2
yices2 copied to clipboard
Use a custom allocator for potentially massive performance improvement
Background
I noticed that, for one of my example BV
exists-forall problems, time
reported that yices spent a full third of its wall-clock time in the kernel. When I ran strace -T
I saw that after an initial phase with a normal assortment of syscalls, during the solving phase, system calls were exclusively brk()
, presumably called by libc malloc()
, and in turn by the safe_malloc()
s I see sprinkled throughout the code base.
Experiments
I decided to try an alternative malloc()
implementation, because a third of all time in the kernel allocating memory is rather excessive.
My experiments all ran on a Debian 10 VM, which should use the GNU malloc()
implementation in libc by default. Here's the time info with the default GNU implementation:
user@host:~/github/yosys-examples/example1$ #GNU malloc: user@host:~/github/yosys-examples/example1$ time $(../../yices2/build/x86_64-pc-linux-gnu-release/dist/bin/yices-smt2 example.smt2 &>/dev/null)
real 6m17.133s user 4m5.808s sys 2m8.390s
With the default allocator, Yices used ~365MB-405MB, with spikes over 400MB.
When I tried linking with jemalloc, which is the one used by FreeBSD libc, I got the following time info:
user@host:~/github/yosys-examples/example1$ #JEMalloc: user@host:~/github/yosys-examples/example1$ time $(../../yices2/build/x86_64-pc-linux-gnu-release/dist/bin/yices-smt2 example.smt2 &>/dev/null)
real 5m7.416s user 3m20.906s sys 1m44.346s
In this case, memory usage ranged generally between 325-395MB, with spikes up to 450MB.
And when I tried linking with tcmalloc, included in gperftools, I got the following time info:
user@host:~/github/yosys-examples/example1$ #TCMalloc: user@host:~/github/yosys-examples/example1$ time $(../../yices2/build/x86_64-pc-linux-gnu-release/dist/bin/yices-smt2 example.smt2 &>/dev/null)
real 3m11.852s user 3m8.928s sys 0m0.549s
In this case, Yices used ~365MB of memory with very little change and only slight growth. So it uses less memory at peak and is almost 100% faster.
Next steps
I know different platforms have different malloc()
implementations, and some are more efficient for this application than others. I also know that this is only one example exercising only one part of the solver that apparently does a lot of context creation and destruction. But I see safe_malloc()
sprinkled liberally throughout the code base, and I strongly suspect a serious benchmark suite will show significant performance improvement across the board.
There are also potential licensing issues.
But I think at least for some platforms (especially those using GNU libc) the default malloc()
implementation should be changed if at all possible. This is not a measly 3% speedup, it's a very juicy fruit dangling low enough to practically just reach up and pick.
Thanks. We have experimented with tcmalloc in the past. I didn't know the difference was that large.
Indeed, I was very surprised to see such a large difference. If I still had access to a beefy machine I would love to see what it does for some of my larger problems that have previously taken tens of GB of RAM and 24+h on a Xeon E5-2698v4. If the performance improvement persists with bigger problem sizes and more difficult problems, it's a really huge boost in scalability for model-driven program synthesis techniques in hardware design.