symengine
symengine copied to clipboard
Benchmarks against Symbolics.jl
I want to compare the speed of SymEngine and Symbolics.jl. I don't know if I am using Symbolics.jl correctly, so I'll document what I do here. I am using Julia 1.8 on Apple M1 Max.
julia> using BenchmarkTools, Symbolics
julia> vars = @variables a,b,c,d,e,f,g,h,i
julia> x = ((a+b+c+1)^20)
(1 + a + b + c)^20
julia> @benchmark y = expand(x)
BenchmarkTools.Trial: 21 samples with 1 evaluation.
Range (min … max): 238.961 ms … 244.956 ms ┊ GC (min … max): 2.01% … 3.99%
Time (median): 240.229 ms ┊ GC (median): 2.24%
Time (mean ± σ): 241.298 ms ± 2.024 ms ┊ GC (mean ± σ): 2.72% ± 0.84%
▃ █▃▃
█▁▁▇▁▁▁▁███▁▇▁▁▁▁▁▁▇▁▇▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇▁▁▁▁▁▇▁▇▁▁▁▁▇▇▁▁▁▁▁▁▇▇ ▁
239 ms Histogram: frequency by time 245 ms <
Memory estimate: 113.84 MiB, allocs estimate: 447939.
I then compiled SymEngine (default cmake), and applied the following diff:
diff --git a/benchmarks/expand1.cpp b/benchmarks/expand1.cpp
index 4471152e..0c6a897f 100644
--- a/benchmarks/expand1.cpp
+++ b/benchmarks/expand1.cpp
@@ -30,11 +30,12 @@ int main(int argc, char *argv[])
RCP<const Basic> y = symbol("y");
RCP<const Basic> z = symbol("z");
RCP<const Basic> w = symbol("w");
- RCP<const Basic> i60 = integer(60);
+ RCP<const Basic> i1 = integer(1);
+ RCP<const Basic> i60 = integer(20);
RCP<const Basic> e, r;
- e = pow(add(add(add(x, y), z), w), i60);
+ e = pow(add(add(add(x, y), z), i1), i60);
std::cout << "Expanding: " << *e << std::endl;
And run it:
$ ./expand1
Expanding: (1 + x + y + z)**20
3ms
number of terms: 1770
So I am getting 3ms. I don't know if I am running both benchmarks correctly. It seems SymEngine is 80x faster, which seems that I probably do something wrong.
I asked the Julia community for help here: https://discourse.julialang.org/t/benchmarking-symbolics-jl-against-symengine-jl/103744.
I also tried:
$ ./expand1
Expanding: (1 + x + y + z)**100
204ms
number of terms: 176850
But had to kill the Julia version after about a minute.
This is interesting! Are you thinking of comparing to SymEngine.jl or plain C++ symengine? Although outside of your scope here it would also be interesting to benchmark the various symengine wrappers to see how much overhead the various language wrappers add.
Good idea, I was comparing against C++ because I know how to compile it correctly. I am using gmp, but I think Flint would be even faster. I just used the SymEngine.jl and got:
julia> using BenchmarkTools, SymEngine
julia> vars = @vars a,b,c,d,e,f,g,h,i
(a, b, c, d, e, f, g, h, i)
julia> x = ((a+b+c+1)^20)
(1 + a + b + c)^20
julia> @benchmark y = expand(x)
BenchmarkTools.Trial: 2595 samples with 1 evaluation.
Range (min … max): 1.477 ms … 97.622 ms ┊ GC (min … max): 0.00% … 0.11%
Time (median): 1.561 ms ┊ GC (median): 0.00%
Time (mean ± σ): 1.926 ms ± 5.876 ms ┊ GC (mean ± σ): 0.02% ± 0.01%
▂ ▂▃▂▄▆▃▅▄▃▇▅▄▄▆▅▅▆▇▄▃▅▆▄▇█▅▅▄▅▃▇▃▅▄▁▁
▁▁▃▃▅▆▄▆▅▆▇██████████████████████████████████████▇█▆▅▅▄▃▂▂ ▆
1.48 ms Histogram: frequency by time 1.64 ms <
Memory estimate: 186.95 KiB, allocs estimate: 17407.
So I am getting 1.5ms compared to C++'s 3ms, but it's the same order of magnitude, so probably it's correct. I then tried:
julia> x = ((a+b+c+1)^100)
(1 + a + b + c)^100
julia> @benchmark y = expand(x)
BenchmarkTools.Trial: 20 samples with 1 evaluation.
Range (min … max): 208.862 ms … 347.852 ms ┊ GC (min … max): 0.00% … 9.87%
Time (median): 262.638 ms ┊ GC (median): 0.05%
Time (mean ± σ): 257.937 ms ± 39.495 ms ┊ GC (mean ± σ): 1.37% ± 3.09%
█ ▁▁
█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇██▄▄▁▁▁▁▁▁▁▁▄▁▁▁▁▁▁▁▁▁▄▄▁▁▁▁▁▁▁▁▁▁▁▁▁▄ ▁
209 ms Histogram: frequency by time 348 ms <
Memory estimate: 34.97 MiB, allocs estimate: 1899009.
Which is consistent with C++ as well (the fastest is 209ms, compared to 204ms in C++).
So probably my numbers above are correct.
Here is another benchmark, which just benchmarks multiplying out two large expressions.
SymEngine:
julia> using BenchmarkTools, SymEngine
julia> vars = @vars a,b,c,d,e,f,g,h,i
(a, b, c, d, e, f, g, h, i)
julia> x = expand(((a+b+c+1)^20));
julia> y = expand(((a+b+c+1)^15));
julia> @benchmark z = expand(x*y)
BenchmarkTools.Trial: 2 samples with 1 evaluation.
Range (min … max): 2.294 s … 5.234 s ┊ GC (min … max): 0.00% … 0.00%
Time (median): 3.764 s ┊ GC (median): 0.00%
Time (mean ± σ): 3.764 s ± 2.079 s ┊ GC (mean ± σ): 0.00% ± 0.00%
█ █
█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
2.29 s Histogram: frequency by time 5.23 s <
Memory estimate: 91.75 MiB, allocs estimate: 6013109.
vs Julia:
julia> using BenchmarkTools, Symbolics
julia> vars = @variables a,b,c,d,e,f,g,h,i
9-element Vector{Num}:
a
b
c
d
e
f
g
h
i
julia> x = expand(((a+b+c+1)^20));
julia> y = expand(((a+b+c+1)^15));
julia> @benchmark z = expand(x*y)
BenchmarkTools.Trial: 1 sample with 1 evaluation.
Single result which took 7.566 s (1.70% GC) to evaluate,
with a memory estimate of 2.02 GiB, over 4750560 allocations.
So Julia is now only "7.566 / 2.294 = 3.3" times slower, which is more reasonable.
There is also an issue with running global scope things in Julia, that there might be some unnecessary boxing/unboxing. So I tried this for Julia:
julia> using BenchmarkTools, Symbolics
julia> function f()
vars = @variables a,b,c,d,e,f,g,h,i
x = expand(((a+b+c+1)^20));
y = expand(((a+b+c+1)^15));
z = expand(x*y)
end
f (generic function with 1 method)
julia> @benchmark f()
BenchmarkTools.Trial: 1 sample with 1 evaluation.
Single result which took 7.945 s (1.45% GC) to evaluate,
with a memory estimate of 2.20 GiB, over 6091435 allocations.
And SymEngine:
julia> using BenchmarkTools, SymEngine
julia> function f()
vars = @vars a,b,c,d,e,f,g,h,i
x = expand(((a+b+c+1)^20));
y = expand(((a+b+c+1)^15));
z = expand(x*y)
end
f (generic function with 1 method)
julia> @benchmark f()
BenchmarkTools.Trial: 2 samples with 1 evaluation.
Range (min … max): 2.902 s … 3.016 s ┊ GC (min … max): 0.01% … 0.00%
Time (median): 2.959 s ┊ GC (median): 0.00%
Time (mean ± σ): 2.959 s ± 80.544 ms ┊ GC (mean ± σ): 0.00% ± 0.00%
█ █
█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
2.9 s Histogram: frequency by time 3.02 s <
Memory estimate: 92.02 MiB, allocs estimate: 6038323.
I also tried Julia 1.9.3:
@benchmark f()
BenchmarkTools.Trial: 1 sample with 1 evaluation.
Single result which took 8.149 s (2.02% GC) to evaluate,
with a memory estimate of 2.24 GiB, over 6822738 allocations.
Does SymEngine use a polynomial data structure to expand and convert it back to an unstructured expression tree or does it exclusively work on the unstructured expression tree?
Here is Nemo's performance:
julia> using Nemo, BenchmarkTools
julia> R, (a, b, c) = PolynomialRing(ZZ, [:a, :b, :c]);
julia> p = a+b+c+1;
julia> @btime $p^20;
258.569 μs (25 allocations: 61.62 KiB)
Does SymEngine use a polynomial data structure to expand and convert it back to an unstructured expression tree or does it exclusively work on the unstructured expression tree?
It looks at the expression tree and expands the power of addition expressions faster.
For eg: instead of doing (a + b)^2 = a^2 + a*b + b*a + b^2 it does (a + b)^2 = a^2 + 2*a*b + b^2 and similar for higher powers.
However, it doesn't do the same optimizations that Flint/Nemo does.
For eg SymEngine does
(a^2 + a + 1)^2 = a^4 + a^2 + 1 + 2a^3 + 2a^2 + 2a
and simplifies the expression to
(a^2 + a + 1)^2 = a^4 + 3 a^2 + 1 + 2a^3 + 2a
Does it use the multinomial theorem to expand power of sums?
Yes
Ah, so that explains it. We don't have that specialization at all. For some polynomial operations, we do a round trip to multivariate polynomial representations and back. @shashi is more familiar with this process.
For some polynomial operations, we do a round trip to multivariate polynomial representations and back.
SymEngine does not have that at all.
@YingboMa I recommend to focus on this benchmark: https://github.com/symengine/symengine/issues/1973#issuecomment-1713906673 which makes it irrelevant how the power expansion is done. It benchmarks how quickly you can expand two arbitrary, long, expressions.
@certik, that benchmark has two expands with a power. So, not exactly comparable.
@isuruf I don't follow. The benchmark does z = expand(x*y), where both "x" and "y" is already expanded (no matter how). And it benchmarks multiplying those two and expanding it. It does not expand with a power.
@certik, the benchmark time includes computing x and y which are powers.
@isuruf I don't think it does. The following lines:
julia> x = expand(((a+b+c+1)^20));
julia> y = expand(((a+b+c+1)^15));
julia> @benchmark z = expand(x*y)
First expand the powers, so x and y contain expanded powers. I checked it by printing "x" and "y" and they are indeed expanded.
Ah, you are running the benchmark like that. Thanks.
Here is a differentiation benchmark. Symbolics.jl:
julia> using BenchmarkTools, Symbolics
julia> vars = @variables a,b,c,d,e,f,g,h,i,x
julia> function ff()
expr = sin(sin(sin(sin(x))));
for i = 1:10
expr = Symbolics.derivative(expr, x)
end
end
ff (generic function with 1 method)
julia> @benchmark ff()
BenchmarkTools.Trial: 17 samples with 1 evaluation.
Range (min … max): 289.565 ms … 299.344 ms ┊ GC (min … max): 6.15% … 9.44%
Time (median): 295.178 ms ┊ GC (median): 8.53%
Time (mean ± σ): 294.804 ms ± 2.812 ms ┊ GC (mean ± σ): 8.28% ± 0.82%
▁ ▁ ▁ ▁ ▁ ▁ ▁ ▁ █ ▁ ▁ ▁▁ ▁ ▁ ▁
█▁▁▁▁▁▁█▁▁▁▁▁▁█▁█▁▁█▁▁█▁▁█▁▁▁▁▁▁▁█▁█▁█▁▁▁▁▁▁█▁██▁▁▁█▁▁▁▁▁█▁▁█ ▁
290 ms Histogram: frequency by time 299 ms <
Memory estimate: 278.14 MiB, allocs estimate: 3843662.
SymEngine.jl
julia> using BenchmarkTools, SymEngine
julia> vars = @vars a,b,c,d,e,f,g,h,i,x
julia> expr = sin(sin(sin(sin(x))));
julia> @benchmark diff(expr, x, 10)
BenchmarkTools.Trial: 557 samples with 1 evaluation.
Range (min … max): 6.661 ms … 62.706 ms ┊ GC (min … max): 0.00% … 0.33%
Time (median): 8.474 ms ┊ GC (median): 0.00%
Time (mean ± σ): 8.982 ms ± 5.555 ms ┊ GC (mean ± σ): 0.02% ± 0.03%
▁ ▄▃ ▁▃▁▁▄ ▃▁▂ ▁▁▁▂▂▁ █▄ ▃ ▃▅▃▃▅▁
█▄▃▃▄▃▃▇███████████████████▇███████████▇▆▃▃▃▃▃▃▃▁▁▃▁▃▁▁▁▁▃ ▅
6.66 ms Histogram: frequency by time 11 ms <
Memory estimate: 511.17 KiB, allocs estimate: 33272.
julia> function ff()
expr = sin(sin(sin(sin(x))));
for i = 1:10
expr = diff(expr, x)
end
end
ff (generic function with 1 method)
julia> @benchmark ff()
BenchmarkTools.Trial: 567 samples with 1 evaluation.
Range (min … max): 6.522 ms … 62.355 ms ┊ GC (min … max): 0.00% … 0.22%
Time (median): 8.326 ms ┊ GC (median): 0.00%
Time (mean ± σ): 8.820 ms ± 5.487 ms ┊ GC (mean ± σ): 0.02% ± 0.02%
▁▆▅▄▃▃▁ ▂▁▄▁ ▃▆ ▁▄▃ ▆▁▁▇█▁▇ ▁▇ ▁ ▅▅▅▆
▇▆▄▃▃▄▃▃▃▅███████▇████▇██████▇███████▇█████▇████▅▇▅▃▁▃▅▃▃▃ ▅
6.52 ms Histogram: frequency by time 10 ms <
Memory estimate: 511.55 KiB, allocs estimate: 33292.
On this particular benchmark, SymEngine seems 35x faster. I am probably doing something wrong. If anyone can figure it out, that would be great. Once we have representative benchmarks that the Julia community agrees that they are fair, we'll add it to our benchmarks.