Performance of universal polynomials / MPolys exponent_vectors
The method exponent_vectors is called at several places:
- comparison of polynomials
- hashing of polynomials
- collecting variables occurring in a polynomial
- others?
Each time it allocates a new array of dynamic size. This adds quickly up in runtime when code uses a lot of simple comparision. Especially for UnivPoly which can have a lot of variables.
Is it possible to improve the performance somehow?
Some ideas that all have disadvantages:
- Use
Memoryinstead ofArray. This is faster when allocating unitialized memory. Needs Julia 1.11 - Use
StaticArrays. This is only possible if there is a statically known reasonable maximum number of elements an exponent vector can have. - Cache results of
exponent_vectors. There needs to be a robust mechanism which invalidates the cache when the exponents change. Improves performance only if there are actual cache hits. - Reduce calls to
exponent_vectors. E.g. comparing whether two generators are the same might be easier than comparing if two generic polynomials are the same. But not sure if there are improvements possible.
To clarify: In my use case I only want to hash, compare or list generators. So if there are improvements possible for those special cases of polynomials it would be helpful for me.
Another possible performance improvement would be to only "cache" the memory allocated for the exponent vector in the struct of the polynomial. This way the cache only needs to be invalidated when size of the exponent vector changes. This however is harder to be made threadsafe.
Can you specifiy which polynomials we are talking about? Equality of multivariate polynomials in Nemo usually calls directly into flint and there is no exponent_vectors call.
UnivPoly, although that one is part of AbstractAlgebra, it does use the MPoly from this library.
Can you please post a full example to reproduce the slow equality test for MPolys?
Here is one for the equality test of generators. It compares the speed for generators x from a UnivPoly to the speed for generators y from an MPoly. The benchmark for comparing two symbols serves as the lower bound on how long it could take.
julia> using Nemo
Welcome to Nemo version 0.49.0
Nemo comes with absolutely no warranty whatsoever
julia> using BenchmarkTools
julia> R10, ys10 = polynomial_ring(QQ, [Symbol("y$i") for i in 1:10])
(Multivariate polynomial ring in 10 variables over QQ, QQMPolyRingElem[y1, y2, y3, y4, y5, y6, y7, y8, y9, y10])
julia> R20, ys20 = polynomial_ring(QQ, [Symbol("y$i") for i in 1:20])
(Multivariate polynomial ring in 20 variables over QQ, QQMPolyRingElem[y1, y2, y3, y4, y5, y6, y7, y8, y9, y10, y11, y12, y13, y14, y15, y16, y17, y18, y19, y20])
julia> Ru = universal_polynomial_ring(QQ)
Universal Polynomial Ring over Rational field
julia> xs10 = [gen(Ru, Symbol("x$i")) for i in 1:10]
10-element Vector{AbstractAlgebra.Generic.UnivPoly{QQFieldElem}}:
x1
x2
x3
x4
x5
x6
x7
x8
x9
x10
julia> x, y = first.([xs10, ys10])
2-element Vector{RingElem}:
x1
y1
julia> @benchmark :x == :x
BenchmarkTools.Trial: 10000 samples with 1000 evaluations per sample.
Range (min … max): 1.951 ns … 33.042 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 2.040 ns ┊ GC (median): 0.00%
Time (mean ± σ): 2.125 ns ± 0.627 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▃ █ ▃
██▂▁▁▁▁▁▁▁▁▁▅█▅▂▁▁▂▁▁▁▂▂▂▂▆▅▂▁▂▁▁▁▁▁▁▁▁▁▆█▃▂▁▂▁▁▁▁▁▂▂▂▁▂▅▆ ▃
1.95 ns Histogram: frequency by time 2.32 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark y == y
BenchmarkTools.Trial: 10000 samples with 989 evaluations per sample.
Range (min … max): 33.834 ns … 568.265 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 35.327 ns ┊ GC (median): 0.00%
Time (mean ± σ): 38.154 ns ± 13.567 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▅▇█▄▃▁▃▂▂▁▂▁▁▁ ▁ ▁ ▁ ▂
█████████████████████▇▇▇▆▇██▇████▇▆▆▆▆▇▇▇▆▆▆▄▁▅▄▄▄▅▄▅▄▄▁▄▁▄▄ █
33.8 ns Histogram: log(frequency) by time 76.1 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark x == x
BenchmarkTools.Trial: 10000 samples with 482 evaluations per sample.
Range (min … max): 218.834 ns … 933.672 μs ┊ GC (min … max): 0.00% … 65.59%
Time (median): 225.392 ns ┊ GC (median): 0.00%
Time (mean ± σ): 449.474 ns ± 11.744 μs ┊ GC (mean ± σ): 27.40% ± 1.12%
█▅▄▄▃▂▂▁ ▁ ▁▂▂▁▁ ▁
████████████▆▇▆▆███████▆▇▅▄▅▆▆▅▆▇▇▆▆█▇▇▆▇▆▆▆▆▅▅▅▅▅▅▃▄▅▃▃▃▄▅▄▃ █
219 ns Histogram: log(frequency) by time 642 ns <
Memory estimate: 192 bytes, allocs estimate: 6.
julia> xs20 = [gen(Ru, Symbol("x$i")) for i in 1:20]
20-element Vector{AbstractAlgebra.Generic.UnivPoly{QQFieldElem}}:
x1
x2
x3
x4
x5
x6
x7
x8
x9
x10
x11
x12
x13
x14
x15
x16
x17
x18
x19
x20
julia> x, y = first.([xs20, ys20])
2-element Vector{RingElem}:
x1
y1
julia> @benchmark y == y
BenchmarkTools.Trial: 10000 samples with 989 evaluations per sample.
Range (min … max): 33.832 ns … 590.576 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 35.686 ns ┊ GC (median): 0.00%
Time (mean ± σ): 39.926 ns ± 17.521 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▅█▆▅▃▃▃▂▂▁▁▂▁▁ ▁ ▁ ▁▁ ▁
█████████████████▇██▇███████▇▇█▆▇▆▇▆▆▆▆▆▅▇▆▆▅▅▄▅▅▄▄▅▄▅▄▅▄▃▄▅ █
33.8 ns Histogram: log(frequency) by time 85.1 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark x == x
BenchmarkTools.Trial: 10000 samples with 337 evaluations per sample.
Range (min … max): 248.947 ns … 264.536 μs ┊ GC (min … max): 0.00% … 59.56%
Time (median): 274.015 ns ┊ GC (median): 0.00%
Time (mean ± σ): 552.424 ns ± 6.393 μs ┊ GC (mean ± σ): 26.31% ± 2.35%
▇██▇▅▅▄▂▂▁ ▁ ▁▁▃▃▃▂▂▁▁▂▂▂▃▃▂▁▁ ▂
████████████▇██████████████████▇▇▇▇▇▇▆▆▆▆▅▆▃▅▅▅▅▆▅▅▅▄▄▅▅▅▃▅▅▄ █
249 ns Histogram: log(frequency) by time 809 ns <
Memory estimate: 352 bytes, allocs estimate: 6.
Comparing two generators x from a UnivPoly is more than 5 times slower than comparing two generators y from a normal MPoly. ~~And it might be that the time needed for UnivPoly grows with the number of generators.~~
Comparing two generators from an MPoly is 10 times slower than comparing two symbols. While that might not be the fairest comparison it gives a lower bound on how fast that kind of operation could be.
It does not get slower for Polynomial Rings with more generators:
julia> R100, ys100 = polynomial_ring(QQ, [Symbol("y$i") for i in 1:100])
(Multivariate polynomial ring in 100 variables over QQ, QQMPolyRingElem[y1, y2, y3, y4, y5, y6, y7, y8, y9, y10 … y91, y92, y93, y94, y95, y96, y97, y98, y99, y100])
julia> xs100 = [gen(Ru, Symbol("x$i")) for i in 1:100]
100-element Vector{AbstractAlgebra.Generic.UnivPoly{QQFieldElem}}:
x1
x2
x3
x4
x5
x6
x7
x8
x9
x10
x11
x12
x13
x14
x15
x16
x17
x18
x19
x20
⋮
x82
x83
x84
x85
x86
x87
x88
x89
x90
x91
x92
x93
x94
x95
x96
x97
x98
x99
x100
julia> x, y = first.([xs100, ys100])
2-element Vector{RingElem}:
x1
y1
julia> @benchmark y == y
BenchmarkTools.Trial: 10000 samples with 988 evaluations per sample.
Range (min … max): 33.852 ns … 210.818 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 35.328 ns ┊ GC (median): 0.00%
Time (mean ± σ): 37.085 ns ± 6.570 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▄▆▇█▄▃▂▁▁▂▂▁▁ ▁
█████████████▇▇▇█▇▇██▇▆▇▇▆▆▇▇█▆▄▆▅▇▆▄▅▆▆▇▆▄▆▆▇▆█▇▆▄▂▄▄▄▄▅▅▅▅ █
33.9 ns Histogram: log(frequency) by time 62.1 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark x == x
BenchmarkTools.Trial: 10000 samples with 244 evaluations per sample.
Range (min … max): 277.463 ns … 492.389 μs ┊ GC (min … max): 0.00% … 72.41%
Time (median): 524.262 ns ┊ GC (median): 0.00%
Time (mean ± σ): 789.157 ns ± 10.656 μs ┊ GC (mean ± σ): 26.88% ± 2.02%
█▃ ▆▁ ▂
▄███▄▃▂▂▂▁▂▂▂▁▁▁▂█████▄▄▄▅▄▃▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
277 ns Histogram: frequency by time 1.1 μs <
Memory estimate: 512 bytes, allocs estimate: 6.
Thanks for the clarification. The universal polynomial comparison is just missing a
if parent(data(a)) === parent(data(b))
return data(a) == data(b)
end
right at the top. That should make it as fast as the MPoly version in your examples.
(One could also think about calling univ_promote before, but that would be a bit more intrusive.)
But there are cases when two generators from the same UnivPoly can be the same, while having different parent, right? At least that's what I already observed I think.
Here is the case where the generators are the same, but ~~it can't compare their data against each other~~ the parents of their data differ:
julia> using Nemo
Welcome to Nemo version 0.49.0
Nemo comes with absolutely no warranty whatsoever
julia> R = universal_polynomial_ring(QQ)
Universal Polynomial Ring over Rational field
julia> x = gen(R, :x)
x
julia> x2 = (x-1)+1
x
julia> data(x2) == data(x)
true
julia> gen(R, :y)
y
julia> x = gen(R, :x)
x
julia> data(x2) == data(x)
ERROR: parents do not match
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] check_parent
@ ~/.julia/packages/AbstractAlgebra/lYBCD/src/AbstractAlgebra.jl:221 [inlined]
[3] check_parent
@ ~/.julia/packages/AbstractAlgebra/lYBCD/src/AbstractAlgebra.jl:220 [inlined]
[4] ==(a::QQMPolyRingElem, b::QQMPolyRingElem)
@ Nemo ~/.julia/packages/Nemo/VL5Dv/src/flint/fmpq_mpoly.jl:415
[5] top-level scope
@ REPL[10]:1
julia> x2 == x
true
julia> parent(x2) == parent(x)
true
Here is a shorter MWE (sorry for the spam):
julia> using Nemo
Welcome to Nemo version 0.49.0
Nemo comes with absolutely no warranty whatsoever
julia> R = universal_polynomial_ring(QQ)
Universal Polynomial Ring over Rational field
julia> xold = gen(R, :x)
x
julia> y = gen(R, :y)
y
julia> xnew = gen(R, :x)
x
julia> xold == xnew
true
julia> data(xold) == data(xnew)
ERROR: parents do not match
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] check_parent
@ ~/.julia/packages/AbstractAlgebra/lYBCD/src/AbstractAlgebra.jl:221 [inlined]
[3] check_parent
@ ~/.julia/packages/AbstractAlgebra/lYBCD/src/AbstractAlgebra.jl:220 [inlined]
[4] ==(a::QQMPolyRingElem, b::QQMPolyRingElem)
@ Nemo ~/.julia/packages/Nemo/VL5Dv/src/flint/fmpq_mpoly.jl:415
[5] top-level scope
@ REPL[9]:1
We could optimize the comparison of those special elements (the "generators"), but for arbitrary elements it will still be slow. Would that help?
Yes, that would definetly help for my use case. But dont' feel pressed to optimize it, I only used it for some experimental code and not for any really necessary calculations.
This is more about documenting the possible bottleneck, in case there is an easy fix or more people need it for performance reasons.
What is your use case? Often adding new variables? What is slow in the current implementation is when a new variable is added and then "new elements" are compared with "old element". Old and new refers to "before new variable" and "after new variable".
I used it as a backend for a "Polyform" based CAS System.
A more realistic use case might be replacing DynamicPolynomials.jl as backend for SymbolicUtils Polyform to implement missing features for Polynomials in Symbolics.jl.
In both cases non-polynomial terms are represented as new variables in the Polynomial and therefor the number of variables quickly grows and changes when terms are substituted or terms reordered. And for differentiation you need to check which terms depend on the variable to differentiate for, which are typically generators of the polynomial.
I added some of the improvements over at https://github.com/Nemocas/AbstractAlgebra.jl/pull/2028. Should make comparison of "generators" much quicker.