Optimized partial and smooth Bézier curve computations in `bezier.py`
Overview: What does this pull request change?
I made many changes to the functions defined in bezier.py to calculate Bézier curves faster. Here is a list of those changes, starting from the most important ones:
- Replaced
partial_quadratic_bezier_points,split_quadratic_bezier,subdivide_quadratic_bezierandquadratic_bezier_remapwith generalized and optimized analogspartial_bezier_points,split_bezier,subdivide_bezierandbezier_remapbezier_remapreused(OpenGL)VMobject.insert_n_curves_to_point_list, so I rewrote the latter methods by usingbezier_remapinstead
- Optimized
get_smooth_cubic_bezier_handle_points- Redirected
get_smooth_handle_pointstoget_smooth_cubic_bezier_handle_points - Added
get_smooth_cubic_bezier_handle_points_for_open_curveandget_smooth_cubic_bezier_handle_points_for_closed_curve - Removed the auxiliar function
diag_to_matrixand thescipyimport
- Redirected
- Greatly optimized
is_close, and slightly optimizedbezierandinterpolate
Motivation and Explanation: Why and how do your changes improve the library?
partial_bezier_points
The original algorithm made a lot of calls to bezier, making multiple redundant intermediate calculations that could've been reused for the other Béziers. Instead, in the n-th degree case I calculate all of the necessary points in a single pass, improving from O(n³) to O(n²) time. Even more, the process can be encoded as a matrix multiplication, which is what I did for the simple cases (up to cubic curves), giving around 10x speedup.
def partial_bezier_points(points, a, b):
# Border cases...
degree = points.shape[0] - 1
if degree == 3:
# calculate portion_matrix..
return portion_matrix @ points
# if degree in (2, 1, 0)...
# Fallback case for nth degree Béziers
# It is convenient that np.array copies points
arr = np.array(points)
N = arr.shape[0]
if a != 0:
for i in range(1, N):
arr[: N - i] += a * (arr[1 : N - i + 1] - arr[: N - i])
if b != 1:
if a != 0:
mu = (1 - b) / (1 - a)
else:
mu = 1 - b
for i in range(1, N):
arr[i:] += mu * (arr[i - 1 : -1] - arr[i:])
return arr
split_bezier and subdivide_bezier
Again, the processes can be encoded as matrix multiplications. In the case of subdivide_bezier, these matrices can even be memoized, as they're always the same.
def split_bezier(points, t):
points = np.asarray(points)
N, dim = points.shape
degree = N-1
if degree == 3:
# calculate split_matrix...
return split_matrix @ points
# if degree in (1, 0)...
# Fallback case for nth degree
arr = np.empty((2, N, dim))
arr[1] = points
arr[0, 0] = points[0]
for i in range(1, N):
arr[1, : N - i] += t * (arr[1, 1 : N - i + 1] - arr[1, : N - i])
arr[0, i] = arr[1, 0]
return arr
def subdivide_bezier(points, n_divisions):
points = np.asarray(points)
if n_divisions == 1:
return points
N, dim = points.shape
degree = N - 1
if degree == 3:
# get subdivision_matrix...
return subdivision_matrix @ points
# if degree in (2, 1, 0)...
# Fallback case for nth degree
beziers = np.empty((n_divisions, N, dim))
beziers[-1] = points
for curve_num in range(n_divisions - 1, 0, -1):
curr = beziers[curve_num]
prev = beziers[curve_num - 1]
prev[0] = curr[0]
for i in range(1, N):
a = (n_divisions - curve_num - 1) / (n_divisions - curve_num)
curr[: N - i] += a * (curr[1 : N - i + 1] - curr[: N - i])
prev[i] = curr[0]
return beziers.reshape(n_divisions * N, dim)
bezier_remap
I realized that I could essentially reuse the code for VMobject.insert_n_curves_to_point_list for this. Notice the use of subdivide_bezier instead of repeated use of partial_bezier_points, which is much faster than the original algorithm:
def bezier_remap(bezier_tuples, new_number_of_curves):
bezier_tuples = np.asarray(bezier_tuples)
current_number_of_curves, nppc, dim = bezier_tuples.shape
repeat_indices = (
np.arange(new_number_of_curves, dtype="i") * current_number_of_curves
) // new_number_of_curves
split_factors = np.zeros(current_number_of_curves, dtype="i")
np.add.at(split_factors, repeat_indices, 1)
new_tuples = np.empty((new_number_of_curves, nppc, dim))
index = 0
for curve, sf in zip(bezier_tuples, split_factors):
new_tuples[index : index + sf] = subdivide_bezier(curve, sf).reshape(
sf, nppc, dim
)
index += sf
return new_tuples
Because of the above, I also straight up rewrote (OpenGL)VMobject.insert_n_curves_to_point_list to use bezier_remap instead:
def insert_n_curves_to_point_list(self, n: int, points: np.ndarray) -> np.ndarray:
if len(points) == 1:
nppcc = self.n_points_per_cubic_curve
return np.repeat(points, nppcc * n, 0)
bezier_tuples = self.get_cubic_bezier_tuples_from_points(points)
new_number_of_curves = bezier_tuples.shape[0] + n
new_bezier_tuples = bezier_remap(bezier_tuples, new_number_of_curves)
new_points = new_bezier_tuples.reshape(-1, 3)
get_smooth_handle_points
Now we arrive to the first function in bezier.py I modified, and the main reason I made this PR.
I had a test scene with more than 100 ParametricFunctions in it, which were all being updated. More than half of the render time was spent on ParametricFunction.generate_points, and one of the 3 main culprits was VMobject.make_smooth. There were many issues in that function I plan to address in another PR, but the one I'm going to address in this specific PR is get_smooth_handle_points.
The main bottleneck for this function is the call to scipy.linalg.solve_banded. Apparently, it spends more time validating the arrays passed as a parameter, than actually solving the system of equations.
That's why I decided to manually implement the algorithm for solving these equations. But also, all the matrices follow the same pattern and their coefficients are pretty much always the same, varying only in size, so the diagonals can be hardcoded in the algorithm rather than stored explicitly. Some of them can even be memoized, which is what I did!
In the end, I got a speedup of around 3x. I can show a Snakeviz output of this, but that output includes a change to the is_closed function which really needed a rewrite, so lemme talk about it first.
is_closed
Repeatedly using np.allclose to compare only two 3D points each time is actually very expensive and takes a painful part of the total time of get_smooth_handle_points. So I rewrote it to a simple "compare 1st coords, then compare 2nd coords, then compare 3rd coords", which resulted in almost a 30x speedup!
def is_closed(points: Point3D_Array) -> bool:
start, end = points[0], points[-1]
atol = 1e-8
if abs(end[0] - start[0]) > atol:
return False
if abs(end[1] - start[1]) > atol:
return False
if abs(end[2] - start[2]) > atol:
return False
return True
SnakeViz output for VMobject.make_smooth before/after the changes to get_smooth_handle_points and is_closed:
| Before | After |
|---|---|
A closer look to get_smooth_handle_points:
| Before | After |
|---|---|
The red block at the right is the new is_closed. It now takes much less time to run! Combined with the get_smooth_handle_points optimization, this latter function now has an overall speedup of around 4x.
interpolate
Rewriting this function to use 1 product instead of 2, makes a speedup of around 30%:
def interpolate(start, end, alpha):
# return (1 - alpha) * start + alpha * end # original
return start + alpha * (end - start) # new
bezier
First of all, I added the cases for Bézier curves of grade 0 and 1. This was originally intended for partial_bezier_curves, which repeatedly called bezier and was not prepared for these specific cases, falling back to the expensive general nth grade case:
def bezier(
points: Sequence[Point3D] | Point3D_Array,
) -> Callable[[float | ColVector], Point3D | Point3D_Array]:
P = np.asarray(points)
num_points, dim = P.shape
n = num_points - 1
if n == 0:
def zero_bezier(t):
return np.ones_like(t) * P[0]
return zero_bezier
if n == 1:
def linear_bezier(t):
return P[0] + t * (P[1] - P[0])
return linear_bezier
# ...
Then, I made a very slight optimization for the cubic Bëzier case:
if n == 3:
def cubic_bezier(t):
t2 = t * t
t3 = t2 * t
mt = 1 - t
mt2 = mt * mt
mt3 = mt2 * mt
return mt3 * P[0] + 3 * t * mt2 * P[1] + 3 * t2 * mt * P[2] + t3 * P[3]
return cubic_bezier
It reported a small speedup of around 10% - 15%. Not too bad! I did something similar with the quadratic Bëzier:
if n == 2:
def quadratic_bezier(t):
t2 = t * t
mt = 1 - t
mt2 = mt * mt
return mt2 * P[0] + 2 * t * mt * P[1] + t2 * P[2]
return quadratic_bezier
Finally, the general nth grade Bézier case. I avoided the extensive use of the choose function which can be expensive, and instead used the recursive definition of Béziers I used for functions like partial_bezier_points and subdivide_bezier.
Also, bezier now returns named functions instead of lambdas, which is good for debugging and profiling: now you can know which function is taking what part of the time when profiling your code.
Links to added or changed documentation pages
manim.utils.bezier
https://manimce--3281.org.readthedocs.build/en/3281/reference/manim.utils.bezier.html
Reviewer Checklist
- [ ] The PR title is descriptive enough for the changelog, and the PR is labeled correctly
- [ ] If applicable: newly added non-private functions and classes have a docstring including a short summary and a PARAMETERS section
- [ ] If applicable: newly added functions and classes are tested
Hello everyone! It's been a long time.
- I made some updates and extra optimizations I discovered a long time ago while working on #3336 :)
- Also, for the sake of unifying the source code of
VMobjectandOpenGLVMobjecta little bit more, I redirected the quadratic Bézier functions to their generalized counterparts. Those counterparts mainly check how many input points there are, and decide which algorithm to apply based on the curve degree. - Finally I also added a
bezier_remapfunction (generalization ofquadratic_bezier_remap) which reuses the source code ofVMobject.insert_n_curves_to_point_list. My plan is to rewrite the latter function to use the former instead (see details in original post).
Update:
- Fixed
subdivide_bezier()which had some typos causing wrong results for Béziers of degree 3 or higher - Added tests for
partial_bezier_points(),split_bezier()andsubdivide_bezier()