manim Optimized partial and smooth Bézier curve computations in `bezier.py`

Overview: What does this pull request change?

I made many changes to the functions defined in bezier.py to calculate Bézier curves faster. Here is a list of those changes, starting from the most important ones:

Replaced partial_quadratic_bezier_points, split_quadratic_bezier, subdivide_quadratic_bezier and quadratic_bezier_remap with generalized and optimized analogs partial_bezier_points, split_bezier, subdivide_bezier and bezier_remap
- bezier_remap reused (OpenGL)VMobject.insert_n_curves_to_point_list, so I rewrote the latter methods by using bezier_remap instead
Optimized get_smooth_cubic_bezier_handle_points
- Redirected get_smooth_handle_points to get_smooth_cubic_bezier_handle_points
- Added get_smooth_cubic_bezier_handle_points_for_open_curve and get_smooth_cubic_bezier_handle_points_for_closed_curve
- Removed the auxiliar function diag_to_matrix and the scipy import
Greatly optimized is_close, and slightly optimized bezier and interpolate

Motivation and Explanation: Why and how do your changes improve the library?

partial_bezier_points

The original algorithm made a lot of calls to bezier, making multiple redundant intermediate calculations that could've been reused for the other Béziers. Instead, in the n-th degree case I calculate all of the necessary points in a single pass, improving from O(n³) to O(n²) time. Even more, the process can be encoded as a matrix multiplication, which is what I did for the simple cases (up to cubic curves), giving around 10x speedup.

def partial_bezier_points(points, a, b):
    # Border cases...
    
    degree = points.shape[0] - 1
    if degree == 3:
        # calculate portion_matrix..
        return portion_matrix @ points
    # if degree in (2, 1, 0)...

    # Fallback case for nth degree Béziers
    # It is convenient that np.array copies points
    arr = np.array(points)
    N = arr.shape[0]
    
    if a != 0:
        for i in range(1, N):
            arr[: N - i] += a * (arr[1 : N - i + 1] - arr[: N - i])

    if b != 1:
        if a != 0:
            mu = (1 - b) / (1 - a)
        else:
            mu = 1 - b
        for i in range(1, N):
            arr[i:] += mu * (arr[i - 1 : -1] - arr[i:])

    return arr

split_bezier and subdivide_bezier

Again, the processes can be encoded as matrix multiplications. In the case of subdivide_bezier, these matrices can even be memoized, as they're always the same.

def split_bezier(points, t):
    points = np.asarray(points)
    N, dim = points.shape
    degree = N-1
    if degree == 3:
        # calculate split_matrix...
        return split_matrix @ points
    # if degree in (1, 0)...
    
    # Fallback case for nth degree
    arr = np.empty((2, N, dim))
    arr[1] = points
    arr[0, 0] = points[0]

    for i in range(1, N):
        arr[1, : N - i] += t * (arr[1, 1 : N - i + 1] - arr[1, : N - i])
        arr[0, i] = arr[1, 0]

    return arr


def subdivide_bezier(points, n_divisions):
    points = np.asarray(points)
    if n_divisions == 1:
        return points

    N, dim = points.shape
    degree = N - 1
    if degree == 3:
        # get subdivision_matrix...
        return subdivision_matrix @ points
    # if degree in (2, 1, 0)...

    # Fallback case for nth degree
    beziers = np.empty((n_divisions, N, dim))
    beziers[-1] = points
    for curve_num in range(n_divisions - 1, 0, -1):
        curr = beziers[curve_num]
        prev = beziers[curve_num - 1]
        prev[0] = curr[0]
        for i in range(1, N):
            a = (n_divisions - curve_num - 1) / (n_divisions - curve_num)
            curr[: N - i] += a * (curr[1 : N - i + 1] - curr[: N - i])
            prev[i] = curr[0]

    return beziers.reshape(n_divisions * N, dim)

`bezier_remap`

I realized that I could essentially reuse the code for VMobject.insert_n_curves_to_point_list for this. Notice the use of subdivide_bezier instead of repeated use of partial_bezier_points, which is much faster than the original algorithm:

def bezier_remap(bezier_tuples, new_number_of_curves):
    bezier_tuples = np.asarray(bezier_tuples)
    current_number_of_curves, nppc, dim = bezier_tuples.shape
    repeat_indices = (
        np.arange(new_number_of_curves, dtype="i") * current_number_of_curves
    ) // new_number_of_curves

    split_factors = np.zeros(current_number_of_curves, dtype="i")
    np.add.at(split_factors, repeat_indices, 1)

    new_tuples = np.empty((new_number_of_curves, nppc, dim))
    index = 0
    for curve, sf in zip(bezier_tuples, split_factors):
        new_tuples[index : index + sf] = subdivide_bezier(curve, sf).reshape(
            sf, nppc, dim
        )
        index += sf
    return new_tuples

Because of the above, I also straight up rewrote (OpenGL)VMobject.insert_n_curves_to_point_list to use bezier_remap instead:

def insert_n_curves_to_point_list(self, n: int, points: np.ndarray) -> np.ndarray:
    if len(points) == 1:
        nppcc = self.n_points_per_cubic_curve
        return np.repeat(points, nppcc * n, 0)
        
    bezier_tuples = self.get_cubic_bezier_tuples_from_points(points)
    new_number_of_curves = bezier_tuples.shape[0] + n
    new_bezier_tuples = bezier_remap(bezier_tuples, new_number_of_curves)
    new_points = new_bezier_tuples.reshape(-1, 3)

get_smooth_handle_points

Now we arrive to the first function in bezier.py I modified, and the main reason I made this PR.

I had a test scene with more than 100 ParametricFunctions in it, which were all being updated. More than half of the render time was spent on ParametricFunction.generate_points, and one of the 3 main culprits was VMobject.make_smooth. There were many issues in that function I plan to address in another PR, but the one I'm going to address in this specific PR is get_smooth_handle_points.

The main bottleneck for this function is the call to scipy.linalg.solve_banded. Apparently, it spends more time validating the arrays passed as a parameter, than actually solving the system of equations.

That's why I decided to manually implement the algorithm for solving these equations. But also, all the matrices follow the same pattern and their coefficients are pretty much always the same, varying only in size, so the diagonals can be hardcoded in the algorithm rather than stored explicitly. Some of them can even be memoized, which is what I did!

In the end, I got a speedup of around 3x. I can show a Snakeviz output of this, but that output includes a change to the is_closed function which really needed a rewrite, so lemme talk about it first.

is_closed

Repeatedly using np.allclose to compare only two 3D points each time is actually very expensive and takes a painful part of the total time of get_smooth_handle_points. So I rewrote it to a simple "compare 1st coords, then compare 2nd coords, then compare 3rd coords", which resulted in almost a 30x speedup!

def is_closed(points: Point3D_Array) -> bool:
    start, end = points[0], points[-1]
    atol = 1e-8
    if abs(end[0] - start[0]) > atol:
        return False
    if abs(end[1] - start[1]) > atol:
        return False
    if abs(end[2] - start[2]) > atol:
        return False
    return True

SnakeViz output for VMobject.make_smooth before/after the changes to get_smooth_handle_points and is_closed:

Before	After

A closer look to get_smooth_handle_points:

Before	After

The red block at the right is the new is_closed. It now takes much less time to run! Combined with the get_smooth_handle_points optimization, this latter function now has an overall speedup of around 4x.

interpolate

Rewriting this function to use 1 product instead of 2, makes a speedup of around 30%:

def interpolate(start, end, alpha):
    # return (1 - alpha) * start + alpha * end # original
    return start + alpha * (end - start) # new

bezier

First of all, I added the cases for Bézier curves of grade 0 and 1. This was originally intended for partial_bezier_curves, which repeatedly called bezier and was not prepared for these specific cases, falling back to the expensive general nth grade case:

def bezier(
    points: Sequence[Point3D] | Point3D_Array,
) -> Callable[[float | ColVector], Point3D | Point3D_Array]:
    P = np.asarray(points)
    num_points, dim = P.shape
    n = num_points - 1

    if n == 0:
        def zero_bezier(t):
            return np.ones_like(t) * P[0]
        return zero_bezier

    if n == 1:
        def linear_bezier(t):
            return P[0] + t * (P[1] - P[0])
        return linear_bezier

    # ...

Then, I made a very slight optimization for the cubic Bëzier case:

    if n == 3:
        def cubic_bezier(t):
            t2 = t * t
            t3 = t2 * t
            mt = 1 - t
            mt2 = mt * mt
            mt3 = mt2 * mt
            return mt3 * P[0] + 3 * t * mt2 * P[1] + 3 * t2 * mt * P[2] + t3 * P[3]
        return cubic_bezier

It reported a small speedup of around 10% - 15%. Not too bad! I did something similar with the quadratic Bëzier:

    if n == 2:
        def quadratic_bezier(t):
            t2 = t * t
            mt = 1 - t
            mt2 = mt * mt
            return mt2 * P[0] + 2 * t * mt * P[1] + t2 * P[2]
        return quadratic_bezier

Finally, the general nth grade Bézier case. I avoided the extensive use of the choose function which can be expensive, and instead used the recursive definition of Béziers I used for functions like partial_bezier_points and subdivide_bezier.

Also, bezier now returns named functions instead of lambdas, which is good for debugging and profiling: now you can know which function is taking what part of the time when profiling your code.

Links to added or changed documentation pages

manim.utils.bezier https://manimce--3281.org.readthedocs.build/en/3281/reference/manim.utils.bezier.html

Reviewer Checklist

[ ] The PR title is descriptive enough for the changelog, and the PR is labeled correctly
[ ] If applicable: newly added non-private functions and classes have a docstring including a short summary and a PARAMETERS section
[ ] If applicable: newly added functions and classes are tested

Jul 11 '23 08:07 chopan050

Hello everyone! It's been a long time.

I made some updates and extra optimizations I discovered a long time ago while working on #3336 :)
Also, for the sake of unifying the source code of VMobject and OpenGLVMobject a little bit more, I redirected the quadratic Bézier functions to their generalized counterparts. Those counterparts mainly check how many input points there are, and decide which algorithm to apply based on the curve degree.
Finally I also added a bezier_remap function (generalization of quadratic_bezier_remap) which reuses the source code of VMobject.insert_n_curves_to_point_list. My plan is to rewrite the latter function to use the former instead (see details in original post).

Nov 03 '23 00:11 chopan050

Update:

Fixed subdivide_bezier() which had some typos causing wrong results for Béziers of degree 3 or higher
Added tests for partial_bezier_points(), split_bezier() and subdivide_bezier()

Dec 30 '23 12:12 chopan050