
Improve the performance of finding correlation surfaces

Open HaoTy opened this issue 4 months ago • 20 comments

Is your feature request related to a problem? Please describe.

Finding correlation surfaces is extremely slow (as in, it may take months) for moderately large block graphs (~30 ports, ~500 volume), effectively unusable beyond trivial scale.

Describe the solution you'd like

A simple improvement: in tqec.interop.pyzx.correlation.find_correlation_surfaces(), replace

  correlation_surfaces: set[CorrelationSurface] = set()
  for leaf in leaves:
      correlation_surfaces.update(_find_correlation_surfaces_from_leaf(g, leaf))

with

from functools import partial
from multiprocessing import Pool

with Pool() as pool:
    correlation_surfaces = set(
        sum(
            pool.imap_unordered(
                partial(_find_correlation_surfaces_from_leaf, g), leaves
            ),
            start=[],
        )
    )

to get an up to min(num_cpus, num_ports)× speedup, since _find_correlation_surfaces_from_leaf() is embarrassingly parallel across leaves.

However, this is not adequate at all, as the poor scaling easily dominates a constant-factor speedup. A potentially more effective solution I'm considering is to partition a large block graph, find the correlation surfaces of the subgraphs, and then generate the combinations that satisfy the boundary conditions.
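
For concreteness, here is a toy sketch of the "combine at the boundary" step (made-up port names, Pauli labels, and surface summaries; not an existing tqec API): each subgraph's surfaces are summarized by the Pauli they expose at each of its ports, and a surface of one half glues to a surface of the other half exactly when they agree on the shared port.

from itertools import product

# Hypothetical summaries: one dict per correlation surface, mapping port name -> Pauli.
left_surfaces = [{"a": "X", "cut": "X"}, {"a": "Z", "cut": "Z"}]
right_surfaces = [{"cut": "X", "b": "X"}, {"cut": "Z", "b": "Z"}, {"cut": "X", "b": "Z"}]

# Keep only the pairs whose Paulis match on the shared "cut" port (the boundary condition).
combined = [
    {**left, **right}
    for left, right in product(left_surfaces, right_surfaces)
    if left["cut"] == right["cut"]
]
print(combined)  # 3 of the 6 candidate pairs satisfy the boundary condition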

Describe alternatives you've considered

  • Caching the intermediate results to avoid recomputation, which turned out slower due to the hashing overhead (a toy illustration of why is sketched below).
  • PyZX's Pauli web finding. Ran into the same gFlow issue (see #528).
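
The caching point can be illustrated with a toy sketch (this is not the code that was actually benchmarked; the graph, the spans, and the cache keys here are made up): memoizing a recursive span search keyed by the set of visited nodes means hashing that growing set on every call, and when most keys are distinct the cache adds overhead without saving any work.

from functools import lru_cache

# Tiny stand-in graph; node 3 plays the role of a target leaf.
ADJ: dict[int, tuple[int, ...]] = {0: (1, 2), 1: (0, 3), 2: (0, 3), 3: (1, 2)}

@lru_cache(maxsize=None)
def spans_from(node: int, visited: frozenset[int]) -> frozenset[frozenset[int]]:
    """Node sets of simple paths from `node` to node 3."""
    if node == 3:
        return frozenset({visited | {node}})
    found: set[frozenset[int]] = set()
    for nxt in ADJ[node]:
        if nxt not in visited:
            found |= spans_from(nxt, visited | {node})
    return frozenset(found)

# Every (node, visited) key here is distinct, so the cache never hits:
# the frozenset hashing is pure overhead, matching the observation above.
print(spans_from(0, frozenset()))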

HaoTy avatar Sep 12 '25 05:09 HaoTy

Hi @HaoTy I am interested in investigating this issue, do you have a readily available test case?

Zhaoyilunnn avatar Sep 12 '25 05:09 Zhaoyilunnn

  • Caching the intermediate results to avoid recomputation, which turned out slower due to the hashing overhead.
  • PyZX's Pauli web finding. Ran into the same gFlow issue (see https://github.com/tqec/tqec/issues/528).

Making a comment to come back to this issue later: I have a couple of handwritten notes on this in addition to some refactor proposals. Will get to typing these up after I have resolved the other issues that I have started to work on. Sorry for the delay.

  • One main issue I have found over the past week is that the project is based on Python when we ought to be thinking of using Rust[^2]. Austin also pointed this out in an email conversation, but his suggestion was to rewrite parts of tqec in C++ for very big Clifford circuits[^1].
  • The other one is that PyZX's Pauli web finding is insufficient. I have a couple of ideas on how to improve/refactor this in my handwritten notes.

[^1]: I had emailed him about a question related to adding non-Clifford circuits.
[^2]: Mostly due to https://users.rust-lang.org/t/python-to-rust-conversion-rust-in-general/66158

purva-thakre avatar Sep 12 '25 15:09 purva-thakre

Hi @HaoTy I am interested in investigating this issue, do you have a readily available test case?

The following script composes multiple copies of steane_encoding to create increasingly large block graphs and records the time required to find the correlation surfaces. I hope this is sufficient to illustrate the poor scaling and to generate arbitrarily large examples. Note that relabel_cubes() has a bug, so you will need #699.

from time import time

from tqec.computation.cube import ZXCube
from tqec.gallery import steane_encoding


block_graph = None
for i in range(5):
    g = steane_encoding()
    g.relabel_cubes({f"Port{j}": f"Port{j}_{i}" for j in range(7)})
    if block_graph is None:
        block_graph = g
    else:
        block_graph = block_graph.compose(g, f"Port4_{i - 1}", f"Port3_{i}")
    graph_clone = block_graph.clone()
    graph_clone.fill_ports(ZXCube.from_str("ZXZ"))
    start_time = time()
    graph_clone.find_correlation_surfaces()
    print(
        f"Finding correlation surfaces for block graph with {graph_clone.num_cubes}"
        f" cubes, {graph_clone.num_pipes} pipes, and {len(graph_clone.leaf_cubes)}"
        f" leaf cubes took {time() - start_time:.2f} s"
    )

Output:

Finding correlation surfaces for block graph with 19 cubes, 20 pipes, and 7 leaf cubes took 0.01 s
Finding correlation surfaces for block graph with 35 cubes, 40 pipes, and 8 leaf cubes took 0.14 s
Finding correlation surfaces for block graph with 51 cubes, 60 pipes, and 9 leaf cubes took 1.62 s
Finding correlation surfaces for block graph with 67 cubes, 80 pipes, and 10 leaf cubes took 13.37 s
Finding correlation surfaces for block graph with 83 cubes, 100 pipes, and 11 leaf cubes took 95.87 s

HaoTy avatar Sep 14 '25 01:09 HaoTy

I have a couple of handwritten notes on this in addition to some refactor proposals.

  • One main issue I have found over the past week is that the project is based on Python when we ought to be thinking of using Rust. Austin also pointed this out in an email conversation, but his suggestion was to rewrite parts of tqec in C++ for very big Clifford circuits.
  • The other one is that PyZX's Pauli web finding is insufficient. I have a couple of ideas on how to improve/refactor this in my handwritten notes.

Migrating to Rust or C++ is of course helpful and something that should eventually be done. Nonetheless, I'm afraid this speedup is still not enough for large block graphs, and an algorithmic improvement is necessary. Do you mind sharing some of your ideas?

HaoTy avatar Sep 14 '25 01:09 HaoTy

an algorithmic improvement is necessary

I agree with you. For this particular concern, I want to explore efficient alternatives to tqec's approach to finding correlation surfaces. See #637 for more. One possible alternative I have considered is algorithms that specialize in cluster analysis.

I should state that I am looking for additional ideas here. It would be better to benchmark the performance of all of them before we decide to formally implement one in tqec.

purva-thakre avatar Sep 16 '25 04:09 purva-thakre

Before switching to other programming languages or trying some more complicated graph algorithms, I think I found another opportunity that can potentially improve performance significantly.

In a word, we try finding correlation surfaces from each leaf node, but most unique surfaces are found in the first several leaf nodes, so there are many redundant computations.

I constructed a benchmark program:

"""
Commit id: 62099b293cf0d824ddf932421553b5589819fb6e
"""

import argparse
from collections import defaultdict
from time import time

from tqec.computation.block_graph import BlockGraph
from tqec.computation.cube import ZXCube
from tqec.gallery import steane_encoding
from tqec.interop.pyzx.correlation import _find_correlation_surfaces_from_leaf
from tqec.interop.pyzx.positioned import PositionedZX


def construct_chain_of_steane_codes(num_replicate: int) -> BlockGraph:
    """Construct a chain of steane codes to create a larger graph for testing."""
    block_graph: BlockGraph | None = None
    graph_clone: BlockGraph | None = None
    for i in range(num_replicate):
        g = steane_encoding()
        g.relabel_cubes({f"Port{j}": f"Port{j}_{i}" for j in range(7)})
        if block_graph is None:
            block_graph = g
        else:
            block_graph = block_graph.compose(g, f"Port4_{i - 1}", f"Port3_{i}")
        graph_clone = block_graph.clone()
        graph_clone.fill_ports(ZXCube.from_str("ZXZ"))
    assert block_graph is not None and graph_clone is not None
    return graph_clone


def test_find_correlation_surface(graph: BlockGraph, show_details: bool = False):
    """Test the performance and redundancy of finding correlation surfaces from leaf nodes."""
    # 1. Convert to pyzx graph
    positioned_zx = PositionedZX.from_block_graph(graph)
    zx_g = positioned_zx.g
    p2v = positioned_zx.p2v

    # 2. Get leaf nodes
    leaf_cubes = graph.leaf_cubes
    leaves = {p2v[cube.position] for cube in leaf_cubes}
    print(f"   - Found {len(leaves)} leaf nodes in the graph.")

    print("\n2. Starting computation analysis...")

    unique_surfaces = set()
    all_surfaces_found = []

    total_computation_time = 0.0
    redundant_computation_time = 0.0
    new_discovery_time = 0.0

    # Store which leaf found which surface
    surface_to_leaf_map = defaultdict(list)

    for k, leaf in enumerate(leaves):
        print(
            f"   - Processing leaf {k + 1}/{len(leaves)} (Node ID: {leaf})...", end=""
        )

        start_time = time()
        # This is the function we want to test
        surfaces_from_this_leaf = _find_correlation_surfaces_from_leaf(zx_g, leaf)
        end_time = time()

        duration = end_time - start_time
        total_computation_time += duration

        if not surfaces_from_this_leaf:
            print(f" Found 0 surfaces in {duration:.6f} seconds. (Redundant)")
            redundant_computation_time += duration
            continue

        all_surfaces_found.extend(surfaces_from_this_leaf)

        # Check if this computation discovered any new surfaces
        is_new_discovery = False
        newly_found_count = 0
        for surface in surfaces_from_this_leaf:
            surface_to_leaf_map[surface].append(leaf)
            if surface not in unique_surfaces:
                is_new_discovery = True
                unique_surfaces.add(surface)
                newly_found_count += 1

        if is_new_discovery:
            new_discovery_time += duration
            print(
                f" Found {len(surfaces_from_this_leaf)} surfaces ({newly_found_count} new) "
                f"in {duration:.6f} seconds. (Discovery)"
            )
        else:
            redundant_computation_time += duration
            print(
                f" Found {len(surfaces_from_this_leaf)} surfaces "
                f"(0 new) in {duration:.6f} seconds. (Redundant)"
            )

    print("\n3. Analysis Results:")
    print("-" * 30)

    num_total_found = len(all_surfaces_found)
    num_unique_found = len(unique_surfaces)
    num_redundant_computations = num_total_found - num_unique_found

    print(f"Total surfaces found (including duplicates): {num_total_found}")
    print(f"Unique surfaces found: {num_unique_found}")
    print(f"Number of redundant surfaces found: {num_redundant_computations}")

    print("\nPerformance Impact:")
    print(f"Total computation time: {total_computation_time:.6f} seconds")
    print(
        f"Time spent on computations yielding new surfaces: {new_discovery_time:.6f} seconds"
    )
    print(
        f"Time spent on purely redundant computations: {redundant_computation_time:.6f} seconds"
    )

    if total_computation_time > 0:
        redundancy_percentage = (
            redundant_computation_time / total_computation_time
        ) * 100
        print(
            f"Percentage of time wasted on redundant computations: {redundancy_percentage:.2f}%"
        )

    if show_details:
        print("\nDetails of redundant findings:")
        for k, surface in enumerate(unique_surfaces):
            finders = surface_to_leaf_map[surface]
            if len(finders) > 1:
                print(
                    f"  - Surface {k + 1} was found {len(finders)} times by leaves: {finders}"
                )


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Test correlation surface finding on Steane code chains."
    )
    parser.add_argument(
        "--num_replicate",
        type=int,
        default=4,
        help="Number of Steane code blocks to chain together (default: 4)",
    )
    args = parser.parse_args()

    print("1. Constructing a chain of Steane codes...")
    test_graph = construct_chain_of_steane_codes(args.num_replicate)
    print("   - Construction complete.\n")
    test_find_correlation_surface(test_graph)

    print("\n" + "-" * 30)
    print("4. Comparing with the original `find_correlation_surfaces` function.")
    start_time = time()
    surfaces = test_graph.find_correlation_surfaces(
        reduce_to_minimal_generators=False
    )  # for fair comparison
    end_time = time()
    print(f"Total time taken: {end_time - start_time:.6f} seconds")
    print(f"Total number of surfaces found: {len(surfaces)}")

And the output is

1. Constructing a chain of Steane codes...
   - Construction complete.

   - Found 10 leaf nodes in the graph.

2. Starting computation analysis...
   - Processing leaf 1/10 (Node ID: 64)... Found 608 surfaces (56 new) in 3.326501 seconds. (Discovery)
   - Processing leaf 2/10 (Node ID: 33)... Found 641 surfaces (25 new) in 1.389932 seconds. (Discovery)
   - Processing leaf 3/10 (Node ID: 65)... Found 621 surfaces (15 new) in 4.782699 seconds. (Discovery)
   - Processing leaf 4/10 (Node ID: 66)... Found 563 surfaces (8 new) in 2.491807 seconds. (Discovery)
   - Processing leaf 5/10 (Node ID: 45)... Found 386 surfaces (4 new) in 1.399228 seconds. (Discovery)
   - Processing leaf 6/10 (Node ID: 16)... Found 641 surfaces (2 new) in 1.371811 seconds. (Discovery)
   - Processing leaf 7/10 (Node ID: 50)... Found 641 surfaces (1 new) in 1.639977 seconds. (Discovery)
   - Processing leaf 8/10 (Node ID: 28)... Found 386 surfaces (0 new) in 1.374751 seconds. (Redundant)
   - Processing leaf 9/10 (Node ID: 62)... Found 386 surfaces (0 new) in 1.489515 seconds. (Redundant)
   - Processing leaf 10/10 (Node ID: 63)... Found 497 surfaces (0 new) in 1.839615 seconds. (Redundant)

3. Analysis Results:
------------------------------
Total surfaces found (including duplicates): 5370
Unique surfaces found: 111
Number of redundant surfaces found: 5259

Performance Impact:
Total computation time: 21.105837 seconds
Time spent on computations yielding new surfaces: 16.401957 seconds
Time spent on purely redundant computations: 4.703881 seconds
Percentage of time wasted on redundant computations: 22.29%

------------------------------
4. Comparing with the original `find_correlation_surfaces` function.
Total time taken: 21.341385 seconds
Total number of surfaces found: 111

We can see that

  1. _find_correlation_surfaces_from_leaf produces no new surfaces at all in the last three runs.
  2. In each run, many surfaces are found, but only a few of them are new.

This confirms that many redundant computations exist and can be optimized.

I can work on this issue if other people agree on this optimization opportunity (Please assign me).

Thanks, Yilun

Zhaoyilunnn avatar Sep 17 '25 15:09 Zhaoyilunnn

In a word, we try finding correlation surfaces from each leaf node, but most unique surfaces are found in the first several leaf nodes, so there are many redundant computations.

Exploiting this will help, but it does not address the root cause: a single _find_correlation_surfaces_from_leaf() call may explore exponentially many possibilities and simply take too long. In my past attempt at improving correlation surface finding, I parallelized the computation from all leaves but was still bottlenecked by the runtime of each single call. So really we need to reduce the search space in the algorithm.

One realization I had was that for m ports, we need no more than m correlation surfaces (fewer if some ports are filled) to construct a minimal generator set, due to their correspondence with stabilizer (flows). Thus, a viable improvement is that, instead of starting from a single leaf and trying to find many correlation surfaces, we focus on one at a time by fixing the Pauli at each leaf, which should significantly reduce the search space, if not uniquely determine a correlation surface. Of course, knowing the Paulis would mean knowing the stabilizer (flows). Although block graphs derived from regular circuits can use Stim to efficiently compute them, I don't know if it's generally possible with just the block graph, since stabilizer flows may not be well-defined (can pyzx do it?). In the case that we don't know the stabilizer (flows) beforehand, a fallback is to fix m/2 Paulis that form a generator set of stabilizer (flows) (e.g., ZI, IZ, XI, IX for CNOT), which should still reduce the search space by a great amount. It seems the current algorithm can be adapted to do this, and I'm trying to get a working implementation.
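
As a concrete check of the CNOT example (a minimal sketch assuming Stim is installed; this is just the standard stabilizer-flow picture, not tqec code), the four generator flows are exactly what fixing the input Paulis XI, IX, ZI, IZ pins down:

import stim

# CNOT with control = qubit 0, target = qubit 1.
cnot = stim.Tableau.from_named_gate("CNOT")
print("X_ ->", cnot.x_output(0))  # +XX : X on the control spreads to the target
print("_X ->", cnot.x_output(1))  # +_X : X on the target is unchanged
print("Z_ ->", cnot.z_output(0))  # +Z_ : Z on the control is unchanged
print("_Z ->", cnot.z_output(1))  # +ZZ : Z on the target spreads to the control

Each of these flows corresponds to one correlation surface of the CNOT block graph, which is the correspondence being used above, so fixing the Pauli on the ports leaves (at most) one surface to search for per generator.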

Note that in either case, we directly get a generator set and the areas will not be minimized, unless we choose to generate all the correlation surfaces (which would be unreasonable for large block graphs).

HaoTy avatar Sep 18 '25 08:09 HaoTy

So really we need to reduce the search space in the algorithm.

Agree. As also indicated by my test, in each _find_correlation_surfaces_from_leaf() run, there are only a few unique surfaces.

a viable improvement is that, instead of starting from a single leaf and trying to find many correlation surfaces, we focus on one at a time by fixing the Pauli at each leaf, which should significantly reduce the search space, if not uniquely determine a correlation surface.

I assume this means that, for example, in the ZX graph below, we first choose the leaf nodes (highlighted in the figure), and then these nodes already uniquely determine a correlation surface? If this is the idea, I agree and am also thinking in this direction, but I don't have a concrete method to realize it.

Image

In the case that we don't know the stabilizer (flows) beforehand, a fallback is to fix m/2 Paulis that form a generator set of stabilizer (flows) (e.g., ZI, IZ, XI, IX for CNOT), which should still reduce the search space by a great amount. It seems the current algorithm can be adapted to do this, and I'm trying to get a working implementation.

This second solution is also promising.

But I think there should be a way to implement the first one. Given a block graph, if we know which leaves form the input/output of a correlation surface, it should be easier to determine the surface without an exponential search. And as you mentioned, we won't have more than $m$ surfaces, so we will need no more than $m$ of these "easy" searches. I think this is the right direction?

Zhaoyilunnn avatar Sep 18 '25 08:09 Zhaoyilunnn

I assume this means that, for example, in the ZX graph below, we first choose the leaf nodes (highlighted in the figure), and then these nodes already uniquely determine a correlation surface? If this is the idea, I agree and am also thinking in this direction, but I don't have a concrete method to realize it.

We would also fix the top left port to be "I", i.e., not having a correlation surface, and propagate that.

But I think there should be a way to implement the first one. Given a block graph, if we know which leaves form the input/output of a correlation surface, it should be easier to determine the surface without an exponential search. And as you mentioned, we won't have more than m surfaces, so we will need no more than m of these "easy" searches. I think this is the right direction?

But other than reducing to stabilizer flow calculation, I don't know how to find the Paulis corresponding to the correlation surfaces, and stabilizer flow is not always well-defined for a block graph.

HaoTy avatar Sep 19 '25 05:09 HaoTy

reducing to stabilizer flow calculation

This is reasonable, but first, as you mentioned, I don't know if we can get the stabilizer flow purely from the block graph. Second, how do we determine the input and output ports? I also have this second concern for your second proposal, "fix m/2 Paulis that form a generator set of stabilizer (flows)": to fix m/2 ports, I assume we need to know which ones are input ports? Please correct me if this is wrong.

So actually my thought is that we simply don't fix the Pauli basis; rather, we reduce this to a pure graph path-finding problem (just like the current implementation), but instead of doing a DFS from each leaf, I am wondering if we can start from multiple leaves and somehow avoid finding too many redundant correlation surfaces, i.e., reduce the search space.

Zhaoyilunnn avatar Sep 19 '25 05:09 Zhaoyilunnn

Second, how do we determine the input and output ports? I also have this second concern for your second proposal, "fix m/2 Paulis that form a generator set of stabilizer (flows)": to fix m/2 ports, I assume we need to know which ones are input ports? Please correct me if this is wrong.

I don't think input and output ports are distinguishable in a block graph, and my understanding is they shouldn't matter. As long as the Paulis form a generator set that leaves no degree of freedom, they should effectively cover all possible correlation surfaces.

So actually my thought is that we simply don't fix the Pauli basis; rather, we reduce this to a pure graph path-finding problem (just like the current implementation), but instead of doing a DFS from each leaf, I am wondering if we can start from multiple leaves and somehow avoid finding too many redundant correlation surfaces, i.e., reduce the search space.

How do you know which leaves to start from? There are exponentially many combinations.

HaoTy avatar Sep 19 '25 06:09 HaoTy

I don't think input and output ports are distinguishable in a block graph, and my understanding is they shouldn't matter. As long as the Paulis form a generator set that leaves no degree of freedom, they should effectively cover all possible correlation surfaces.

I might be wrong, but we cannot set the top left port to X and the bottom left port to I like in the figure below?

Image

How do you know which leaves to start from? There are exponentially many combinations.

You're right, this is a problem. I don't expect to find the correct leaves; I just find that, starting from one leaf, the current algorithm gets many correlation surfaces and many of them are redundant, so I assume there are some redundant calculations, and starting from multiple leaves may be a direction to explore.

Zhaoyilunnn avatar Sep 19 '25 06:09 Zhaoyilunnn

I might be wrong, but we cannot set the top left port to X and the bottom left port to I like in the figure below?

Hmmm, you are right. Things are a lot trickier, then.

HaoTy avatar Sep 19 '25 06:09 HaoTy

starting from multiple leaves may be a direction to explore

I think the current algorithm supports this. Try setting the frontier to multiple leaves in the initial _find_spans_with_flood_fill() calls.

HaoTy avatar Sep 19 '25 06:09 HaoTy

I have an idea for an algorithm that is, in principle, very efficient. In essence, we prune duplicate correlation surfaces (judged by stabilizer equivalence) during the search. As long as the algorithm is implemented in a way that keeps the frontier consistent across all search branches and keeps track of all current correlation surfaces and stabilizers, duplicate search branches can be checked and pruned as soon as they appear.
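
The generic pattern is sketched below on a toy state space (nothing here is tqec's actual data structures; a frozenset merely stands in for "current surface modulo stabilizer equivalence"): the frontier is expanded layer by layer, and any branch whose state has already been seen is dropped immediately.

from collections import deque

def search_with_pruning(start: frozenset[int], moves) -> set[frozenset[int]]:
    """Breadth-first expansion that prunes duplicate branches as soon as they appear."""
    seen: set[frozenset[int]] = {start}
    queue: deque[frozenset[int]] = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in moves(state):
            if nxt in seen:  # duplicate branch: prune it here instead of re-exploring it
                continue
            seen.add(nxt)
            queue.append(nxt)
    return seen

# Toy move function: toggle one of three elements at a time; without pruning this never terminates.
print(len(search_with_pruning(frozenset(), lambda s: [s ^ {i} for i in range(3)])))  # 8 states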

HaoTy avatar Sep 20 '25 03:09 HaoTy

I have an idea for an algorithm that is, in principle, very efficient. In essence, we prune duplicate correlation surfaces (judged by stabilizer equivalence) during the search. As long as the algorithm is implemented in a way that keeps the frontier consistent across all search branches and keeps track of all current correlation surfaces and stabilizers, duplicate search branches can be checked and pruned as soon as they appear.

Sounds reasonable to me 👍, although I didn't get all the details behind it. Please go ahead if you have a draft PR. I need more time to fully understand the sources of duplicates in the current implementation.


~~In each _find_surfaces_from_leaf pass, it finds redundant surfaces (redundant not in the sense of not being independent, but simply because they are exactly the same), which grow to far outnumber the unique surfaces as the graph size increases. If the problem is reduced to a tree, this should not happen in theory. So I suspect there are some issues with the current implementation and a simple fix could solve the problem.~~

Edit: it turns out the current algorithm does not introduce redundancy by mistake; it is possible for different paths to produce the same correlation surface 🤔

Zhaoyilunnn avatar Sep 21 '25 06:09 Zhaoyilunnn

I think I have identified one clear redundancy in the current implementation. It would be great if @HaoTy, @purva-thakre, or @inmzhang could check whether I am correct.

When visiting a "passthrough" node, i.e., one where the observable has the same color as the node, the current implementation is:

https://github.com/tqec/tqec/blob/82f74f565b68209cd7e9fbc16f21b28f4857acbd/src/tqec/interop/pyzx/correlation.py#L245-L269

I think the line below includes unnecessary branches.

https://github.com/tqec/tqec/blob/82f74f565b68209cd7e9fbc16f21b28f4857acbd/src/tqec/interop/pyzx/correlation.py#L261

Although it is true that "a same color observable must touch an even number of edges", each time we visit a passthrough node we can only choose one edge.

For example, in the figure below, if we "drag" the blue line from left to right, it cannot touch all four edges; we can only choose one direction.

Image

Therefore, in that line, there is only one possible value for n.
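
To make the size of the unnecessary branching concrete (purely illustrative combinatorics, not the tqec code itself): if a passthrough node has k candidate edges, enumerating every even-sized edge subset gives 2**(k - 1) branches, whereas continuing the surface through a single chosen edge gives only k options.

from itertools import combinations

k = 4
even_subsets = [c for n in range(0, k + 1, 2) for c in combinations(range(k), n)]
print(len(even_subsets), 2 ** (k - 1))  # 8 8  -> even-subset enumeration
print(k)                                # 4    -> single-edge continuations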

I tested this simple change locally; it seems that all tests pass, and I observed at least a 2x speedup.

from time import time

from tqec.computation.cube import ZXCube
from tqec.gallery import steane_encoding

block_graph = None
for i in range(5):
    g = steane_encoding()
    g.relabel_cubes({f"Port{j}": f"Port{j}_{i}" for j in range(7)})
    if block_graph is None:
        block_graph = g
    else:
        block_graph = block_graph.compose(g, f"Port4_{i - 1}", f"Port3_{i}")
    graph_clone = block_graph.clone()
    graph_clone.fill_ports(ZXCube.from_str("ZXZ"))
    start_time = time()
    correlation_surfaces = graph_clone.find_correlation_surfaces()
    print(
        f"Found {len(correlation_surfaces)} correlation surfaces for block graph with {graph_clone.num_cubes}"
        f" cubes, {graph_clone.num_pipes} pipes, and {len(graph_clone.leaf_cubes)}"
        f" leaf cubes took {time() - start_time:.2f} s"
    )

# before
Found 4 correlation surfaces for block graph with 19 cubes, 20 pipes, and 7 leaf cubes took 0.01 s
Found 5 correlation surfaces for block graph with 35 cubes, 40 pipes, and 8 leaf cubes took 0.27 s
Found 6 correlation surfaces for block graph with 51 cubes, 60 pipes, and 9 leaf cubes took 2.55 s
Found 7 correlation surfaces for block graph with 67 cubes, 80 pipes, and 10 leaf cubes took 22.54 s
Found 8 correlation surfaces for block graph with 83 cubes, 100 pipes, and 11 leaf cubes took 151.26 s


# after
Found 4 correlation surfaces for block graph with 19 cubes, 20 pipes, and 7 leaf cubes took 0.01 s
Found 5 correlation surfaces for block graph with 35 cubes, 40 pipes, and 8 leaf cubes took 0.13 s
Found 6 correlation surfaces for block graph with 51 cubes, 60 pipes, and 9 leaf cubes took 1.32 s
Found 7 correlation surfaces for block graph with 67 cubes, 80 pipes, and 10 leaf cubes took 10.13 s
Found 8 correlation surfaces for block graph with 83 cubes, 100 pipes, and 11 leaf cubes took 59.76 s

The scaling is still bad though, and there is still redundancy, but now it is inherent to the symmetry of the graph structure. I believe further optimizations exist, but they are non-trivial and require more research.

Zhaoyilunnn avatar Sep 22 '25 03:09 Zhaoyilunnn

I tested this simple change locally; it seems that all tests pass, and I observed at least a 2x speedup.

Nice work! I didn't notice this redundancy at all.

The scaling is still bad though, and there is still redundancy, but now it is inherent to the symmetry of the graph structure. I believe further optimizations exist, but they are non-trivial and require more research.

I have a draft implementation of the algorithm I mentioned above. It currently fails a few tests, so it still needs some debugging.

HaoTy avatar Sep 22 '25 20:09 HaoTy

Related to #696 via topologiq PR tqec/topologiq#17

This work demonstrates algorithmic performance optimization patterns that may be relevant:

Algorithm Optimization Approach:

  1. Baseline establishment: BFS pathfinding as known-good baseline
  2. Alternative algorithm: A* with Manhattan distance heuristic
  3. Statistical validation: 1.78x measured speedup with significance testing
  4. Correctness validation: Both algorithms produce identical results

Performance Research Infrastructure:

  • Environment-based algorithm switching for A/B testing
  • Comprehensive benchmarks with parametrized tests
  • Statistical validation of improvements
  • Exploration tracking for algorithmic analysis

What Worked:

  • Starting with simple heuristic (Manhattan distance)
  • Measuring actual performance, not theoretical complexity
  • Validating correctness before optimizing further
  • Making algorithms swappable via environment variables

Patterns Worth Considering: For correlation surface performance work:

  1. Establish baseline with current algorithm
  2. Implement alternative with better complexity
  3. Validate correctness on known test cases
  4. Measure actual performance improvements
  5. Make algorithms swappable for research

The A* optimization demonstrates that even simple heuristics can provide significant practical speedups when applied to the right problem. Happy to discuss algorithmic optimization approaches!

SMC17 avatar Oct 12 '25 18:10 SMC17

@SMC17 I don’t think your above comments (which are clearly AI-generated) have anything to do with this issue. We’re not against AI-assisted code, but please make sure you understand what you’re doing and carefully review any AI-generated code or text before submitting a PR or commenting on issues. Otherwise, it’s just a waste of everyone’s time.

inmzhang avatar Oct 13 '25 07:10 inmzhang