python-pathfinding icon indicating copy to clipboard operation
python-pathfinding copied to clipboard

Potential memory leak in SimpleHeap's removed_node_tuples

Open codelion opened this issue 4 months ago • 3 comments

Hey there, I've been investigating a memory issue in the SimpleHeap implementation and found that removed_node_tuples can grow indefinitely during pathfinding, especially on larger grids.

The problem is that when nodes are "removed" via remove_node(), they get added to removed_node_tuples but are never cleaned up. Over time, this set just keeps growing.

After digging into the code, I'm wondering if removed_node_tuples is actually necessary at all? From what I can see, the A* algorithm already marks processed nodes with the closed flag, so we could potentially simplify things by:

  1. Getting rid of removed_node_tuples completely
  2. Having pop_node() just skip any nodes that are already closed
  3. Making remove_node() do nothing (or removing it entirely)

Something like:

def pop_node(self):
    while self.open_list:
        node_tuple = heapq.heappop(self.open_list)
        # ... get node from tuple ...
        if not node.closed:
            return node
    return None

I tested this approach and it seems to work correctly while eliminating the memory leak. The pathfinding results are identical, but without the growing memory usage.

Is there a specific reason removed_node_tuples was implemented this way, or would you be interested in a PR that simplifies this? I might be missing some edge case that requires the current approach, so wanted to check first.

Thanks!

codelion avatar Aug 15 '25 01:08 codelion

Good find.

As I understand it the set of removed_node_tuples stores the nodes that will be ignored in the future when we iterate over the heap, resulting in a performance boost as we ignore the node in the future (see https://github.com/brean/python-pathfinding/pull/54 )

But you are right, there are better ways to implement his, especially as we already store a opened and closed flag in the node itself, so we should already skip visited nodes (I also like to reduce the amount of data inside the node anyway as it became quite big). Maybe there is already a bug in python pathfindings logic that is not using these flags correctly, but I don't have the time right now to look into that and have to postpone that to next week.

The heap was implemented by @peterchenadded, maybe he can also clear things up.

brean avatar Aug 17 '25 21:08 brean

removed_node_tuples is not there to detect the node is already processed, it's there to track more optimal paths. A nodes f value can reduce and when it does we need to ignore the tuple in the open list with the bigger f value. It is technically the same node though.

I checked my usage of it, in 2d grid based orthogonal pathfinding I saw no impact when removing removed_node_tuples though.

From Wikipedia, I think that isn't the case for all heuristics:

https://en.m.wikipedia.org/wiki/A*_search_algorithm

Remark: In this pseudocode, if a node is reached by one path, removed from openSet, and subsequently reached by a cheaper path, it will be added to openSet again. This is essential to guarantee that the path returned is optimal if the heuristic function is admissible but not consistent. If the heuristic is consistent, when a node is removed from openSet the path to it is guaranteed to be optimal so the test ‘tentative_gScore < gScore[neighbor]’ will always fail if the node is reached again. The pseudocode implemented here is sometimes called the graph-search version of A*.[12] This is in contrast with the version without the ‘tentative_gScore < gScore[neighbor]’ test to add nodes back to openSet, which is sometimes called the tree-search version of A* and require a consistent heuristic to guarantee optimality.

Too many things in removed_node_tuples may indicate a heuristic issue more than anything else.

Hope that helps.

peterchenadded avatar Aug 18 '25 10:08 peterchenadded

I think your right though, remove the removed_node_tuples and just do a simple check to see if the f value from the open set is greater than the current f value of the node. if it is, just ignore it.

peterchenadded avatar Aug 18 '25 10:08 peterchenadded