
Fix astar node memory leak

Open jacksonie opened this issue 7 months ago • 16 comments

Pull Request Prelude

Changes Proposed

Issues addressed:

How to test:

jacksonie avatar May 24 '25 00:05 jacksonie

Won't this indefinitely append nodes to the nodeStore vector? Why not store unique pointers in nodes? It seems that every emplace_back is done on both vectors.

ranisalt avatar May 24 '25 09:05 ranisalt

Won't this indefinitely append nodes to the nodeStore vector? Why not store unique pointers in nodes? It seems that every emplace_back is done on both vectors.

I think it would be the right option, to avoid the boilerplate. Besides, building a single vector is better than building two.

MillhioreBT avatar May 24 '25 17:05 MillhioreBT

Won't this indefinitely append nodes to the nodeStore vector? Why not store unique pointers in nodes? It seems that every emplace_back is done on both vectors.

I applied a different approach to the code, and in my case, it has been working very well for days now. I know that, ideally, a single vector with unique_ptr would be the best solution, but this is all that my limited time allows me to do right now. Feel free to change whatever you want.

@MillhioreBT

jacksonie avatar May 25 '25 21:05 jacksonie

Isn't the real solution to delete all nodes from nodes in the AStarNodes destructor?

gesior avatar May 31 '25 14:05 gesior

Isn't the real solution to delete all nodes from nodes in the AStarNodes destructor?

That's right, it should, as always.

It should be as simple as this, but it's not. [image]

MillhioreBT avatar Jun 01 '25 18:06 MillhioreBT

You should move the unique_ptr to the nodes vector instead of making a new one.

NRH-AA avatar Jun 01 '25 19:06 NRH-AA

You should move the unique_ptr to the nodes vector instead of making a new one.

The vector can have duplicates, so unique_ptr cannot be used since a unique_ptr cannot be copied.
Using shared_ptr would solve it but sounds much worse.

You could also use a temporary std::set to store without duplicates in the destructor, but it looks ugly.
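
For illustration, a rough sketch of that std::set idea, assuming nodes is the raw-pointer vector discussed above, which may contain the same pointer more than once (a sketch based on the snippets in this thread, not the actual TFS sources):

#include <cstdint>
#include <set>
#include <vector>

struct AStarNode
{
	AStarNode* parent;
	uint16_t g;
	uint16_t f;
	uint16_t x, y;
};

class AStarNodes
{
public:
	~AStarNodes()
	{
		// The vector may hold the same pointer more than once, so deduplicate
		// through a temporary std::set before deleting to avoid double frees.
		std::set<AStarNode*> uniqueNodes(nodes.begin(), nodes.end());
		for (AStarNode* node : uniqueNodes) {
			delete node;
		}
	}

private:
	std::vector<AStarNode*> nodes;
};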

MillhioreBT avatar Jun 01 '25 19:06 MillhioreBT

Then I will dub this a temp fix, and I will modify getBestNode to handle the unique_ptr properly. A quick example:

std::unique_ptr<AStarNode> AStarNodes::getBestNode()
{
    if (nodes.empty()) {
        return nullptr;
    }

    // Move the node with the lowest f (the most promising one) to the back.
    std::nth_element(nodes.begin(), nodes.end() - 1, nodes.end(),
        [](const std::unique_ptr<AStarNode>& left, const std::unique_ptr<AStarNode>& right) {
            return left->f > right->f;
        });

    std::unique_ptr<AStarNode> best = std::move(nodes.back());
    nodes.pop_back();
    return best; // Caller now owns the node safely
}

I believe this would allow the memory to be freed when needed. Obviously, I would need to spend more time on it to be sure though.

NRH-AA avatar Jun 01 '25 19:06 NRH-AA

Then I will dub this a temp fix, and I will modify getBestNode to handle the unique_ptr properly. [...]

There would still be changes needed for this line: neighborNode->parent = n; because two things can happen here:

  1. The assigned raw pointer may still be managed by a unique_ptr in the vector.
  2. The assigned raw pointer may belong to a node whose unique_ptr has already been moved out by getBestNode and will be destroyed in the next iteration (which would cause a crash).

MillhioreBT avatar Jun 01 '25 20:06 MillhioreBT

/

NRH-AA avatar Jun 01 '25 21:06 NRH-AA

openNodes

It seems that the cost of constructing vectors on every call to getBestNode dramatically worsens performance compared with the unique_ptr map; the approach this PR proposes looks like a more solid solution.

If you want you can revert the changes you made and follow these changes:

struct AStarNode
{
	AStarNode* parent;
	uint16_t g;
	uint16_t f;
	uint16_t x, y;
	bool closed;
};

~AStarNodes()
{
	for (auto node : nodes) {
		delete node;
	}
}

Then locate this code:

// Calculate cost of neighbor.
const uint16_t g = n->g + AStarNodes::getMapWalkCost(n, pos) + AStarNodes::getTileWalkCost(creature, tile);
const uint16_t h = calculateHeuristic(pos, targetPos);
const uint16_t newf = h + g;

Replace the code under it with:

if (neighborNode) {
	if (neighborNode->closed) {
		if (neighborNode->f <= newf) {
			continue;
		}

		// Reopen the node and update it with the cheaper path.
		neighborNode->closed = false;
		neighborNode->g = g;
		neighborNode->f = newf;
		neighborNode->parent = n;
	} else {
		if (neighborNode->f <= newf) {
			continue;
		}

		neighborNode->g = g;
		neighborNode->f = newf;
		neighborNode->parent = n;
	}
} else {
	nodes.createNewNode(n, pos.x, pos.y, g, newf);
}

Replace getBestNode():

AStarNode* AStarNodes::getBestNode()
{
	if (nodes.empty()) {
		return nullptr;
	}

	// Collect the nodes that are still open.
	std::vector<AStarNode*> openNodes;
	for (auto node : nodes) {
		if (!node->closed) {
			openNodes.push_back(node);
		}
	}

	if (openNodes.empty()) {
		return nullptr;
	}

	// Move the node with the lowest f to the back, mark it closed and return it.
	std::nth_element(openNodes.begin(), openNodes.end() - 1, openNodes.end(),
	                 [](AStarNode* left, AStarNode* right) { return left->f > right->f; });
	AStarNode* retNode = openNodes.back();
	retNode->closed = true;
	return retNode;
}

Then add:

firstNode->closed = false;
newNode->closed = false;

when the first and new nodes are created.
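
For illustration only, a hypothetical createNewNode for the raw-pointer scheme above could look roughly like this; the real signature is only hinted at by the nodes.createNewNode(n, pos.x, pos.y, g, newf) call, so treat the parameter list as an assumption:

AStarNode* AStarNodes::createNewNode(AStarNode* parent, uint16_t x, uint16_t y, uint16_t g, uint16_t f)
{
	// Heap-allocate so the node outlives the current search iteration;
	// the ~AStarNodes() loop shown earlier frees everything stored in nodes.
	AStarNode* node = new AStarNode;
	node->parent = parent;
	node->g = g;
	node->f = f;
	node->x = x;
	node->y = y;
	node->closed = false; // new nodes start open, as noted above
	nodes.push_back(node);
	return node;
}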

Honestly, I prefer the simple vector of unique_ptr rather than adding all that new logic, which includes allocating a temporary vector and filtering manually with a range-for loop.

If you change the logic of getBestNode to avoid that pop_back, then we could keep using RAII as it has always worked well and optimally.

MillhioreBT avatar Jun 02 '25 01:06 MillhioreBT

Honestly, I prefer the simple vector of unique_ptr rather than adding all that new logic, which includes allocating a temporary vector and filtering manually with a range-for loop.

If you change the logic of getBestNode to avoid that pop_back, then we could keep using RAII as it has always worked well and optimally.

I don't think this system really follows RAII in general. The nodes are created on the fly, not during construction. The main thing here is that we deallocate the memory correctly. The unique_ptr is a cleaner method, but I think it is actually not as good a solution because it is barely used. The extra vector inside getBestNode has a negligible impact on performance. While it may not be as pretty, I think it is a better solution than throwing in a unique_ptr just for the sake of looks. I guess from this point maybe some other people can throw in their insights. Either solution is fine imo.

NRH-AA avatar Jun 02 '25 04:06 NRH-AA

I don't think this system really follows RAII in general. The nodes are created on the fly, not during construction. [...]

Yes, RAII can be implemented without any issues. The problem is that once nodes are deleted, their management becomes considerably complicated. Additionally, another complication arises: when the vector is reallocated, all pointers held in the map are invalidated. Consequently, using a vector becomes counterproductive if the design does not account for dynamic reallocation. Currently, the limit is set at 50 elements, but when this threshold is exceeded, the system fails for obvious reasons.

The solution would be to increase the node limit, make sure getBestNode does not remove nodes from the vector, and ensure that the node limit is not exceeded to avoid memory reallocation. For node filtering, ranges can be used: std::views::filter + std::ranges::min_element
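
A minimal sketch of that idea, assuming nodes is a std::vector of std::unique_ptr<AStarNode> with a closed flag and a fixed node limit; the names and the MAX_NODES value are assumptions taken from this thread, not the actual TFS sources:

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <ranges>
#include <vector>

struct AStarNode
{
	AStarNode* parent = nullptr;
	uint16_t g = 0;
	uint16_t f = 0;
	uint16_t x = 0, y = 0;
	bool closed = false;
};

class AStarNodes
{
public:
	AStarNodes() { nodes.reserve(MAX_NODES); } // avoid reallocations while the search runs

	AStarNode* getBestNode()
	{
		// Consider only the open nodes; nothing is ever erased from the vector,
		// so the unique_ptrs keep owning every node until destruction (RAII).
		auto open = nodes | std::views::filter([](const auto& node) { return !node->closed; });
		auto it = std::ranges::min_element(open, {}, [](const auto& node) { return node->f; });
		if (it == open.end()) {
			return nullptr;
		}
		(*it)->closed = true;
		return (*it).get();
	}

private:
	static constexpr std::size_t MAX_NODES = 512; // hypothetical limit, larger than the current 50
	std::vector<std::unique_ptr<AStarNode>> nodes;
};

Because getBestNode never removes elements, no pop_back is needed and all nodes are freed together when AStarNodes goes out of scope.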

MillhioreBT avatar Jun 02 '25 08:06 MillhioreBT

@NRH-AA @MillhioreBT Since we are fixing the memory leak, but also the code style that let this memory leak slip into the PR unnoticed, I have a question before it gets merged.

Can this:

	AStarNode firstNode;
	firstNode.parent = nullptr;
	firstNode.x = x;
	firstNode.y = y;
	firstNode.g = 0;
	firstNode.f = 0;

and this:

	AStarNode newNode;
	newNode.parent = parent;
	newNode.x = x;
	newNode.y = y;
	newNode.g = g;
	newNode.f = f;

be moved to an AStarNode class constructor with default values for parent, g and f?

Or just set default values in the struct, so it is obvious that any of these values may not be set by the constructor [struct initialization] (though it looks like x and y must always be set for nodeMap, so a constructor with three optional parameters [parent, g, f] would be better):

struct AStarNode
{
	AStarNode* parent = nullptr;
	uint16_t g = 0;
	uint16_t f = 0;
	uint16_t x = 0;
	uint16_t y = 0;
};

and just modify the values we want to change?
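
For example, a constructor along those lines could look like this (just an illustration of the idea, not code taken from the actual sources):

#include <cstdint>

struct AStarNode
{
	AStarNode(uint16_t x, uint16_t y, AStarNode* parent = nullptr, uint16_t g = 0, uint16_t f = 0) :
	    parent{parent}, g{g}, f{f}, x{x}, y{y}
	{}

	AStarNode* parent;
	uint16_t g;
	uint16_t f;
	uint16_t x, y;
};

// The two call sites would then shrink to:
// AStarNode firstNode(x, y);
// AStarNode newNode(x, y, parent, g, f);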


Other question: what are the g and f values? Of course I can google AStarNode and find out what these values mean in the A* algorithm, but can't we name them something that tells anyone what they mean for the algorithm? For example, whether higher or lower values are better.

gesior avatar Jun 04 '25 18:06 gesior

@NRH-AA @MillhioreBT Since we are fixing the memory leak, but also the code style that let this memory leak slip into the PR unnoticed, I have a question before it gets merged. Can this initialization code be moved to an AStarNode class constructor with default values for parent, g and f, or should we just set default values in the struct and modify only the values we want to change? [...]

Other question: what are the g and f values? [...]

Having default values during initialization is a waste of time since we are going to manually change their values during creation anyway.

The fields x and y are important because the current logic requires using these node values.

At least for this PR, that is irrelevant. However, another PR that could trim those two fields from the node struct to save a bit more memory and speed is always welcome.

f and g are just reference values for the node weights and the total sum of weights. They are called that by convention, but they could perfectly well be called cost and total cost. But who cares about this? Anyone working with the A* algorithm will easily know what they mean, because these names are essentially a standard.

MillhioreBT avatar Jun 04 '25 19:06 MillhioreBT

I benchmarked the getPathTo function without the "Optimize pathfinding" commit (one commit before), with "Optimize pathfinding", and with this PR's fix.

Summary

  • the new algorithm uses 5% less CPU than the old one
  • it finds 8% fewer paths to the target - it fails to find a path in some cases where the old algorithm returns one
  • it also returns 30% fewer steps in the paths it finds - possibly because it fails on long paths, so the 8% of extra paths the old algorithm finds account for the 30% more steps

What and how I tested: I wrote a simple script that calculates the path to positions 1-10 steps in each direction for each monster in the game (added this code in Game::checkDecay(): https://paste.ots.me/564224/text ).

Tested on Ubuntu 22.04, engine compiled with g++ 13 in Release mode, core affinity set to first 4 'P' cores of Intel CPU (taskset -c 0-3 ./tfs).

Results:

  • old algorithm: 7808 ms (100%)
  • new algorithm with memory leak: 8289 ms (106%)
  • new algorithm without memory leak: 7416 ms (95%)

With the memory leak fixed, the new algorithm is 5% faster than the old one, but the getPathTo results are different. Old algorithm:

Found: 139.091 Not found: 146.029 Total steps: 160.303.989

New algorithm:

Found: 129.553 Not found: 155.567 Total steps: 121.928.695

The new algorithm did not find a path to the target position in over 10k tests (8% fewer than the old algorithm). The total number of steps in the found paths also dropped from 160kk to 120kk. Either the new algorithm finds shorter paths to the target, or the 8% of paths it could not find were the longest ones. So we got 8% fewer paths found, 30% fewer steps in the found paths, and 5% less CPU usage. Without comparing the generated paths - especially the ones the new algorithm fails to find - I'm not sure whether the new algorithm is really 5% faster or whether it just gives up earlier by checking fewer tiles.

EDIT: I increased the iterations limit from 120 to 250 to make it find as many paths as the old algorithm (139k):

Found: 139.036 Not found: 146.084 Total steps: 160.639.857

Even with the custom hash algorithm proposed by @ranisalt, execution time grew to 9957 ms, which is 27% more than the old algorithm. It looks like the new algorithm isn't faster. It just calculates fewer paths - it returns "There is no way." more often. :(

gesior avatar Jun 05 '25 23:06 gesior

Can we finally merge this and remove the huge memory leak? We can decide later whether we want to revert "optimized getPathTo", since it's slower once tuned to return the same results as the old algorithm.

gesior avatar Jun 19 '25 00:06 gesior

Can we finally merge this and remove the huge memory leak? We can decide later whether we want to revert "optimized getPathTo", since it's slower once tuned to return the same results as the old algorithm.

@gesior My goal was exactly that: to eliminate the memory leak.

My initial solution, although not perfect, solves the problem, and I've been using it for weeks without any issues.

I know, a second vector called 'toReleaseNodes' may not be ideal, but at least it doesn't leak 70 GB of memory in 20 hours.

jacksonie avatar Jun 19 '25 01:06 jacksonie

Other question: what are the g and f values? Of course I can google AStarNode and find out what these values mean in the A* algorithm, but can't we name them something that tells anyone what they mean for the algorithm? For example, whether higher or lower values are better.

g, h and f are well-known values when working with pathfinding algorithms. It is probably best to leave them as they are, but for an explanation: g -> the cost so far to reach the node, i.e. the parent's cost plus the cost to walk from the last node to the current one. h -> the estimated cost from the current node to the target/end point. f -> the combination g+h, which must be stored in order to decide whether a certain path is better than a different one already calculated for the node.

The higher any of these values is, the worse the node is in the path. The h value is the main difference between Dijkstra's and A* pathfinding.
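
To make that concrete, here is a small illustration of how g, h and f relate on a grid with 8-directional movement. The Chebyshev distance used for h and the name estimateHeuristic are assumptions for the example, not necessarily what the engine's calculateHeuristic does:

#include <algorithm>
#include <cstdint>
#include <cstdlib>

// h: estimated remaining cost from (x, y) to the target. With Chebyshev
// distance, a diagonal step counts the same as a straight one.
static uint16_t estimateHeuristic(int32_t x, int32_t y, int32_t targetX, int32_t targetY)
{
	const int32_t dx = std::abs(targetX - x);
	const int32_t dy = std::abs(targetY - y);
	return static_cast<uint16_t>(std::max(dx, dy));
}

// For each neighbor the search expands:
//   g = parent->g + cost of stepping from the parent onto the neighbor tile
//   h = estimateHeuristic(neighbor, target)
//   f = g + h   -> lower f means a more promising node
// Dijkstra is the special case h == 0, which is why h is the main difference.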

The old algorithm used a ton of memory, so trying to implement it with the changes I have made for this algorithm will not work well.

You can check this website to get a visual comparison of the two algorithms. Old pathfinding: Dijkstra's. New: A* (AStar).

https://qiao.github.io/PathFinding.js/visual/

@gesior My goal was exactly that: to eliminate the memory leak. My initial solution, although not perfect, solves the problem, and I've been using it for weeks without any issues. I know, a second vector called 'toReleaseNodes' may not be ideal, but at least it doesn't leak 70 GB of memory in 20 hours.

Def sounds good and great job on a quick solution that is more than good enough.

I will create a new PR with optimizations to the efficiency of the algorithm.

NRH-AA avatar Jun 20 '25 05:06 NRH-AA

Changes in https://github.com/otland/forgottenserver/pull/4934 are much better.

Even with all optimizations proposed by @ranisalt (struct std::hash<std::pair<uint16_t, uint16_t>>) in this PR, it takes 5843 ms to process all monsters on TFS .otbm. With PR #4934 it's 4350 ms (25% less than this PR and 45% less than 'old TFS algorithm'). There is no memory leak, no crashes and no reports from address sanitizer (I did not test this PR with address sanitizer).

gesior avatar Jun 28 '25 21:06 gesior

@gesior Nice !

alysonjacomin avatar Jun 28 '25 23:06 alysonjacomin

@NRH-AA is this still relevant after #4934 ?

ranisalt avatar Jul 01 '25 21:07 ranisalt

@NRH-AA is this still relevant after #4934 ?

No. Memory leak is fixed after #4934 and we also get 60% CPU usage reduction vs old algorithm. We can close this PR.

gesior avatar Jul 01 '25 22:07 gesior