OpenRA Fix actor stuck on unreachable terrain/enclosed immovable actors and cause the game freezing

Fix #20948

The 1st & 2nd commit: fix unit endlessly pathfinding when stuck on enclosed immovable actors

before: pathfinding-lag

after: fix

Note: All 2 tests make stuck units attack a dog reachable if not consider walls.

Need more tests to see if there is any change to regular gameplay, and I need help on that.

The 3rd commit: fix unit endlessly pathfinding when stuck on unreachable terrain

before: before

after: after

Need more tests to see if there is any change to regular gameplay, and I need help on that.

Fixes #21187.

Jul 07 '23 15:07 dnqbob

Still, this PR is hacky and ugly. I hope you guys can have better idea on fix it properly, while performance friendly -- spike lag is still here at the tick when I order more than 80+ units to attack in this PR, MoveWithinRange is a goddamed performance killer but it is beyond this PR, and we need to refactor pathfinding for that.

Jul 07 '23 16:07 dnqbob

Add comments on hacky place

Jul 08 '23 01:07 dnqbob

For better understanding, I will explain those hacky stuff. this is how this PR work: (All Move is Mobile.Move)

Generally, there are 2 conditions that Move finds itself stuck and gives up: (1) When actor is stuck by Building or Immobile actors. (2) When actor cannot not reach destination because of terrain. (stuck by terrain)

Noted that a parent activity needs move from A to A must NOT considered stuck, but Pathfinder has a possibility to return an empty path to Move on this, which is the same as when there is No Path. -- Therefore, we must pick out this specific condition by checking destination and actor's location before Pathfinder, and you can see a hacky stuff called startPointIsDestination.

Then we need a layer to transfer the signal of Move is stuck and gives up. I use Mobile and give it a name called MoveResult. All activities that queue a Move use MoveResult to find if the destination is unreachable and should give up in time, as following steps: -- When those activities are initialized, change MoveResult to default. -- When their child activity Move is initialized, change MoveResult to default. -- After Move is done, check the MoveResult to see if there can be another Move or not.

Jul 08 '23 15:07 dnqbob

Thanks to HPF determining wether a cell is reachable shouldn't be that performance intensive. It should actually be fairly cheap.

I also think this approach has a few major flaws.

Imagine a situation where you give an order for a unit to enter a Naval Transport, but that transport is not at chore yet, it's only moving towards chore. Do we really want to cancel the enter activity in that case? Transports would feel even more cumbersome for the player than they already are.

Or imagine another situation, an aircraft is is circling on the edge of impassible terrain, it's sometimes in range and sometimes unreachable. Do we want to cancel the attack activity in that case?

There are plenty of situations like that where the player will assume that the game is bugged for cancelling, there needs to be a better thought out solution. I'm not sure how exactly it should look like though

Jul 10 '23 11:07 PunkPun

Thanks to HPF determining wether a cell is reachable shouldn't be that performance intensive. It should actually be fairly cheap.

HPF does help on terrain stuck, but MoveWithinRange is so costy that even HPF cannot revert its negative impact (20 actors makes your computer stuck and hot). Consider following use case:

An unit has 6 cells attack range, so MoveWithinRange will search at least 6x6 cells around unit's target, and pick out cells that can be entered and PATHFINDING TO EACH OF THEM. (WTF, OpenRA? why we must do that? RA2/AOE just pick out nearest cell to target that can be entered and let actor move to it, if actor can fire/interact in the half way then it just stops and fires/interact.)
If you attack with a group of units, Move will do 3~4 pathfinding on creation for many units because your actors block with each other.

OK, we have N unit, then the pathfinding times at the moment will be N X 36 X 3. So you can always see a lag spike when you order a group of unit to attack. And you will see a lag wall when your actors is stuck when target has cells can be entered around --after huge pathfinding Move find itself cannot reach the target so give up immediately, but Attack immediately queues a new Move, do the huge pathfinding again in futility.

I don't think even the best pathfinding can help this issue.

Add: If HPF is so cheap on performance, then how do we explain https://github.com/OpenRA/OpenRA/issues/20948 ?

Imagine a situation where you give an order for a unit to enter a Naval Transport, but that transport is not at chore yet, it's only moving towards chore. Do we really want to cancel the enter activity in that case?

Shouldn't when the order on passengers are executed, the transport try to approach the closet shore, and passengers wait until they can go aboard? RA2/Starcraft has taken that.

Add: In those games, if either transport or passengers cannot approach the location, the activity will be stopped and cancelled. In my opinion it is not the problem of the activity stop, it is we need a life improvement on transport control.

an aircraft is is circling on the edge of impassible terrain, it's sometimes in range and sometimes unreachable. Do we want to cancel the attack activity in that case?

My only suggestion is to use a similar MovingWithinRange approach like RA2/AOE that I mentioned above, with small adjustment on if target disappeared/unreachable, goes to the last location instead of stopping and if that aircraft happens in range then stop and attack. Problem solved.

We should never do another set of huge pathfindings for that.

Jul 10 '23 15:07 dnqbob

There are plenty of situations like that where the player will assume that the game is bugged for cancelling, there needs to be a better thought out solution.

Actually cancelling the activity is very common on most of RTS with impassible terrain, see RA2&3/CNC3/AOE/Starcraft1&2, player should not be hard to accept that.

Besides, it is hard to say OpenRA current behavior is what the player expects. Use your example, in OpenRA units are stuck on the ground if you order them to attack aircraft on the impassible terrain, and since then they will do nothing like in idle and has no respond to other incomming threats.

The most user-friendly behavior should be:

The unit should try get as close as possible to the aircraft, not just standing and watching.
When unit cannot get any closer, stops and idle. But it will always attack that aircraft first if the aircraft comes within range with other enemies.

Jul 11 '23 04:07 dnqbob

I think the whole team would be in favour of implementing the 1st. It already works like that for move orders onto impassible terrain, and kinda like that when blockers are actors (not terrain).

As for the 2nd point. Usually when a player clicks on a target he expects the unit to target it, if it retargets on its own it can feel rather jarring. Again I'm not sure what solution should be applied, but the suggested one is far from ideal

Jul 11 '23 08:07 PunkPun

It already works like that for move orders onto impassible terrain.

No. In OpenRA the unit will just stop, instead of get as closer as possible.

Actually, "get as closer as possible" is pretty hard to code and perf unfriendly (It will be as costy or more as MoveWithinRange, you are going to search unknown number of the cells to checkout its reachablity and euclidean distance of the cell to target)

As for the 2nd point. Usually when a player clicks on a target he expects the unit to target it, if it retargets on its own it can feel rather jarring. Again I'm not sure what solution should be applied, but the suggested one is far from ideal

Then this will come to a strange situation: when player ask units to target the aircraft, player could no longer pay attention if the aircraft is destroyed or reachable and turn to handle another issue. When player is back and he see this unit is stuck at closest distination and do nothing, ignoring other enemies, he may just thought there is a bug in OpenRA.

Imagining you are this unit, like playing FPS, if your target is unreachable and your commander give your mission to destroy the aircraft, you should:

Try get as close as possible to the aircraft.
When 1. is done, you will fire to other hostile when they find you and try to stop you from your mission. But when aircraft is coming within reach, you will immediately turn to fire at it.

Jul 12 '23 01:07 dnqbob

This PR will fix #21187

Dec 01 '23 10:12 RoosterDragon

I am not going to push this PR any further because this PR is an attempt to find out where we need to fix if we want to fix https://github.com/OpenRA/OpenRA/issues/21187.

It is too ugly and hacky for me to move on and we don't have clear and certainly expected behaviors on PF failed on all activities affected, which will have a negative impact on the codes that already hacky enough.

Dec 10 '23 00:12 dnqbob

I strongly suggest to make a new PR instead of stick to this PR which makes the code hard to understand and messy.

Dec 10 '23 02:12 dnqbob

Like @RoosterDragon said, I think we should do something to keep the code clearly at new PR.

Make a list on expected behaviors when they cannot reach the destination.
Store the pathing stage code in Move instead of Mobile. We need to rewrite base Activity and make it return a state code instead of a boolean to support this, which will be the most elegant solution.
destinationIsStartpoint is never a good way to solve the problem of reach destination check. We must have a better idea and remove this.

Dec 10 '23 03:12 dnqbob

The activity refactor afaik explicitly tried to remove the returning of information. Reducing the info to booleans making activities much more readable

Dec 10 '23 07:12 PunkPun

In previous consideration of Activity base on a fact that OpenRA only need to know if current (child) activity is finished, but now we need to know how it is finished. The only resonable way afaik to pass this state is to use the returning code, otherwise we have to hardcoded it in Mobile, which is not an ideal solution. We don't have other idea on that.

Dec 10 '23 08:12 dnqbob

I'm not against updating this PR - I think it's already got the right overall approach.

I don't mind that it adds complexity. It seems unavoidable that we need more complex checks here in order to handle "how the move finished" in different ways. So there's nothing wrong with adding things to support that. At least I don't see a way around that.

Dec 10 '23 13:12 RoosterDragon

While scope creep, I think it would be worth to plan on adding an AutoTarget scan every time on queueing movement to the next cell. Currently the scan frequency can cause units to move closer than they should

Jan 04 '24 12:01 PunkPun

closing as superseded by #21391

Jul 01 '24 12:07 PunkPun