
Exploration of bot tick lag spikes (cause of lag found)

Open dnqbob opened this issue 3 years ago • 7 comments

After days of debugging the lag spikes, I will share some discoveries here, and I welcome anyone who can go further and faster than me. I use the CA mod as the test mod because the CA AI is a classic example of massive AI combat, which is common in OpenRA game/mod making, and OpenRA RA will also get there one day.

Test branch.

I pushed my test branch "lagspike-test" here for testing CA. In the test branch I added the original "GroundState" and "AirState" from OpenRA bleed, so you can switch states in "SquadCA" to see whether my new states (now in CA) cause the lag spikes or whether the cause is somewhere else.

I also suggest adding two more colors to PerfHistory.Colors in the engine code to make the performance window easier to read.

The best map for reproducing the lag spikes.

Combat Valley, an 8-player map. Please use 4v4 brutal AIs: 4 on the left side and 4 on the right side. I don't know which factor in this map makes the bug easy to reproduce.

The lag spikes happen when an attack finally reaches deep into an AI base, or in the late game. They are very obvious and suddenly drop the game to 8 FPS.

Discoveries

  1. My changes to "GroundState" are not related to this bug. It still happens when I switch to the original version.

  2. My changes to "AirState" are not related to this bug. It still happens when I switch to the original version.

  3. The protection squad is not related to this bug. I simply disabled ProtectOwn in SquadManagerBotModuleCA, which means the AI only updates the attacker's position and never uses the protection squad. The lag spikes still happen.

  4. The interval between lag spikes is predictable.

[Image: lagspike]

You can see the lag spikes are sky high, but they occur at a fixed interval when they happen continuously. I found the interval to be 50 ticks at minimum, which means that while the lag-spike condition persists, something in the bot tick triggers and lags the game every 50 ticks. I checked and found these candidates (including AI.yaml settings):

  • HarvesterBotModuleCA.cs: ScanForIdleHarvestersInterval
  • SquadManagerBotModuleCA.cs: AssignRolesInterval

I will do some tests on them.
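To illustrate why an interval-driven scan produces spikes at a fixed period, here is a minimal sketch of a countdown timer of the kind such interval settings typically drive. The names `IntervalScanner` and `SpikeDemo` are hypothetical stand-ins, not the actual OpenRA or CA classes, and the real modules may count ticks differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a bot module with an interval setting;
// this is not the actual OpenRA/CA implementation.
public sealed class IntervalScanner
{
    readonly int interval;
    readonly Action scan;
    int ticksUntilScan;

    public IntervalScanner(int interval, Action scan)
    {
        this.interval = interval;
        this.scan = scan;
        ticksUntilScan = interval;
    }

    // Called once per game tick; runs the (possibly expensive) scan
    // only when the countdown reaches zero, then restarts the countdown.
    public void Tick()
    {
        if (--ticksUntilScan > 0)
            return;

        ticksUntilScan = interval;
        scan();
    }
}

public static class SpikeDemo
{
    // Returns the 1-based tick numbers on which the scan ran.
    public static List<int> ScanTicks(int interval, int totalTicks)
    {
        var fired = new List<int>();
        var currentTick = 0;
        var scanner = new IntervalScanner(interval, () => fired.Add(currentTick));

        for (currentTick = 1; currentTick <= totalTicks; currentTick++)
            scanner.Tick();

        return fired;
    }
}
```

With an interval of 50, `SpikeDemo.ScanTicks(50, 200)` records the scan on ticks 50, 100, 150 and 200. If the scan itself is slow, that is the same regular spacing the performance graph would show.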

dnqbob avatar Aug 22 '20 11:08 dnqbob

@Inq8 @darkademic

The scanning in HarvesterBotModuleCA is the cause of the heavy lag spikes.

After I set "ScanForIdleHarvestersInterval" to 800, the lag spikes almost disappeared, although the AI also stopped building harvesters correctly at that point.

If you can confirm this as well, I guess you will have to check HarvesterBotModuleCA, which seems to have a very serious performance issue, especially in the late game on certain maps.

dnqbob avatar Aug 22 '20 13:08 dnqbob

@darkademic

The new Harvester Module still causes lag spikes here, but they are more unpredictable (so I am not certain whether it is really the cause of the lag spikes).

dnqbob avatar Aug 29 '20 16:08 dnqbob

The new Harvester Module still causes lag spikes here, but they are more unpredictable (so I am not certain whether it is really the cause of the lag spikes).

Thanks for the info.

It's pretty strange, as I've played multiple hundreds of bot games using the trait (in its old form, and a few in the updated form) and not noticed lag spikes like that.

I haven't tried the map you mentioned though, so I can investigate more.

The changes to HarvesterBotModule are pretty small. The AI just counts how many harvesters it has and compares that to the MaxHarvesters config to determine whether to produce more, whereas the normal version just compares to the number of refineries. So it'd be odd if that caused a major change in performance, though I'm not particularly well versed in C# or game programming.

darkademic avatar Aug 30 '20 17:08 darkademic

In fact, I'm not well versed in game programming either.

I happened to find this map because, at the time, I thought the protection squad caused the lag spikes, so a map that triggers protection actions frequently seemed useful (thanks to the minefield in the middle and harvesters being attacked frequently). It was very confusing when I found that the protection squad was not the cause.

dnqbob avatar Aug 30 '20 17:08 dnqbob

It is said that finding actors in the world is an expensive operation, but it is useful and vital to any bot module. We can only use it as little as possible.
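One generic way to use an expensive lookup less often is to cache its result for a few ticks. The sketch below is only an illustration under that assumption: `CachedQuery` is a hypothetical helper, not an OpenRA API, and the query delegate stands in for whatever actor search a bot module performs.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: remembers the result of an expensive query
// and only re-runs it after `lifetime` ticks have passed.
public sealed class CachedQuery<T>
{
    readonly Func<IReadOnlyList<T>> query;
    readonly int lifetime;
    IReadOnlyList<T> cached;
    int ticksLeft;

    // Number of times the underlying query actually ran.
    public int QueryCount { get; private set; }

    public CachedQuery(Func<IReadOnlyList<T>> query, int lifetime)
    {
        this.query = query;
        this.lifetime = lifetime;
    }

    // Called once per game tick to age the cached result.
    public void Tick()
    {
        if (ticksLeft > 0)
            ticksLeft--;
    }

    // Returns the cached result, refreshing it if it is missing or expired.
    public IReadOnlyList<T> Get()
    {
        if (cached == null || ticksLeft == 0)
        {
            cached = query();
            QueryCount++;
            ticksLeft = lifetime;
        }

        return cached;
    }
}
```

The trade-off is staleness: a cached list can contain actors that have since died or changed owner, so callers would still need to filter the entries before acting on them.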

dnqbob avatar Aug 30 '20 17:08 dnqbob

The lag spike is expected in the harvester bot module on specific maps. It shouldn't be CA-specific, though, because it shouldn't come from the scanning but from the pathfinder, which is called while assigning idle harvesters to work.

Nonetheless, I have a minor performance optimization suggestion here:

https://github.com/Inq8/CAmod/blob/1e1f9f146dbdee2996b9211e582f0960ca1b517d/OpenRA.Mods.CA/Traits/BotModules/HarvesterBotModuleCA.cs#L163-L171

which I would rewrite to

```csharp
var numHarvesters = AIUtils.CountActorByCommonName(Info.HarvesterTypes, player);

// Already at the configured cap: nothing to do.
if (numHarvesters >= Info.MaxHarvesters)
	return;

// Enough harvesters for the current number of refineries: nothing to do.
var harvCountTooLow = numHarvesters < AIUtils.CountBuildingByCommonName(Info.RefineryTypes, player) * Info.HarvestersPerRefinery;
if (!harvCountTooLow)
	return;

// Only now look up the harvester info, since it is needed for the production request.
var harvInfo = AIUtils.GetInfoByCommonName(Info.HarvesterTypes, player);
if (unitBuilder.RequestedProductionCount(bot, harvInfo.Name) == 0)
	unitBuilder.RequestUnitProduction(bot, harvInfo.Name);
```

This avoids iterating over the harvester info in the GetInfoByCommonName call when the result isn't even used.

GraionDilach avatar Oct 29 '20 13:10 GraionDilach

That is why, as you can see, I dropped this AI module from the SP AI. I mean the original one.

dnqbob avatar Oct 29 '20 14:10 dnqbob