mtasa-blue icon indicating copy to clipboard operation
mtasa-blue copied to clipboard

Exceedingly long resource (re)start times

Open sacr1ficez opened this issue 9 months ago • 10 comments

Describe the bug

After PR #3924 server started to behave slow, often throwing long execution warning & causing network trouble on resource (re)starts. It also causes server console to be lagging, or have some kind of input delay. It has never happened before with this resource, and only recently after server update i've started noticing something isn't alright. Test resource i've used has 100 MB overall, and around 426 .lua scripts, but according to others it also happens on default resources, and custom smaller systems (see additional context). What's worse, it seems to be inconsistent, sometimes you can reproduce it right away, and the other time it will randomly appear. Server running time might be a factor.

Steps to reproduce

local loadTimes = {}

function onResourceStart(startedResource)
	local diffResource = (resource ~= startedResource)

	if (not diffResource) then
		return false
	end

	local resourceName = getResourceName(startedResource)
	local savedTime = loadTimes[resourceName]

	if (not savedTime) then
		return false
	end

	local mtaVer = getVersion()
	local mtaVerSortable = mtaVer.sortable
	local timeNow = getTickCount()
	local timeDiff = (timeNow - savedTime).." ms"
	local startLog = resourceName.." took "..timeDiff.." to load @ "..mtaVerSortable

	outputDebugString(startLog)
	loadTimes[resourceName] = nil
end
addEventHandler("onResourceStart", root, onResourceStart)

function onResourceStop(stoppedResource)
	local diffResource = (resource ~= stoppedResource)

	if (not diffResource) then
		return false
	end

	local resourceName = getResourceName(stoppedResource)
	local timeNow = getTickCount()

	loadTimes[resourceName] = timeNow
end
addEventHandler("onResourceStop", root, onResourceStop)

Version

Server: r22883 and above Windows 64 bit server

Additional context

See: https://discord.com/channels/801330706252038164/1289708691830083775/1358180315209797673 https://discord.com/channels/801330706252038164/1289708691830083775/1358181367002959892 (MTA:SA development discord - GNN6PRtTnu)

Relevant log output

[2025-04-08 21:21:35] Stopping dayz
[2025-04-08 21:21:35] Starting dayz
[2025-04-08 21:21:37] INFO: dayz took 2342 ms to load @ 1.6.0-9.22880.0

[2025-04-08 21:26:23] Stopping dayz
[2025-04-08 21:26:23] Starting dayz
[2025-04-08 21:26:30] WARNING: Long execution (dayz)
[2025-04-08 21:26:32] INFO: dayz took 9737 ms to load @ 1.6.0-9.22883.0

Security Policy

  • [x] I have read and understood the Security Policy and this issue is not security related.

sacr1ficez avatar Apr 08 '25 20:04 sacr1ficez

Thanks for providing some performance comparison numbers, they are useful to characterize the problem! I understand that with so many files the filesystem call overhead may start becoming significant.

A month ago, I shared pointer about a potential performance improvement for this situation when asked over Discord:

In theory yes, you could make it slightly more efficient by streaming the file contents in chunks, computing a checksum while copying the file at the same time.

I could give this improvement idea a try, but I'm not familiar enough with C++ to properly decide what the most idiomatic way of doing that in MTA's codebase is, and I don't have that much time to learn at the moment. Any pointers here?

AlexTMjugador avatar Apr 08 '25 21:04 AlexTMjugador

It's difficult to choose between servers that rely on heavy-file-size features like IMG loading (who may now finally be stable), and servers affected in their resource loading times by the "Make server resource file checksumming code memory-efficient" and when i shared this feedback with srslyy and Alex, i heard this interesting idea:

srslyyyy — 23:32 mtaserver.conf setting would be okayish like a toggle this way we won't have to punish one or other side, because ultimately it would be server owner choice to select between one approach or another

Even then, to do it the best way, we shouldn't be choosing, but rather investigate the issue and try to fix it.

One side (e.g servers like srslyyy's) is demanding a revert before anything else, and we also need to advocate for the other side that found stability in the change.

The highlighted part from original PR clarifies the perspective of the other side, "servers that rely on heavy-file-size features like IMG loading (who may now finally be stable)":

Image

Dutchman101 avatar Apr 08 '25 21:04 Dutchman101

AlexTMJugador's opinion on the "mtaserver.conf" approach idea:

Image More specifically, it's about his knownledge of C++ idioms. If anyone is willing to help out (i will link this post to botder and lopsi to begin with), it'll be much welcome.

Dutchman101 avatar Apr 08 '25 21:04 Dutchman101

this is really annoying, every time i want to debug resource, i have to wait 30 secs until the local server respond

for example, starting/restarting editor takes 32 secs

dialogs took 32109 ms to load @ 1.6.0-9.23147.0
tooltip took 32111 ms to load @ 1.6.0-9.23147.0
editor_gui took 32111 ms to load @ 1.6.0-9.23147.0

starting a tool called MSTU takes 32 secs [10:54:00] INFO: mstu took 32401 ms to load @ 1.6.0-9.23147.0

q8X avatar Apr 16 '25 07:04 q8X

well, only solution is to wait i guess. Anyway, if anyone tired of waiting 30s every time you change small part in the code revert to this version and debug locally

so now you have got two versions to test with, latest one and r22880

Image

q8X avatar May 08 '25 13:05 q8X

The issue still persists on Multi Theft Auto v1.6-release-23219 Every restart of my resource was met with long execution & network trouble, it started happening out of nowhere and went away after restarting server.

sacr1ficez avatar May 20 '25 15:05 sacr1ficez

The issue still persists on Multi Theft Auto v1.6-release-23219 Every restart of my resource was met with long execution & network trouble, it started happening out of nowhere and went away after restarting server.

Do you have a local MTA development setup? You could build the server yourself and run a profiler to figure out the hot spot. I can also tell you how to do this.

botder avatar May 20 '25 18:05 botder

I can confirm this issue, it depens on how many files your resource has, sometimes work fine but sometimes takes forever to load. This resource on image (models_loader) contains 2400 files ~ 460mb and server freezed for 2 mins (this was 13 seconds before).

Image

[08:26:20] MTA:SA Server v1.6-release-23223

Update:

  • I'm wrong, not only those resources restart slowly which have lot of files. After one resource bugged, all will be and you need full server restart to fix it.

This has only 3 lua files: Image

Xenius97 avatar May 21 '25 06:05 Xenius97

Can you try to compile and test the server without commit 4929a83de0413a304e5b8e5a34fcf68542c932b6 and see if you can reproduce the issue?

botder avatar May 21 '25 10:05 botder

Can you try to compile and test the server without commit 4929a83 and see if you can reproduce the issue?

I’ve been trying to reproduce it for an hour now, but unfortunately I haven’t succeeded. It doesn’t always occur, it’s completely random.

Xenius97 avatar May 21 '25 11:05 Xenius97