MySQLOO icon indicating copy to clipboard operation
MySQLOO copied to clipboard

[BUG] Server hangs on changelevel

Open blobles-dev opened this issue 1 year ago • 8 comments
trafficstars

My server is suffering from a weird issue where it just hangs when changelevel occurs, no actual crash, just unresponsive. Some research has led me to this module being the cause..

This is related to: #118

Big shoutout to Warden Potato on the gmod discord for making me aware and providing this code to replicate:

local DB = mysqloo.Connect(host, username, password, database)

function DebuggingBusyQuery()
    local q = DB:query("SELECT * FROM `sometable` WHERE `somevalue` = 'SOMETHING' ORDER BY `order` DESC LIMIT " .. math.random(1, 10000))
    function q:onSuccess(rows)
    	--optional print
        --print("Result from busy query")
    end
    function q:onError() print("Errored on debug query") end
    q:start()
end

concommand.Add("robin_williams", function(ply)
	if not ply:IsSuperAdmin() then return end
	timer.Simple(2, function()
		for i = 1, 15000, 1 do
			DebuggingBusyQuery()
		end
		
		RunConsoleCommand("changelevel", "gm_flatgrass")
		print("changelevel has begun")
		
		for i = 1, 8000, 1 do
			DebuggingBusyQuery()
		end
	end)
end)

NOTE: The db should be remote, and the query should be somewhat slow, this ensures issue replication.

A simple workaround is:

hook.Add("ShutDown", "MySQLOO:KillQueriesOnShutdown", function()
	ServerLog("Shutting down a active DB connection and killing all running queries (".. DB:queueSize() ..")\n")

	DB:abortAllQueries()
	DB:disconnect(false)
end)

However, this could result in data loss, plus, you'd have to specifically add an instance for each DB, unless there's a way to grab all DB objects.

blobles-dev avatar Aug 21 '24 22:08 blobles-dev

The database usually waits until all queries are finished, which is usually something you'd want.

Does the problem you mention only occur if queries are launched after the change level has been run? Or is there a particular reason why the busy queries are split into two parts?

FredyH avatar Aug 25 '24 19:08 FredyH

The database usually waits until all queries are finished, which is usually something you'd want.

Does the problem you mention only occur if queries are launched after the change level has been run? Or is there a particular reason why the busy queries are split into two parts?

The code was a quick example by warden potato, its a dirty way of replicating it but at this point im confident a query is being stuck in a "pending" state, basically making the server hang, waiting forever. Since adding the work around code my server has stopped crashing, but upon checking the logs every instance says the queuesize is 0 - so unless Ive just been super lucky and not hit the crash scenario since adding it (unlikely as it was like clockwork) or the query is stuck in some sort of state where its pending but also isnt in the queue.

blobles-dev avatar Aug 25 '24 22:08 blobles-dev

The database usually waits until all queries are finished, which is usually something you'd want.

Does the problem you mention only occur if queries are launched after the change level has been run? Or is there a particular reason why the busy queries are split into two parts?

I've noticed these methods in the mysqloo table: ["allocationCount"] = function: 0xf1baad3a ["deallocationCount"] = function: 0xf1baad62 ["objectCount"] = function: 0xf1baacfa ["referenceCreatedCount"] = function: 0xf1baadf2 ["referenceFreedCount"] = function: 0xf1baadaa

Would these be beneficial to add to my debug prints prior to my work-around so you can get an idea of perhaps where it is stuck?

blobles-dev avatar Aug 28 '24 21:08 blobles-dev

I have added the following code to log some data:

        local data = {}
        data["time"] = os.date("%c")
        data["timestamp"] = os.time()
        data["queueSize"] = Prometheus.DB.Module.DB:queueSize()
        data["preAbort"] = {
            allocationCount = mysqloo.allocationCount(),
            deallocationCount = mysqloo.deallocationCount(),
            objectCount = mysqloo.objectCount(),
            referenceCreatedCount = mysqloo.referenceCreatedCount(),
            referenceFreedCount = mysqloo.referenceFreedCount(),
        }
        
        Prometheus.DB.Module.DB:abortAllQueries()

        data["postAbort"] = {
            allocationCount = mysqloo.allocationCount(),
            deallocationCount = mysqloo.deallocationCount(),
            objectCount = mysqloo.objectCount(),
            referenceCreatedCount = mysqloo.referenceCreatedCount(),
            referenceFreedCount = mysqloo.referenceFreedCount(),
        }
        
        Prometheus.DB.Module.DB:disconnect(false)

        data["postDisconnect"] = {
            allocationCount = mysqloo.allocationCount(),
            deallocationCount = mysqloo.deallocationCount(),
            objectCount = mysqloo.objectCount(),
            referenceCreatedCount = mysqloo.referenceCreatedCount(),
            referenceFreedCount = mysqloo.referenceFreedCount(),
        }

        local jsonString = util.TableToJSON(data, true) 
        local fileName = "prometheus_mysqloo_log.txt"
        if file.Exists(fileName, "DATA") then
            file.Append(fileName, "\n"..jsonString)
        else
            file.Write(fileName, jsonString)
        end

I'll check back in a few days if there's anything that sticks out.

EDIT: Added logging for postDisconnect (just in case this is what actually fixed it, not sure if it was the abort or disconnect atm)

blobles-dev avatar Aug 28 '24 22:08 blobles-dev

@FredyH Attached my log file, the last entry is a suspected crash (i had my workaround enabled so cant confirm but all aligns) I remember when the crashes where occurring I was also getting connection errors from my DB, perhaps its something to do with queries being "stuck" with disconnected DB? Thats also how I simulated this. Had my DB connected, took its perms so it failed, then waited. prometheus_mysqloo_log.txt

blobles-dev avatar Aug 29 '24 21:08 blobles-dev

Yeah so it looks like there are still some queries running. The problem is that queries can lose the TCP connection and not notice that the server is gone, which will keep them that way indefinitely. I added timeout functions that you can use to configure when a query should timeout, but I have not released the update yet. I will probably do so later this week.

FredyH avatar Aug 30 '24 23:08 FredyH

Great news! I saw that commit and figured it probably be related. Two quick questions.. will it call the fail callback and is there a default timeout? It isnt my code thats failing and would be great if I could avoid editing

blobles-dev avatar Aug 31 '24 10:08 blobles-dev

Hi, I released a beta version with the new connection timeouts. Unfortunately, there is not a default value (except for the default the mysql connector supplies), so you will have to set it yourself. However, you could simply override the MySQLOO metatable function for connecting to the database and call the timeout functions beforehand.

FredyH avatar Sep 15 '24 22:09 FredyH

The problem is that queries can lose the TCP connection and not notice that the server is gone, which will keep them that way indefinitely

Yes, this is exactly what i've been experiencing for some years now, server hangs on timeout after attempting to change maps due to a no TCP connection. Although the lost connection for me has always started straight from the previous map change, all the way through to the upcoming map change (which it never gets to do due to the infinite loop you speak of I think). I will try the beta version now with a reasonable timeout, even though that means data loss :/

Zulbem avatar Mar 13 '25 11:03 Zulbem

I will try the beta version now with a reasonable timeout

Please let me know if it worked for you!

FredyH avatar Mar 16 '25 13:03 FredyH

I will try the beta version now with a reasonable timeout

Please let me know if it worked for you!

Okay so it seems to prevent the server from hanging entirely, however it still pops up with timeout message for 5-10 seconds upon a map change before it actually changes. This shouldn't happen if I close the database object before shutdown right?

Zulbem avatar Sep 27 '25 06:09 Zulbem

I will try the beta version now with a reasonable timeout

Please let me know if it worked for you!

Still happening on map change with the beta.

Need anything for debugging? Would love to get this fixed

Velkon avatar Oct 13 '25 20:10 Velkon

Which host are you using? Also, when using the timeouts it should still hang, but after the timeouts it should error and then actually do the level change.

FredyH avatar Oct 13 '25 21:10 FredyH

Physgun, I'm just on default timeouts

Just hangs indefinitely

Linux64

On Mon, Oct 13, 2025, 3:43 PM Frederik Haselmeier @.***> wrote:

FredyH left a comment (FredyH/MySQLOO#138) https://github.com/FredyH/MySQLOO/issues/138#issuecomment-3399151959

Which host are you using? Also, when using the timeouts it should still hang, but after the timeouts it should error and then actually do the level change.

— Reply to this email directly, view it on GitHub https://github.com/FredyH/MySQLOO/issues/138#issuecomment-3399151959, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACZ4J6AYFQEQUJHQF4JZTID3XQMHVAVCNFSM6AAAAACHUXUHYOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTGOJZGE2TCOJVHE . You are receiving this because you commented.Message ID: @.***>

Velkon avatar Oct 13 '25 21:10 Velkon

Some more info, I haven't been tracking the success rate of map changes, but players DO say they have noticed an improvement since I've switched over to the beta. So maybe there is still some edge case?

Again if you need anything or want me to set up some debugging, lmk

Velkon avatar Oct 14 '25 17:10 Velkon

I believe the issue is related to Physgun as so far everyone who encountered this issue has been with them. (I suspect their stateful firewall is at fault which loses the connection and then blocks any following packets on the dead connection). I am not sure what more I could do here other than defining the timeouts, which you can fine-tune using the following methods:

Database:setReadTimeout(timeout) -- in seconds, default should be 10s
Database:setWriteTimeout(timeout)
Database:setConnectTimeout(timeout)

FredyH avatar Oct 14 '25 17:10 FredyH

Sounds good, Thanks for all the help and work on the module.

On Tue, Oct 14, 2025, 11:59 AM Frederik Haselmeier @.***> wrote:

FredyH left a comment (FredyH/MySQLOO#138) https://github.com/FredyH/MySQLOO/issues/138#issuecomment-3403018122

I believe the issue is related to Physgun as so far everyone who encountered this issue has been with them. (I suspect their stateful firewall is at fault which loses the connection and then blocks any following packets on the dead connection). I am not sure what more I could do here other than defining the timeouts, which you can fine-tune using the following methods:

Database:setReadTimeout(timeout) -- in seconds, default should be 10sDatabase:setWriteTimeout(timeout)Database:setConnectTimeout(timeout)

— Reply to this email directly, view it on GitHub https://github.com/FredyH/MySQLOO/issues/138#issuecomment-3403018122, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACZ4J6ATMJBRMOIXXSK6PYL3XU2YVAVCNFSM6AAAAACHUXUHYOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTIMBTGAYTQMJSGI . You are receiving this because you commented.Message ID: @.***>

Velkon avatar Oct 14 '25 18:10 Velkon