forgottenserver icon indicating copy to clipboard operation
forgottenserver copied to clipboard

NPC's have a (Lua) memory leak & crash potential

Open Erza opened this issue 4 years ago • 5 comments

Before creating an issue, please ensure:

  • [x] This is a bug in the software that resides in this repository, and not a support matter (use https://otland.net/forums/support.16/ for support)
  • [x] This issue is reproducible without changes to the C++ code in this repository

Steps to reproduce (include any configuration/script required to reproduce)

  1. Stick this code in a talkaction
local tests = {
    npc = 1,
    monster = 2,
    item = 3,
}

local activeTest = tests.npc

local monsterName = "Enslaved Dwarf"
local npcName = "Banker"
local itemId = 2148

local function runTest(playerPos)
    for i = 1, 5000 do
        if activeTest == tests.npc then
            local npc = Game.createNpc(npcName, playerPos)
            npc:remove()
        elseif activeTest == tests.monster then
            local monster = Game.createMonster(monsterName, playerPos)
            monster:remove()
        elseif activeTest == tests.item then
            local item = Game.createItem(itemId, 100, playerPos)
            item:remove()
        end
    end
end

function onSay(player, words, param)
    local playerPos = player:getPosition()
    
    local start = os.clock()
    runTest(playerPos)
    local reloadStart = os.clock()
    Game.reload(RELOAD_TYPE_TALKACTIONS)
    local testEnd = os.clock()
    print("Test #1: Test: "..reloadStart - start.." - Reload: "..testEnd - reloadStart.." - Total: "..testEnd - start)

    start = os.clock()
    runTest(playerPos)
    reloadStart = os.clock()
    Game.reload(RELOAD_TYPE_TALKACTIONS)
    testEnd = os.clock()
    print("Test #2: Test: "..reloadStart - start.." - Reload: "..testEnd - reloadStart.." - Total: "..testEnd - start)

    start = os.clock()
    runTest(playerPos)
    reloadStart = os.clock()
    Game.reload(RELOAD_TYPE_TALKACTIONS)
    testEnd = os.clock()
    print("Test #3: Test: "..reloadStart - start.." - Reload: "..testEnd - reloadStart.." - Total: "..testEnd - start)

    playerPos:sendMagicEffect(CONST_ME_BLOCKHIT)
    print(("-"):rep(50))
    return false
end
  1. Run the talkaction a few times and keep an eye on the memory usage of the process

Expected behaviour

NPC's cleanly deallocate their memory upon being removed

Actual behaviour

NPC's do not deallocate their memory upon being removed

Observations

  1. After commenting out this call to lua_gc, reload times stay consistent, which gives evidence of the Lua environment getting trashed https://github.com/otland/forgottenserver/blob/e1e1bc21a72034cdc57cfbc95701182943307471/src/luascript.cpp#L4604
  2. Using local activeTest = tests.monster or local activeTest = tests.item does not produce the same memory leak
  3. It doesn't matter which reload type we use, whether it's RELOAD_TYPE_TALKACTIONS, RELOAD_TYPE_CREATURESCRIPTS or RELOAD_TYPE_ACTIONS etc, the entire Lua environment is affected

Test Outputs

With the call to lua_gc enabled (increasing timings)
Test #1: Test: 3.841 - Reload: 0.059000000000001 - Total: 3.9
Test #2: Test: 3.913 - Reload: 0.11 - Total: 4.023
Test #3: Test: 3.833 - Reload: 0.167 - Total: 4
--------------------------------------------------
Test #1: Test: 3.832 - Reload: 0.221 - Total: 4.053
Test #2: Test: 3.841 - Reload: 0.274 - Total: 4.115
Test #3: Test: 3.896 - Reload: 0.33 - Total: 4.226
--------------------------------------------------
Test #1: Test: 3.897 - Reload: 0.383 - Total: 4.28
Test #2: Test: 3.916 - Reload: 0.439 - Total: 4.355
Test #3: Test: 3.934 - Reload: 0.492 - Total: 4.426
--------------------------------------------------
Test #1: Test: 3.954 - Reload: 0.547 - Total: 4.501
Test #2: Test: 3.971 - Reload: 0.594 - Total: 4.565
Test #3: Test: 3.988 - Reload: 0.657 - Total: 4.645
--------------------------------------------------
Test #1: Test: 4.006 - Reload: 0.706 - Total: 4.712
Test #2: Test: 3.993 - Reload: 0.76400000000001 - Total: 4.757
Test #3: Test: 4.019 - Reload: 0.815 - Total: 4.834
--------------------------------------------------
Test #1: Test: 4.017 - Reload: 0.87 - Total: 4.887
Test #2: Test: 4.017 - Reload: 0.929 - Total: 4.946
Test #3: Test: 4.061 - Reload: 0.982 - Total: 5.043
--------------------------------------------------
With the call to lua_gc disabled (consistent timings)
Test #1: Test: 3.875 - Reload: 0.0040000000000013 - Total: 3.879
Test #2: Test: 3.921 - Reload: 0.0040000000000013 - Total: 3.925
Test #3: Test: 3.911 - Reload: 0.0040000000000013 - Total: 3.915
--------------------------------------------------
Test #1: Test: 4.046 - Reload: 0.0030000000000001 - Total: 4.049
Test #2: Test: 3.844 - Reload: 0.0040000000000013 - Total: 3.848
Test #3: Test: 3.967 - Reload: 0.0039999999999978 - Total: 3.971
--------------------------------------------------
Test #1: Test: 4.112 - Reload: 0.0049999999999955 - Total: 4.117
Test #2: Test: 4.124 - Reload: 0.0039999999999978 - Total: 4.128
Test #3: Test: 3.94 - Reload: 0.0039999999999978 - Total: 3.944
--------------------------------------------------
Test #1: Test: 3.974 - Reload: 0.0040000000000049 - Total: 3.978
Test #2: Test: 4.007 - Reload: 0.0039999999999978 - Total: 4.011
Test #3: Test: 4.217 - Reload: 0.0049999999999955 - Total: 4.222
--------------------------------------------------
Test #1: Test: 4.528 - Reload: 0.0049999999999955 - Total: 4.533
Test #2: Test: 4.186 - Reload: 0.0039999999999907 - Total: 4.19
Test #3: Test: 4.046 - Reload: 0.0030000000000001 - Total: 4.049
--------------------------------------------------
Test #1: Test: 4.108 - Reload: 0.0039999999999907 - Total: 4.112
Test #2: Test: 4.33 - Reload: 0.0040000000000049 - Total: 4.334
[Warning - NpcScript::NpcScript] Can not load script: bank.lua
not enough memory

Memory Graphs

NPC Test

NPC Test

Monster Test

Monster Test

Item Test

Item Test

Impact on the server

Eventually, the Lua environment will run out of available memory to allocate and you will get the following error, after which the server will shortly crash

[Warning - NpcScript::NpcScript] Can not load script: bank.lua
not enough memory

Erza avatar Aug 11 '21 09:08 Erza

Probably the npc system is at fault (I think it's known from having memory leaks), if you empty the lua files to bare minimum is it still leaking? Or if you use another npc.

nekiro avatar Aug 11 '21 11:08 nekiro

For all of these tests I was using 33333 iterations per test, to get to a total of 100k spawns

local function runTest(playerPos)
    for i = 1, 33333 do
        if activeTest == tests.npc then
            local npc = Game.createNpc(npcName, playerPos)
            npc:remove()
        elseif activeTest == tests.monster then
            local monster = Game.createMonster(monsterName, playerPos)
            monster:remove()
        elseif activeTest == tests.item then
            local item = Game.createItem(itemId, 100, playerPos)
            item:remove()
        end
    end
end

Results

The first dip is always after the talkaction finished executing, the 2nd dip is always after manually running /reload npc after execution finishes

Test 1 (Removing the script="bank.lua" tag from the Banker.xml file entirely

Looks like we got perfect memory deallocation in this case

Test 2 (Bare minimum NPC script file)

It's seemingly less with an "empty" NPC script file, but I can still observe a leak coming from data/npc/lib/npc.lua (more specifically, data/npc/lib/npcsystem.lua)

Test 3 (Adding KeywordHandler)

Test 4 (Adding NpcHandler)

Test 5 (Parsing parameters & enabling NpcHandler callbacks)

Test 6 (Adding FocusModule)

Test 7 (Adding a few keywords to listen for)

Erza avatar Aug 11 '21 16:08 Erza

There's some exception with code 0xE24C4A02 spamming the log when running a debugger it's being launched from this line (and has npc onThink in stacktrace): https://github.com/otland/forgottenserver/blob/0e20d54eebc6be8ea7ed50d0fd3d31d5d43df1f9/src/luascript.cpp#L300

could someone confirm this with unmodded tfs?

Zbizu avatar Feb 04 '23 02:02 Zbizu

https://github.com/otland/forgottenserver/pull/4483

EPuncker avatar Jun 29 '23 21:06 EPuncker