Server crashes a lot recently (memory overflow restarts)
The interval varies quite a lot but in my experience, it happens roughly every few minutes to 2 hours.
People in the in-game chat are saying it happens when a near-infinite amount of triggers are created (e.g. storm with way more than 20 triggers) but I have no evidence for this. I do know however that recently the game has been almost unplayable due to the crashes so I'd like to help where I can.
@JayDi85 do you have anything in your logs? Also not sure if this came up already - I couldn't find any issues on this except maybe #10656.
There are only two candidates:
- Commander cards with missing init code (details in #11081). That error can freeze a game and write errors to the logs endlessly until the server restarts. It doesn't account for 100% of the memory overflows/restarts.
- Transformable cards with too many calls in the game cycle:
These are daybound/nightbound cards like [[Child of the Pack]] that can cause too many calls of `checkStateBasedActions -> transform` in games. If you put such cards onto the battlefield, CPU usage and the number of `applyEffects` calls increase dramatically; see the attached screenshot from one of the server's heavy games:
Child of the Pack // Savage Packmate - (Gatherer) (Scryfall) (EDHREC)
{2}{R}{G} Creature — Human Werewolf 2/5 {2}{R}{G}: Create a 2/2 green Wolf creature token. Daybound (If a player casts no spells during their own turn, it becomes night next turn.) :arrows_counterclockwise: Creature — Werewolf 5/5 Trample Other creatures you control get +1/+0. Nightbound (If a player casts at least two spells during their own turn, it becomes day next turn.)
+1, the server crashes all of the time. I'd be willing to contribute to a bounty to get this fixed and reduce crashes.
New global features such as the Prototype mechanic (#11249) have been added, so prepare for more unstable times until this whole thing stabilises.
> New global features such as the Prototype mechanic (#11249) have been added, so prepare for more unstable times until this whole thing stabilises.
I don't really see how Prototype could cause a server crash, worst case would be a game crash that doesn't touch the rest of the server, right? Unless the single boolean added to each permanent somehow is too much extra data...
> worst case would be a game crash that doesn't touch the rest of the server, right?
Nope. All server restarts are related to the memory overflow error: Java's garbage collector (GC) can't clean up and free objects because of circular references between game states or other game objects. So one broken game can eat all of the server's memory and crash the whole server. Under the hood: xmage uses a thread pool to process games, and you can't limit memory usage for a specific thread/game.
The main problem with memory overflow errors is that they're hard to catch and reproduce. It must be a combination of two things:
- buggy code that generates objects shared between game states;
- an infinite loop that can generate a large number of calls to the problem code.
Well, the whole xmage code base can generate such "shared" objects, but that's OK in 99% of cases thanks to the game lifecycle: when a game ends, it is closed and all of its memory is freed anyway. But an infinite/frozen game never ends and eats memory over time.
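To make the combination concrete, here is a minimal sketch of the pattern, not xmage code. All names (`FrozenGameSketch`, `Game`, `sharedRegistry`) are hypothetical. It assumes a game that keeps a short rollback history but also registers every new object in a game-wide list that is never trimmed, so a loop that never ends keeps growing the set of reachable objects.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration only; none of these types exist in xmage.
final class FrozenGameSketch {

    /** Stand-in for a game that keeps a short rollback history of states. */
    static final class Game {
        private static final int MAX_HISTORY = 3;
        // Intended to hold only the last few states.
        private final Deque<List<Object>> history = new ArrayDeque<>();
        // The bug: every state also registers its objects here and nothing removes them.
        private final List<Object> sharedRegistry = new ArrayList<>();

        void runCycle() {
            List<Object> state = new ArrayList<>();
            state.add(new Object());
            sharedRegistry.addAll(state);   // one more reachable object per cycle
            history.addLast(state);
            if (history.size() > MAX_HISTORY) {
                history.removeFirst();      // trimming the history does not free the registry
            }
        }

        int historySize() {
            return history.size();
        }

        int retainedObjects() {
            return sharedRegistry.size();
        }
    }

    /** Runs a bounded stand-in for an endless loop and reports what stays reachable. */
    static int retainedAfter(int cycles) {
        Game game = new Game();
        for (int i = 0; i < cycles; i++) {
            game.runCycle();
        }
        return game.retainedObjects();
    }

    public static void main(String[] args) {
        System.out.println("retained after 10 cycles: " + retainedAfter(10));
        System.out.println("retained after 100000 cycles: " + retainedAfter(100_000));
    }
}
```

Once the `Game` object itself becomes unreachable (the game ends), everything it holds can be collected, which matches the "OK in 99% of cases" observation above.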
There are other possible use cases with an unbounded number of objects, like AI mana variation calculations, AI and infinite choose dialogs, an infinite number of tokens (no longer relevant), etc. But those are easier to catch.
P.S. The main "problem" with big new features is that they add a new batch of possible bad combos or use cases that generate such infinite loops and freezes. So players can find new steps to reproduce them on the server.
P.P.S. Related topics about memory problem:
- #4520
> P.S. The main "problem" with big new features is that they add a new batch of possible bad combos or use cases that generate such infinite loops and freezes. So players can find new steps to reproduce them on the server.
Right, but Prototype specifically isn't really doing anything new there that Transform isn't already doing. Though it does seem that Transform has performance issues which, if true, could be made worse by Prototype I suppose. It doesn't involve any `game.applyEffects()` calls though, so the performance issue mentioned above shouldn't matter.
One of the use cases from #9302:
Putting 100s of abilities on the stack at once will crash the server. Easiest way to reproduce: add 10-30 [[Sharding Sphinx]] to your board and a [[Platinum Angel]] to the AI so they won't lose the game. When damage is dealt, this will put a few hundred triggers from the Sharding Sphinxes on the stack. This will cause the server to lock up a large amount of RAM (20 Sphinxes will lock up ~450MB of RAM). This RAM can't be GC'ed away until the stack empties.
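One possible mitigation for this kind of case is a hard cap on stack size. The sketch below is only an illustration under assumptions: `BoundedStack`, `simulate` and the limit value are hypothetical and not part of xmage, and it assumes that each of `sources` permanents adds one trigger per damage event, so the stack grows as sources × events.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration only; not xmage's stack implementation.
final class BoundedStackSketch {

    /** A stack that refuses to grow past a fixed limit. */
    static final class BoundedStack<T> {
        private final Deque<T> items = new ArrayDeque<>();
        private final int limit;

        BoundedStack(int limit) {
            if (limit <= 0) {
                throw new IllegalArgumentException("limit must be positive");
            }
            this.limit = limit;
        }

        /** Returns false instead of growing past the limit, so the caller can react. */
        boolean push(T item) {
            if (items.size() >= limit) {
                return false;
            }
            items.push(item);
            return true;
        }

        int size() {
            return items.size();
        }
    }

    /** Pushes one trigger per (source, event) pair and returns how many were rejected. */
    static int simulate(int sources, int events, int limit) {
        BoundedStack<String> stack = new BoundedStack<>(limit);
        int rejected = 0;
        for (int s = 0; s < sources; s++) {
            for (int e = 0; e < events; e++) {
                if (!stack.push("trigger " + s + "/" + e)) {
                    rejected++;
                }
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        // 20 sources and 20 damage events against an assumed limit of 300 stack objects.
        System.out.println("rejected: " + simulate(20, 20, 300));
    }
}
```

What the game should do when `push` returns false (end the game, warn the player, and so on) is a design decision this sketch leaves open.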
I'd also like to chime in that this should be the most urgent issue to look into and address. No amount of coding efforts on this project matter if no one can actually enjoy what's been created and maintained all this time, which is currently the case.
Is there any way to identify and prevent some of these cases from occurring? It looks like you have metrics and call stacks available, so figuring out the most frequent causes should be possible, and then prioritizing the fixes from there.
Like @whispy mentioned already, I'm also willing to donate towards the goal of getting the server stabilized.
Load tests can be useful for finding problems with cards or the AI. They also let you test card and server stability.
For example:
- start a local server;
- open the `LoadTest` file and set up the test cards with:
  - `TEST_AI_RANDOM_DECK_SETS`
  - `TEST_AI_CUSTOM_DECK_PATH_1`
  - `TEST_AI_CUSTOM_DECK_PATH_2`
- set the number of games to run with `gamesAmount`;
- run the unit test `test_TwoAIPlayGame_Multiple`;
- wait for the final results;
- if you see a problem/slow game, you can repeat it by setting its random sid manually in `singleGameSID`.
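The idea behind the last step can be sketched generically. This is not the real `LoadTest` code: it assumes the "sid" acts as a random seed, and `playGame`, `findSlowGames` and the turn-count threshold are hypothetical stand-ins. The point is only that a game driven by a seeded `Random` can be replayed exactly once its seed is known.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; the real LoadTest in Mage.Verify works differently.
final class SeededLoadTestSketch {

    /** Stand-in for one AI-vs-AI game: the "turn count" depends only on the seed. */
    static int playGame(long seed) {
        Random random = new Random(seed);
        return 5 + random.nextInt(46); // 5..50 turns
    }

    /** Runs gamesAmount games and returns the seeds of the ones that look slow. */
    static List<Long> findSlowGames(long masterSeed, int gamesAmount, int slowThreshold) {
        Random seeds = new Random(masterSeed);
        List<Long> slow = new ArrayList<>();
        for (int i = 0; i < gamesAmount; i++) {
            long seed = seeds.nextLong();
            if (playGame(seed) >= slowThreshold) {
                slow.add(seed);
            }
        }
        return slow;
    }

    public static void main(String[] args) {
        for (long seed : findSlowGames(1L, 100, 40)) {
            // Replaying with the recorded seed gives the same game again.
            System.out.println("slow game seed " + seed + " -> " + playGame(seed) + " turns");
        }
    }
}
```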
The main memory problem is fixed, so the server should be more stable. See details here: https://github.com/magefree/mage/issues/9302#issuecomment-1762853919. But there are still TODOs to check in upcoming updates, e.g. CPU usage of the Transform ability (see above).
One of the unknown use cases with a 90% CPU freeze: Constructed - Legacy | DaddyGreen, Urzasdestiny [quit] | Two Player Duel |
`AddCountersSourceEffect` and some triggers:
Empty stack and these permanents on the battlefield (except lands; I'm not sure whether this is the moment of high CPU or after it, but it's 100% a problem game):
- [[CityOfTraitors]]
- [[GrimLavamancer]] with one boost counter on it
- [[DragonsRageChanneler]]
- [[DelverOfSecrets]]
- [[ChromeMox]]
City of Traitors - (Gatherer) (Scryfall) (EDHREC)
Land When you play another land, sacrifice City of Traitors. {T}: Add {C}{C}.
Grim Lavamancer - (Gatherer) (Scryfall) (EDHREC)
{R} Creature — Human Wizard 1/1 {R}, {T}, Exile two cards from your graveyard: Grim Lavamancer deals 2 damage to any target.
Dragon's Rage Channeler - (Gatherer) (Scryfall) (EDHREC)
{R} Creature — Human Shaman 1/1 Whenever you cast a noncreature spell, surveil 1. (Look at the top card of your library. You may put that card into your graveyard.) Delirium — As long as there are four or more card types among cards in your graveyard, Dragon's Rage Channeler gets +2/+2, has flying, and attacks each combat if able.
Delver of Secrets // Insectile Aberration - (Gatherer) (Scryfall) (EDHREC)
{U} Creature — Human Wizard 1/1 At the beginning of your upkeep, look at the top card of your library. You may reveal that card. If an instant or sorcery card is revealed this way, transform Delver of Secrets. :arrows_counterclockwise: Creature — Human Insect 3/2 Flying
Chrome Mox - (Gatherer) (Scryfall) (EDHREC)
{0} Artifact Imprint — When Chrome Mox enters the battlefield, you may exile a nonartifact, nonland card from your hand. {T}: Add one mana of any of the exiled card's colors.
Thanks for your deep research and improvements @JayDi85 , great work
Another possible use case from a problem game:
- [[Rite of Replication]] on the stack
- [[Ancient Greenwarden]] on the battlefield as 160+ tokens and active triggers/effects
- [[Primal Vigor]] on battlefield as 9 active token and counter triggers/effects
- [[Nyxbloom Ancient]]
Rite of Replication - (Gatherer) (Scryfall) (EDHREC)
{2}{U}{U} Sorcery Kicker {5} (You may pay an additional {5} as you cast this spell.) Create a token that's a copy of target creature. If this spell was kicked, create five of those tokens instead.
Ancient Greenwarden - (Gatherer) (Scryfall) (EDHREC)
{4}{G}{G} Creature — Elemental 5/7 Reach You may play lands from your graveyard. If a land entering the battlefield causes a triggered ability of a permanent you control to trigger, that ability triggers an additional time.
Primal Vigor - (Gatherer) (Scryfall) (EDHREC)
{4}{G} Enchantment If one or more tokens would be created, twice that many of those tokens are created instead. If one or more +1/+1 counters would be put on a creature, twice that many +1/+1 counters are put on that creature instead.
Nyxbloom Ancient - (Gatherer) (Scryfall) (EDHREC)
{4}{G}{G}{G} Enchantment Creature — Elemental 5/5 Trample If you tap a permanent for mana, it produces three times as much of that mana instead.
@JayDi85 I could have sworn I saw that "test_TwoAIPlayGame_Multiple" test in the test folder in the past, but now I can't find it.
@jeffwadsworth It's ignored by default. Mage.Verify project -> LoadTest.java -> test_TwoAIPlayGame_Multiple
@JayDi85 A user on Discord reported a crash of the beta server. The reported situation was 500 [[Ramos Dragon Engine]] tokens, all triggered by casting a multicolored spell.
Ramos, Dragon Engine - (Gatherer) (Scryfall) (EDHREC)
{6} Legendary Artifact Creature — Dragon 4/4 Flying Whenever you cast a spell, put a +1/+1 counter on Ramos, Dragon Engine for each of that spell's colors. Remove five +1/+1 counters from Ramos: Add {W}{W}{U}{U}{B}{B}{R}{R}{G}{G}. Activate only once each turn.
Need more combo examples to crash a server (triggers overflow, etc).
Last crash from that game with high CPU usage: [Not CDEH/ No infinite combo] Martin - azert - B24M - vsdiaz
I don't know the real battlefield situation or the steps to reproduce, but the CPU "stack" looks very strange because `LookAtTopCardOfLibraryAnyTimeEffect` is a `ContinuousEffect` -- it applies on every game cycle. BUT it calls network code to send data to users (`fireUpdatePlayersEvent`).
I think it was a workaround to force a user update (in the old days, reveal/look-at windows could go missing on a choose dialog until the next update/avatar click). So that broken code (the fire event) must be removed from all places, not only continuous effects.
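To show why a network call inside a per-cycle effect is expensive, here is a small sketch. It is not the actual fix and not xmage code: `UpdateChannel` and `LookAtTopCardSketch` are hypothetical. It assumes the only reason to notify clients is that the visible top card changed, so the effect remembers what it last sent and stays silent otherwise.

```java
import java.util.Objects;

// Hypothetical illustration only; not the xmage effect or its network layer.
final class NotifyOnChangeSketch {

    /** Stand-in for the network layer; it only counts the updates it is asked to send. */
    static final class UpdateChannel {
        private int sent = 0;

        void sendUpdate(String topCard) {
            sent++;
        }

        int sentCount() {
            return sent;
        }
    }

    /** Stand-in for a continuous effect that is applied on every game cycle. */
    static final class LookAtTopCardSketch {
        private final UpdateChannel channel;
        private String lastShownTopCard; // null until the first update is sent

        LookAtTopCardSketch(UpdateChannel channel) {
            this.channel = channel;
        }

        /** Called every cycle; touches the network only when the visible card changed. */
        void apply(String currentTopCard) {
            if (!Objects.equals(currentTopCard, lastShownTopCard)) {
                lastShownTopCard = currentTopCard;
                channel.sendUpdate(currentTopCard);
            }
        }
    }

    /** Applies the effect once per entry and returns how many updates were sent. */
    static int updatesSent(String... topCardPerCycle) {
        UpdateChannel channel = new UpdateChannel();
        LookAtTopCardSketch effect = new LookAtTopCardSketch(channel);
        for (String topCard : topCardPerCycle) {
            effect.apply(topCard);
        }
        return channel.sentCount();
    }

    public static void main(String[] args) {
        // Five cycles, but the top card changes only once after the first look.
        System.out.println(updatesSent("Island", "Island", "Island", "Forest", "Forest"));
    }
}
```

An effect that fired an update on every `apply` would send five updates for the same five cycles; removing the call from the effect entirely, as proposed above, sends none from this code path.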
Just an interesting screenshot with examples of good and bad games:
Well, it may be a real cheater/troll now -- the server crashed in the first main phase of turn 1 (T1M1) with 300 replacement effects and 500 permanents on the battlefield
GAME started 5234893d-a6e0-4ed7-baf3-2840f39599cf [] icarus - oranges
Freeze in `chooseReplacementEffect` with 230 effects to choose from:
3 auto-selected choices by user: [[Ancient Greenwarden]], [[Nyxbloom Ancient]] and [[Primal Vigor]]
1 spell on stack: [[Mythos of Illuna]]
~500 permanents on battlefield:
Ancient Greenwarden - (Gatherer) (Scryfall) (EDHREC)
{4}{G}{G} Creature — Elemental 5/7 Reach You may play lands from your graveyard. If a land entering the battlefield causes a triggered ability of a permanent you control to trigger, that ability triggers an additional time.
Nyxbloom Ancient - (Gatherer) (Scryfall) (EDHREC)
{4}{G}{G}{G} Enchantment Creature — Elemental 5/5 Trample If you tap a permanent for mana, it produces three times as much of that mana instead.
Primal Vigor - (Gatherer) (Scryfall) (EDHREC)
{4}{G} Enchantment If one or more tokens would be created, twice that many of those tokens are created instead. If one or more +1/+1 counters would be put on a creature, twice that many +1/+1 counters are put on that creature instead.
Mythos of Illuna - (Gatherer) (Scryfall) (EDHREC)
{2}{U}{U} Sorcery Create a token that's a copy of target permanent. If {R}{G} was spent to cast this spell, instead create a token that's a copy of that permanent, except the token has "When this permanent enters the battlefield, if it's a creature, it fights up to one target creature you don't control."
Yes, it's a real troll: they played a 1 vs 1 game from the same computer, but with two different clients. From the USA; the IP address was recorded too.
Another use case from a normal game with an infinite game freeze on mana calculations -- only 25 permanents on the battlefield, but something causes explosive {any} mana growth in the available mana calculation.
There must be additional limits for such use cases.
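As a rough sketch of what such a limit could look like (not xmage's mana code): assume each of `sources` permanents can produce any one of five colours, so a naive enumeration visits 5^sources combinations. The names `countOptions`, `enumerate` and the budget value are hypothetical. The enumeration carries a budget and gives up once it is spent instead of running until memory or CPU is exhausted.

```java
// Hypothetical illustration only; not xmage's available-mana calculation.
final class ManaOptionsSketch {

    static final int COLORS = 5;

    /**
     * Counts colour assignments for the given number of "any colour" sources.
     * Returns -1 if more than maxOptions assignments exist, without visiting them all.
     */
    static long countOptions(int sources, long maxOptions) {
        long[] remaining = { maxOptions };
        boolean completed = enumerate(sources, remaining);
        return completed ? maxOptions - remaining[0] : -1;
    }

    /** Depth-first enumeration that stops as soon as the budget is exhausted. */
    private static boolean enumerate(int remainingSources, long[] budget) {
        if (remainingSources == 0) {
            // One complete assignment found; spend one unit of budget.
            return --budget[0] >= 0;
        }
        for (int color = 0; color < COLORS; color++) {
            if (!enumerate(remainingSources - 1, budget)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("3 sources:  " + countOptions(3, 10_000));
        System.out.println("25 sources: " + countOptions(25, 10_000));
    }
}
```

With the budget in place, the 25-source case stops after roughly 10,000 visited assignments rather than attempting all 5^25 of them.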
- [x] Another interesting catch: some users were able to run the macro feature (an outdated, disabled feature for repeating the same user actions on the server side, e.g. combo recording). Fixed by ebaa92c537205acb27f44fa5cfc8dd85ab2345d4
Server stats from last week: ~7 restarts per day
Found two use cases for crash protection tests in old reddit’s topic: https://www.reddit.com/r/XMage/comments/ikroo7/i_love_when_i_get_spectators/
TODO:
- [ ] Must be checked and implemented as unit tests:
1 - tokens:
The main deck is OmniTell, that is: Legacy Show and Tell into Omniscience to cast stuff for free, but I've built a version that basically exists as a gun, and the ammo is whatever is in my sideboard.
So the part that crashes the client is to play Opalescence turning all Enchantments into creatures, then playing a Dual Nature to get tokens of creatures (enchantments) and Doubling Seasons to double the number of tokens. However, when Dual Nature enters the battlefield, Opalescence makes it a creature, so it triggers itself. That means there are two Dual Natures. When I play Doubling Season, two Dual Nature triggers go onto the stack, trying to each place a token copy of Doubling Season. After some number of these resolve the next Dual Nature trigger to resolve places token Doubling Seasons doubled for every other token. This makes it grow very very quickly in number, as it basically creates 2^N + N tokens for each Dual Nature trigger to resolve where N is the number of Doubling Seasons. So for two plays of Doubling Season (which is where the game lags hard and locks up) this is about 2^2059 tokens, each a 5/5 creature token copy of Doubling Season.
2 - triggers:
If one were to use this deck in real life, you can easily keep track of the number of tokens and just put some piece of paper to represent it, right?
Wrong.
I also have included in the deck Grip of Chaos, which will be copied a large number of times. Then if I were to happen to play a spell which targets a creature, ALL of those triggers go onto the stack and EACH one triggers ALL of them. In a tournament, this means you have to call over a judge to resolve them, and they would be very disappointed in me.
3 - more triggers with double season
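For the token case (1) above, the quoted growth rule can be worked through with `BigInteger`. This is only arithmetic on the formula from the Reddit post, where each resolving Dual Nature trigger turns N Doubling Seasons into 2^N + N. The starting value N = 1 and the count of four triggers (two Dual Natures, two plays of Doubling Season) are assumptions made here to match the post's "about 2^2059" figure; the class and method names are hypothetical.

```java
import java.math.BigInteger;

// Worked arithmetic for the quoted formula; the starting value is an assumption.
final class DoublingSeasonGrowth {

    /** One Dual Nature trigger resolving: N becomes 2^N + N. */
    static BigInteger resolveTrigger(BigInteger n) {
        // intValueExact throws if N no longer fits in an int, i.e. 2^N is not representable.
        return BigInteger.ONE.shiftLeft(n.intValueExact()).add(n);
    }

    /** Number of Doubling Seasons after the given number of triggers, starting from one. */
    static BigInteger countAfter(int triggers) {
        BigInteger n = BigInteger.ONE;
        for (int i = 0; i < triggers; i++) {
            n = resolveTrigger(n);
        }
        return n;
    }

    public static void main(String[] args) {
        for (int triggers = 0; triggers <= 3; triggers++) {
            System.out.println(triggers + " triggers -> " + countAfter(triggers));
        }
        BigInteger afterFour = countAfter(4);
        System.out.println("4 triggers -> a number with "
                + afterFour.toString().length() + " decimal digits");
    }
}
```

The sequence goes 1, 3, 11, 2059, and the fourth trigger produces 2^2059 + 2059 permanents, so any unit test for this case needs a limit that trips well before the fourth resolution.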