metastone
[Bug] Game State Value AI will never play Cursed! cards
It will, however, play cards received from Spellslinger. This means it recognizes it has the Cursed! card, it just does not see any benefit to playing it and does not recognize the downside of holding it.
That's actually really odd, considering that it usually plays the card against me. Maybe there's some odd issue with it casting it, like it modifies the board negatively (Troggs?)
Definitely wasn't Troggs, at least for me. To test I made an arbitrary deck with 30 copies of the card, so there was absolutely nothing on my side of the board the AI could have factored into the decision. I also tried 5 or 6 times just to be sure, and even when it had ten copies of Cursed! in hand it refused to play them.
The AI isn't able to see the impact of effects that occur during its following turn. Since Cursed has a TurnStartTrigger effect and nothing else, and since the AI prefers to have more cards in its hand than fewer, it will never choose a future where it plays Cursed.
A possible solution would be to model triggers that fire during future turns (the AI's own turn or the opponent's) as discounted benefits or costs credited to the current turn.
Specifically: Suppose there was a choice, PlayCardAction: Cursed. If the card has a trigger that can only occur after the end of the current turn (e.g., Secrets, turn start triggers), score a clone of the game state as though that trigger had occurred now, and multiply the score by some discounting factor, like 0.5, to account for those effects possibly not occurring due to opponent interactions. Then add this "bonus" to the evaluation of the PlayCardAction: Cursed score.
Note that for this particular example, the EndTurnAction would also need to be responsible for knowing about the passive trigger specified in Cursed. So EndTurnAction too would need some discounted view of the future in its scoring.
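The clone-and-discount idea above could be sketched roughly as follows. Everything here is a toy stand-in: `GameState`, `applyTurnStartTriggers`, and the `evaluate` function are simplified placeholders, not metastone's actual classes or API.

```java
import java.util.function.ToDoubleFunction;

public class FutureTriggerBonus {
    static final double DISCOUNT = 0.5;

    // Toy game state: just the hero's HP and whether Cursed sits in hand.
    static class GameState {
        int heroHp;
        boolean cursedInHand;
        GameState(int hp, boolean cursed) { heroHp = hp; cursedInHand = cursed; }
        GameState copy() { return new GameState(heroHp, cursedInHand); }
    }

    // Stand-in for Cursed's TurnStartTrigger: 2 damage while it is in hand.
    static void applyTurnStartTriggers(GameState s) {
        if (s.cursedInHand) s.heroHp -= 2;
    }

    // The proposed bonus: score a clone as though the pending trigger
    // fired now, take the delta against the unmodified state, discount it.
    static double futureTriggerBonus(GameState s, ToDoubleFunction<GameState> evaluate) {
        GameState clone = s.copy();
        applyTurnStartTriggers(clone);
        double delta = evaluate.applyAsDouble(clone) - evaluate.applyAsDouble(s);
        return DISCOUNT * delta;
    }

    public static void main(String[] args) {
        ToDoubleFunction<GameState> evaluate = s -> s.heroHp; // 1 point per HP
        GameState holdingCursed = new GameState(30, true);
        System.out.println(futureTriggerBonus(holdingCursed, evaluate)); // -1.0
    }
}
```

This bonus would be added to each candidate action's base score, so an action that removes Cursed from the hand scores 0 here while ending the turn with it in hand scores -1.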
Let's model the AI's choices. Suppose that every HP is worth 1 point. When you evaluate the current game context, you have 30 HP, so 30 points. If you evaluate this same context, but you apply the passive trigger specified in Cursed, you have 28 HP, so 28 points. Let's just consider this difference of -2. Using this procedure, what would be the scores of the game actions evaluated now?
| Action | Score Now | Score Next Turn | Discount | Total |
|---|---|---|---|---|
| PlayCardAction: Cursed | 0 | 0 | 0.5 | 0 |
| EndTurn | 0 | -2 | 0.5 | -1 |
The AI will choose PlayCardAction: Cursed, having evaluated a discounted next turn where nothing has changed besides the application of turn start active and passive triggers.
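The table's arithmetic can be reproduced directly. This is just the scoring rule `total = now + discount * nextTurn` from the example above, not metastone code:

```java
public class DiscountedScoring {
    static final double DISCOUNT = 0.5;

    // Total score for an action: immediate score plus discounted
    // score of the simulated next-turn state.
    static double total(double scoreNow, double scoreNextTurn) {
        return scoreNow + DISCOUNT * scoreNextTurn;
    }

    public static void main(String[] args) {
        // PlayCardAction: Cursed — the card leaves the hand, so its
        // TurnStartTrigger never fires: next-turn delta is 0.
        System.out.println(total(0, 0));  // 0.0
        // EndTurn while holding Cursed — the trigger fires next turn
        // for 2 damage: next-turn delta is -2.
        System.out.println(total(0, -2)); // -1.0
    }
}
```

Since 0 > -1, playing Cursed wins the comparison, as the table shows.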
There's some balance that would have to occur between the current value of having an extra card in the hand versus the future value of having more health. The current heuristic values would need to be retrained to correctly understand this balance and play Cursed when it's in the hand. A training system would learn this model very poorly, because even over many games, the training AI would very rarely encounter a Cursed card. Furthermore, it would be best to look several exponentially-discounted steps into the future to properly value the Cursed card. A general framework for future effects would need to be written.
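Looking multiple steps ahead with exponential discounting could look like the following. The single per-turn factor `gamma` and the per-turn delta array are assumptions for illustration, not part of metastone:

```java
public class ExponentialDiscount {
    // Sum of gamma^t * delta_t over future turns t = 1..n.
    static double discountedValue(double[] perTurnDeltas, double gamma) {
        double total = 0, weight = 1;
        for (double delta : perTurnDeltas) {
            weight *= gamma;       // gamma^1, gamma^2, ...
            total += weight * delta;
        }
        return total;
    }

    public static void main(String[] args) {
        // Holding Cursed for three more turns at gamma = 0.5:
        // -2*0.5 + -2*0.25 + -2*0.125 = -1.75
        System.out.println(discountedValue(new double[]{-2, -2, -2}, 0.5));
    }
}
```

The further out a turn is, the less its damage counts, which matches the intuition that the opponent may interact before the trigger ever fires.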
These two fixes would get Cursed played and radically improve the AI's ability to play cards whose benefits arrive in the future.