open_spiel
open_spiel copied to clipboard
Add new game: 2048
Clone of https://github.com/gabrielecirulli/2048
Very nice, thanks @Jazeem !!
I'm on vacation at the moment but will take a look when I'm back next week.
Sure @lanctot
I'm curious to see if any algorithms implemented in open_spiel would be able to reach 2048. I tried the default mcts implementation, but it was not faring better than a random player.
Hi @Jazeem,
Hmm.. MCTS performing similarly to random-- even for MCTS with random rollouts-- I'm surprised. I suspect a bug in the game implementation. How many simulations did MCTS have? And how long is the maximum length of each game?
Other algorithms: you could also try some RL algorithms like DQN. Or, expectimax with a heuristic evaluation function.
Oh wait, depends on what you decided for the reward structure. If there are no intermediate rewards then I can see why it would be super hard for vanilla MCTS but still I would expect it to be better than random.
@lanctot yeah you are correct. I think it's because there's no intermediate rewards.
When I ran the simulation over 100 games, MCTS and random couldn't win once. But when I reduced the winning condition to tile 128 instead of 2048, MCTS could win 100 games vs 48 games for random.
This is really impressive!
I suspect the terminal condition and reward function still need some tweaking. 2048 doesn't end when the player reaches 2048, it keeps going forever. And the score isn't win/lose, it increases based on the number and size of the merged tiles.
The reason I mention this is that current SOTA algorithms can get scores of ~500k, which IIUC is far beyond reaching 2048.
Disclaimer: I am not an owner of OpenSpiel
Hi @steventrouble
I was experimenting around the terminal condition and reward function. My initial assumption was that setting the game to end at 2048 might make it easier to solve.
I've since then modified the reward system to give a reward for every new tile unlocked. But like you suggested it might be better to keep the game forever like the original and modify Rewards() as score from current action and Returns() as total score.
Hi @steventrouble
I was experimenting around the terminal condition and reward function. My initial assumption was that setting the game to end at 2048 might make it easier to solve.
I recommend we make this as consistent with the original scoring (or with other RL implementations) as possible.
I've since then modified the reward system to give a reward for every new tile unlocked. But like you suggested it might be better to keep the game forever like the original and modify Rewards() as score from current action and Returns() as total score.
I think this should be configurable by a game parameter. All games in OpenSpiel are episodic: the tests assume this, so they should end in finite time (e.g. the game simulation tests can't go on forever) . I recommend you make a flag like "max_steps" or "max_game_length" parameter that plays 2048 for a finite number of steps or a flag that ends the episode after getting 2048 as the default, and then supporting the infinite game as a non-default parameter value.
Clone of https://github.com/gabrielecirulli/2048
Can you clarify how much of the code in here is based on this implementation? If you actually copied code from that repos, we likely have to put their copyright notice in the file before we can import. (And also it'd be a nice courtesy to tag the authors to let them know, they might have some comments / preferences).
I think this should be configurable by a game parameter. All games in OpenSpiel are episodic: the tests assume this, so they should end in finite time (e.g. the game simulation tests can't go on forever) . I recommend you make a flag like "max_steps" or "max_game_length" parameter that plays 2048 for a finite number of steps or a flag that ends the episode after getting 2048 as the default, and then supporting the infinite game as a non-default parameter value.
I've modified Returns() and Rewards() to be consistent with the original 2048 scoring. Have also introduced game parameters max_game_length and max_score (Currently have set them to INT_MAX, if this can cause any for any tests, training algorithms I can change this)
Clone of https://github.com/gabrielecirulli/2048 Can you clarify how much of the code in here is based on this implementation? If you actually copied code from that repos, we likely have to put their copyright notice in the file before we can import.
Haven't actually copied any code since the original is written in Javascript and also had to make some changes to fit the LegalActions(), LegalChanceOutcomes(), DoApplyAction() architecture of open_spiel.
But I did use some of the same functions in the original repo and rewrote it in cpp. Those are BuildTraversals(), FindFarthestPosition(), TileMatchesAvailable(), PrepareTiles()
Also DoApplyAction() was written by looking to the move() function in the original repo
(And also it'd be a nice courtesy to tag the authors to let them know, they might have some comments / preferences).
That'd be great!
Hi @Jazeem, something changes on the MacOS behavior via GitHub Actions, can you apply these changes to trigger the tests again: https://github.com/deepmind/open_spiel/pull/907/files
Yeah it'll take more than just that one line, now there seems to be issues with the Jax versions .. :(
Keep an eye on https://github.com/deepmind/open_spiel/pull/907, might only be resolved tomorrow.
Keep an eye on https://github.com/deepmind/open_spiel/pull/907, might only be resolved tomorrow.
Sure. Will do 👍
Keep an eye on #907, might only be resolved tomorrow.
Sure. Will do 👍
Ok tests are passing, so you just need to update the Jax and etc. package versions (one line change in install.sh)
@Jazeem thanks, looking good! Are you done fixing things up for the moment?
I've contacted @tewalds who will review the import, would be good to pull this one in now if you're done making changes.
@lanctot Yes, done fixing things up for now.
Hi @Jazeem, it's now been merged, but a fair bit changed from internal review. Please take a look through it let us know what you think. You can still do the almost-infinite version by setting max_tile to a very high value.
@lanctot It looks great 👍
Only thing I noticed is that if a player/agent makes a move that doesn't change anything on the board, then additional chance tiles are not added. So there could be a few inconsequential moves like that, so not sure if those should be counted in
int MaxGameLength() const override { return 2 * 2 * max_tile_; }