open_spiel icon indicating copy to clipboard operation
open_spiel copied to clipboard

Add new game: 2048

Open Jazeem opened this issue 1 year ago • 15 comments

Clone of https://github.com/gabrielecirulli/2048

Jazeem avatar Jul 29 '22 20:07 Jazeem

Very nice, thanks @Jazeem !!

I'm on vacation at the moment but will take a look when I'm back next week.

lanctot avatar Jul 30 '22 10:07 lanctot

Sure @lanctot

I'm curious to see if any algorithms implemented in open_spiel would be able to reach 2048. I tried the default mcts implementation, but it was not faring better than a random player.

Jazeem avatar Jul 30 '22 19:07 Jazeem

Hi @Jazeem,

Hmm.. MCTS performing similarly to random-- even for MCTS with random rollouts-- I'm surprised. I suspect a bug in the game implementation. How many simulations did MCTS have? And how long is the maximum length of each game?

Other algorithms: you could also try some RL algorithms like DQN. Or, expectimax with a heuristic evaluation function.

lanctot avatar Jul 31 '22 09:07 lanctot

Oh wait, depends on what you decided for the reward structure. If there are no intermediate rewards then I can see why it would be super hard for vanilla MCTS but still I would expect it to be better than random.

lanctot avatar Jul 31 '22 13:07 lanctot

@lanctot yeah you are correct. I think it's because there's no intermediate rewards.

When I ran the simulation over 100 games, MCTS and random couldn't win once. But when I reduced the winning condition to tile 128 instead of 2048, MCTS could win 100 games vs 48 games for random.

Jazeem avatar Jul 31 '22 19:07 Jazeem

This is really impressive!

I suspect the terminal condition and reward function still need some tweaking. 2048 doesn't end when the player reaches 2048, it keeps going forever. And the score isn't win/lose, it increases based on the number and size of the merged tiles.

The reason I mention this is that current SOTA algorithms can get scores of ~500k, which IIUC is far beyond reaching 2048.

Disclaimer: I am not an owner of OpenSpiel

steventrouble avatar Aug 02 '22 23:08 steventrouble

Hi @steventrouble

I was experimenting around the terminal condition and reward function. My initial assumption was that setting the game to end at 2048 might make it easier to solve.

I've since then modified the reward system to give a reward for every new tile unlocked. But like you suggested it might be better to keep the game forever like the original and modify Rewards() as score from current action and Returns() as total score.

Jazeem avatar Aug 03 '22 06:08 Jazeem

Hi @steventrouble

I was experimenting around the terminal condition and reward function. My initial assumption was that setting the game to end at 2048 might make it easier to solve.

I recommend we make this as consistent with the original scoring (or with other RL implementations) as possible.

I've since then modified the reward system to give a reward for every new tile unlocked. But like you suggested it might be better to keep the game forever like the original and modify Rewards() as score from current action and Returns() as total score.

I think this should be configurable by a game parameter. All games in OpenSpiel are episodic: the tests assume this, so they should end in finite time (e.g. the game simulation tests can't go on forever) . I recommend you make a flag like "max_steps" or "max_game_length" parameter that plays 2048 for a finite number of steps or a flag that ends the episode after getting 2048 as the default, and then supporting the infinite game as a non-default parameter value.

lanctot avatar Aug 11 '22 10:08 lanctot

Clone of https://github.com/gabrielecirulli/2048

Can you clarify how much of the code in here is based on this implementation? If you actually copied code from that repos, we likely have to put their copyright notice in the file before we can import. (And also it'd be a nice courtesy to tag the authors to let them know, they might have some comments / preferences).

lanctot avatar Aug 11 '22 12:08 lanctot

I think this should be configurable by a game parameter. All games in OpenSpiel are episodic: the tests assume this, so they should end in finite time (e.g. the game simulation tests can't go on forever) . I recommend you make a flag like "max_steps" or "max_game_length" parameter that plays 2048 for a finite number of steps or a flag that ends the episode after getting 2048 as the default, and then supporting the infinite game as a non-default parameter value.

I've modified Returns() and Rewards() to be consistent with the original 2048 scoring. Have also introduced game parameters max_game_length and max_score (Currently have set them to INT_MAX, if this can cause any for any tests, training algorithms I can change this)

Jazeem avatar Aug 12 '22 08:08 Jazeem

Clone of https://github.com/gabrielecirulli/2048 Can you clarify how much of the code in here is based on this implementation? If you actually copied code from that repos, we likely have to put their copyright notice in the file before we can import.

Haven't actually copied any code since the original is written in Javascript and also had to make some changes to fit the LegalActions(), LegalChanceOutcomes(), DoApplyAction() architecture of open_spiel.

But I did use some of the same functions in the original repo and rewrote it in cpp. Those are BuildTraversals(), FindFarthestPosition(), TileMatchesAvailable(), PrepareTiles()

Also DoApplyAction() was written by looking to the move() function in the original repo

Jazeem avatar Aug 12 '22 18:08 Jazeem

(And also it'd be a nice courtesy to tag the authors to let them know, they might have some comments / preferences).

That'd be great!

Jazeem avatar Aug 12 '22 18:08 Jazeem

Hi @Jazeem, something changes on the MacOS behavior via GitHub Actions, can you apply these changes to trigger the tests again: https://github.com/deepmind/open_spiel/pull/907/files

lanctot avatar Aug 17 '22 10:08 lanctot

Yeah it'll take more than just that one line, now there seems to be issues with the Jax versions .. :(

Keep an eye on https://github.com/deepmind/open_spiel/pull/907, might only be resolved tomorrow.

lanctot avatar Aug 17 '22 11:08 lanctot

Keep an eye on https://github.com/deepmind/open_spiel/pull/907, might only be resolved tomorrow.

Sure. Will do 👍

Jazeem avatar Aug 17 '22 11:08 Jazeem

Keep an eye on #907, might only be resolved tomorrow.

Sure. Will do 👍

Ok tests are passing, so you just need to update the Jax and etc. package versions (one line change in install.sh)

lanctot avatar Aug 17 '22 16:08 lanctot

@Jazeem thanks, looking good! Are you done fixing things up for the moment?

I've contacted @tewalds who will review the import, would be good to pull this one in now if you're done making changes.

lanctot avatar Aug 18 '22 08:08 lanctot

@lanctot Yes, done fixing things up for now.

Jazeem avatar Aug 18 '22 09:08 Jazeem

Hi @Jazeem, it's now been merged, but a fair bit changed from internal review. Please take a look through it let us know what you think. You can still do the almost-infinite version by setting max_tile to a very high value.

lanctot avatar Aug 20 '22 11:08 lanctot

@lanctot It looks great 👍

Only thing I noticed is that if a player/agent makes a move that doesn't change anything on the board, then additional chance tiles are not added. So there could be a few inconsequential moves like that, so not sure if those should be counted in

int MaxGameLength() const override { return 2 * 2 * max_tile_; }

Jazeem avatar Aug 22 '22 10:08 Jazeem