Running experiments with Dreamer-V3
Hi guys, first of all, what an awesome video you made on YT! I'm one of the maintainers of sheeprl, and I'm here to let you know that we're running experiments with Dreamer-V3 on the standard env. For now I have modified your env code inside sheeprl, and in the future we also want to try out v2. This is what I'm getting right now in terms of rewards:
This is the configuration I'm using:
headless: True
save_final_state: True
early_stop: False
action_freq: 24
max_steps: 20480
print_rewards: True
save_video: False
fast_video: True
debug: False
sim_frame_dist: 2_000_000.0
use_screen_explore: True
reward_scale: 4
extra_buttons: False
explore_weight: 3 # 2.5
I don't know if those are good results, but I wanted to share them. If you want to try something out with SheepRL, let us know :sheep:. Thanks again!
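As a rough sanity check on the configuration above, here is a minimal Python sketch of the episode length it implies. It assumes that `action_freq` is the number of emulator frames advanced per agent action and that `max_steps` is the number of agent actions per episode. `GB_FPS` (the Game Boy's native rate of about 59.7 frames per second) and the helper names are illustrative only, not part of sheeprl or the env code.

```python
# Sketch: episode length implied by the config above.
# ASSUMPTION: action_freq = emulator frames advanced per agent action,
#             max_steps   = agent actions per episode.

GB_FPS = 59.73  # approximate native Game Boy frame rate (frames per second)

config = {
    "action_freq": 24,
    "max_steps": 20480,
}

def frames_per_episode(action_freq: int, max_steps: int) -> int:
    """Total emulator frames simulated in one episode."""
    return action_freq * max_steps

def game_time_hours(frames: int, fps: float = GB_FPS) -> float:
    """Equivalent in-game time, in hours, if played at native speed."""
    return frames / fps / 3600

frames = frames_per_episode(config["action_freq"], config["max_steps"])
print(frames)  # 491520
print(f"{game_time_hours(frames):.2f} h of in-game time per episode")
```

Under these assumptions, each episode covers 491,520 frames, or roughly 2.3 hours of in-game time at native speed.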
Wow! That's pretty cool, although return/reward is really only a tertiary measure of how well the agent does. Also, I haven't run the Baseline version for quite a while. Could you give the Pufferlib version a whirl so we can compare results? Specifically, we're interested in how far the agent gets through the game. This is visualized nicely on a weird coldmap (wandb.ai/jsuarez) or a heatmap (wandb.ai/xinpw8); check our current runs' Overview pages for the run parameters.

Clone https://github.com/PufferAI/Pufferlib (current branch is 0.5) and https://github.com/PufferAI/pokegym (current branch is main), or grab the Dockerized version, PufferTank. PufferTank is a one-stop shop for RL tools and frameworks, Pufferlib included, and it has some really nice features.

Anyway, change hyperparameters in config.py, then run with command-line flags (python demo.py --train --track --env pokemon_red --vectorization multiprocessing) and/or change the default run parameters in demo.py, found in the pufferlib folder. Environment changes can be made in pokegym/pokegym/environment.py. You'll also need the kanto_map_dsv.png map file and, of course, the pokemon_red.gb ROM; both go in the pufferlib folder. I'll add the map here after work, or you can hop in the Discord channel for it.
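The setup steps above can be condensed into a small pre-flight check. This is a hypothetical helper, not part of Pufferlib or pokegym: it assumes both required files must sit directly in the pufferlib folder and only reproduces the demo.py flags quoted in the thread, without inventing any others.

```python
# Hypothetical pre-flight check for the Pufferlib + pokegym setup.
# ASSUMPTION: pokemon_red.gb and kanto_map_dsv.png live directly in the
# pufferlib folder, and demo.py takes the flags quoted in the thread.
import shlex
from pathlib import Path

REQUIRED_ASSETS = ("pokemon_red.gb", "kanto_map_dsv.png")

def missing_assets(pufferlib_dir: str) -> list[str]:
    """Return the required files that are not present in pufferlib_dir."""
    root = Path(pufferlib_dir)
    return [name for name in REQUIRED_ASSETS if not (root / name).is_file()]

def train_command() -> list[str]:
    """argv for the training run quoted in the thread."""
    return ["python", "demo.py", "--train", "--track",
            "--env", "pokemon_red", "--vectorization", "multiprocessing"]

# Usage: run from inside the pufferlib folder.
missing = missing_assets(".")
print("missing:", ", ".join(missing) if missing else "none")
print(shlex.join(train_command()))
```

Keeping the check separate from the launch command means it can be dropped into a Dockerfile or CI step for PufferTank without touching demo.py itself.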
You'd have to let it run for 11M-20M steps before you can really tell how it's doing; see experiments here: https://wandb.ai/iron-bound/pufferlib/runs/sjwhhk4r?workspace=user-iron-bound
https://wandb.ai/xinpw8/pufferlib/runs/vkyn6vj6?workspace=user-xinpw8 https://wandb.ai/jsuarez/pufferlib/runs/3ebel57y/workspace?workspace=user-xinpw8