Rui

7 comments by Rui

csxeba is right: A2C and A3C are on-policy methods. Old data were sampled by an old policy, so they are clearly not from the same distribution as data from the current policy. We usually use a replay buffer...
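
To make the distinction concrete, here is a minimal sketch, assuming hypothetical `env`/`policy` interfaces that are not from this thread: off-policy methods (e.g. DQN) can keep old transitions in a replay buffer and re-sample them, while on-policy methods like A2C/A3C collect a fresh batch with the current policy and discard it after the update.

```python
# Minimal sketch contrasting the two data regimes (hypothetical interfaces,
# not code from the discussed repository).
import random
from collections import deque

class ReplayBuffer:
    """Off-policy: transitions produced by many past policies are kept and re-sampled."""
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)

def collect_on_policy_batch(env, policy, n_steps):
    """On-policy: gather a fresh rollout with the current policy, use it for
    one update, then throw it away (no replay of old-policy data)."""
    batch = []
    state = env.reset()
    for _ in range(n_steps):
        action = policy(state)
        next_state, reward, done = env.step(action)  # assumed (s', r, done) interface
        batch.append((state, action, reward, next_state, done))
        state = env.reset() if done else next_state
    return batch  # consumed by a single A2C-style update, not stored
```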

I get the same problem because the taskpath is not a dict; the following code fixes it:

```python
if type(taskpath) == dict:
    return taskpath['dirname'].split('/')[-1].split('-')[0]
else:
    return taskpath.dirname.split('/')[-1].split('-')[0]
```

> Got the video part going and it works fine (tested on SMB1), but still no way to save. > > I saw that this is archived, but still, is...

> Hello,
> It is given in setup.py that we can use OpenAI gym version >= 0.9.1 and v0.9.1 works fine for me.
> Thanks

Thanks!

> Hi!
>
> I've solved it by adding:
>
> ```python
> with torch.autocast("cuda"):
>     trainer.train()
> ```

This solves my problem, thanks!

> https://github.com/YangRui2015/2048_env/blob/2e9b3938492e4f7a0a2b627b8607ad1c203d273a/dqn_agent.py#L96
>
> I was new to DQN, and confused by this line. It should be `q_eval_4_next_state_argmax` instead of `q_eval_4_this_state_argmax`, right?

Hi, this is...

> Thanks for your reply. But according to the DDQN algorithm from [ICML 2016](https://icml.cc/2016/tutorials/deep_rl_tutorial.pdf), I think the argmax should be evaluated with the next state on the older...
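
For context, here is a minimal sketch of the standard Double DQN target described in the ICML 2016 tutorial; the network and variable names are hypothetical and this is not taken from the repository's dqn_agent.py. The online network selects the argmax action for the next state, and the older target network evaluates that action.

```python
# Generic Double DQN target (hypothetical names, not from dqn_agent.py).
import torch

def ddqn_target(online_net, target_net, reward, next_state, done, gamma=0.99):
    with torch.no_grad():
        # action selection: argmax of the online network's Q-values at the NEXT state
        next_action = online_net(next_state).argmax(dim=1, keepdim=True)
        # action evaluation: the (older) target network scores that action
        next_q = target_net(next_state).gather(1, next_action).squeeze(1)
        # standard TD target; done masks out the bootstrap term
        return reward + gamma * (1.0 - done) * next_q
```

In vanilla DQN the target network would both select and evaluate the next-state action; Double DQN decouples selection (online net) from evaluation (target net), which is the point the comments above are discussing.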