csxeba is right, A2C and A3C are on-policy methods. Old data were sampled by the old policy, so they are clearly not from the same distribution. We usually use a replay buffer...
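For contrast with on-policy A2C/A3C, here is a minimal sketch of the replay buffer that off-policy methods such as DQN use to reuse old transitions (illustrative only, not code from this repo):

```
import random
from collections import deque

class ReplayBuffer:
    """Stores transitions generated by past (older) policies.

    Sampling from it is only valid for off-policy methods: the batch is
    drawn from a mixture of old policies, not the current one.
    """
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling over transitions from many past policies.
        return random.sample(self.buffer, batch_size)
```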
I get the same problem because `taskpath` is not a dict. I used the following code to fix it:

```
if isinstance(taskpath, dict):
    return taskpath['dirname'].split('/')[-1].split('-')[0]
else:
    return taskpath.dirname.split('/')[-1].split('-')[0]
```
...
> Got the video part going and it works fine (tested on SMB1), but still no way to save.
>
> I saw that this is archived, but still, is...
> Hello,
> setup.py states that we can use OpenAI gym version >= 0.9.1, and v0.9.1 works fine for me.
> Thanks

Thanks!
> Hi!
>
> I've solved it by adding:
>
> ```
> with torch.autocast("cuda"):
>     trainer.train()
> ```

This solves my problem, thanks!
> https://github.com/YangRui2015/2048_env/blob/2e9b3938492e4f7a0a2b627b8607ad1c203d273a/dqn_agent.py#L96
>
> I'm new to DQN and confused by this line. It should be `q_eval_4_next_state_argmax` instead of `q_eval_4_this_state_argmax`, right?

Hi, this is...
> Thanks for your reply. But according to the DDQN algorithm from [ICML 2016](https://icml.cc/2016/tutorials/deep_rl_tutorial.pdf), I think the argmax should be evaluated with the next state on the older...
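For readers following this thread, here is a minimal sketch of the Double DQN target being discussed, assuming PyTorch; `q_online` and `q_target` are hypothetical stand-ins for the evaluation and target networks (the repo's actual names differ). The key point matching the quote: the argmax is taken over the *next* state with the online network, and the target network evaluates the chosen actions.

```
import torch

def ddqn_target(q_online, q_target, next_states, rewards, dones, gamma=0.99):
    with torch.no_grad():
        # Action selection: argmax over the NEXT state using the online network.
        next_actions = q_online(next_states).argmax(dim=1, keepdim=True)
        # Action evaluation: the (older) target network scores those actions.
        next_q = q_target(next_states).gather(1, next_actions).squeeze(1)
        # Standard one-step bootstrapped target; terminal states get no bootstrap.
        return rewards + gamma * (1.0 - dones) * next_q
```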