robot-learning
In [line 282](https://github.com/youngwoon/robot-learning/blob/master/algorithms/ppo_agent.py#L282) of ppo_agent.py, the critic is trained using `value_loss = self._config.value_loss_coeff * (ret - value_pred).pow(2).mean()`, where `ret` is computed as `ret = adv + ...`
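A minimal sketch of the loss in question, for reference. Since the quoted snippet is truncated, the construction of `ret` below is an assumption: a common GAE convention is to form the return target from the advantage plus the value estimate saved at rollout time. Names like `old_value_pred` are illustrative, not from the repo.

```python
import torch

def value_loss(value_pred, ret, value_loss_coeff=0.5):
    # Same form as the loss quoted above; the coefficient mirrors
    # self._config.value_loss_coeff in ppo_agent.py.
    return value_loss_coeff * (ret - value_pred).pow(2).mean()

# Assumed continuation of the truncated `ret = adv + ...` snippet:
old_value_pred = torch.randn(128)                       # V(s) saved during rollout
adv = torch.randn(128)                                  # GAE advantages
ret = adv + old_value_pred                              # return target (assumption)
new_value_pred = torch.randn(128, requires_grad=True)   # current critic output
value_loss(new_value_pred, ret).backward()
```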
In `rollout.py`, `run` and `run_episode` share most of their code. We could merge them and replace `run_episode` with `run(every_episodes=1, is_train=False)`; see the sketch below.
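A hypothetical sketch of the merged interface. Only the `run(every_episodes, is_train)` signature comes from the issue; the generator structure and the `_collect_episode` helper are assumptions, not the repo's actual code.

```python
class Rollout:
    """Merged rollout runner (illustrative sketch)."""

    def _collect_episode(self, is_train):
        # Stand-in for the real episode-collection logic in rollout.py.
        return {"obs": [], "rew": []}, {"is_train": is_train}

    def run(self, every_episodes=1, is_train=True):
        # Yield a batch of rollouts every `every_episodes` episodes.
        episodes, rollouts = 0, []
        while True:
            rollout, info = self._collect_episode(is_train)
            rollouts.append(rollout)
            episodes += 1
            if episodes % every_episodes == 0:
                yield rollouts, info
                rollouts = []

# run(every_episodes=1, is_train=False) would reproduce the old
# single-episode evaluation behavior of run_episode.
batch, info = next(Rollout().run(every_episodes=1, is_train=False))
```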
Match the output dimension of the critic networks. For example, it is [128, 1] in PPO but [128] in Dreamer. We can reduce the last dimension to make them consistent.
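A short sketch of one way to do this, reading [128, 1] and [128] as output tensor shapes (an assumption; they could also refer to layer-size configs): squeeze the trailing singleton dimension of the PPO critic's output.

```python
import torch

# Example shapes from the issue: a PPO critic returning [128, 1]
# vs. a Dreamer critic returning [128].
ppo_value = torch.randn(128, 1)
dreamer_value = torch.randn(128)

ppo_value = ppo_value.squeeze(-1)   # [128, 1] -> [128]
assert ppo_value.shape == dreamer_value.shape
```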
Hello, I was running SAC and noticed that performance was excellent until around 500,000 steps; after that point, there seemed to be a noticeable downturn. Could you help...