robot-learning
In [line 282](https://github.com/youngwoon/robot-learning/blob/master/algorithms/ppo_agent.py#L282) of ppo_agent.py, the critic is trained using `value_loss = self._config.value_loss_coeff * (ret - value_pred).pow(2).mean()`, where `ret` is computed as `ret = adv + ...`
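A minimal sketch of the loss in question, for reference. Since the quoted snippet is truncated, the construction of `ret` below is an assumption: a common GAE convention is to form the return target from the advantage plus the value estimate saved at rollout time. Names like `old_value_pred` are illustrative, not from the repo.

```python
import torch

def value_loss(value_pred, ret, value_loss_coeff=0.5):
    # Same form as the loss quoted above; the coefficient mirrors
    # self._config.value_loss_coeff in ppo_agent.py.
    return value_loss_coeff * (ret - value_pred).pow(2).mean()

# Assumed continuation of the truncated `ret = adv + ...` snippet:
old_value_pred = torch.randn(128)                       # V(s) saved during rollout
adv = torch.randn(128)                                  # GAE advantages
ret = adv + old_value_pred                              # return target (assumption)
new_value_pred = torch.randn(128, requires_grad=True)   # current critic output
value_loss(new_value_pred, ret).backward()
```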
In `rollout.py`, `run` and `run_episode` share most of their code. We could merge them and replace `run_episode` with `run(every_episodes=1, is_train=False)`; see the sketch below.
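A hypothetical sketch of the merged interface. Only the `run(every_episodes, is_train)` signature comes from the issue; the generator structure and the `_collect_episode` helper are assumptions, not the repo's actual code.

```python
class Rollout:
    """Merged rollout runner (illustrative sketch)."""

    def _collect_episode(self, is_train):
        # Stand-in for the real episode-collection logic in rollout.py.
        return {"obs": [], "rew": []}, {"is_train": is_train}

    def run(self, every_episodes=1, is_train=True):
        # Yield a batch of rollouts every `every_episodes` episodes.
        episodes, rollouts = 0, []
        while True:
            rollout, info = self._collect_episode(is_train)
            rollouts.append(rollout)
            episodes += 1
            if episodes % every_episodes == 0:
                yield rollouts, info
                rollouts = []

# run(every_episodes=1, is_train=False) would reproduce the old
# single-episode evaluation behavior of run_episode.
batch, info = next(Rollout().run(every_episodes=1, is_train=False))
```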
Match the output dimension of the critic networks. For example, it is [128, 1] in PPO but [128] in Dreamer. We can reduce the last dimension to make them consistent.
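A short sketch of one way to do this, reading [128, 1] and [128] as output tensor shapes (an assumption; they could also refer to layer-size configs): squeeze the trailing singleton dimension of the PPO critic's output.

```python
import torch

# Example shapes from the issue: a PPO critic returning [128, 1]
# vs. a Dreamer critic returning [128].
ppo_value = torch.randn(128, 1)
dreamer_value = torch.randn(128)

ppo_value = ppo_value.squeeze(-1)   # [128, 1] -> [128]
assert ppo_value.shape == dreamer_value.shape
```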
Hello, I was running SAC and noticed that performance was excellent until around 500,000 steps; after that point, there seemed to be a noticeable downturn. Could you help...