Zihan Ding

15 comments by Zihan Ding

I didn't see the website after installation either.

Why would a negative value cause a failure in the actor loss? You can also refer to OpenAI Baselines [here](https://github.com/openai/baselines/tree/master/baselines/ppo1), which follows a similar process to our repo.
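For context, here is a minimal sketch of the PPO-Clip actor loss, assuming TensorFlow 2; the toy `ratio`/`adv` values and `eps` are illustrative and this is not RLzoo's actual code. The point is that `tf.minimum` keeps the pessimistic surrogate, so negative advantages are expected and handled correctly:

```python
import tensorflow as tf

# Toy PPO-Clip surrogate (illustrative values, not RLzoo's code).
ratio = tf.constant([1.2, 0.8, 1.5])   # pi_new(a|s) / pi_old(a|s)
adv = tf.constant([1.0, -0.5, -2.0])   # advantages may be negative
eps = 0.2
surr1 = ratio * adv
surr2 = tf.clip_by_value(ratio, 1.0 - eps, 1.0 + eps) * adv
# tf.minimum keeps the smaller (pessimistic) surrogate per sample, which
# is well defined for negative advantages; the actor minimizes its
# negative mean.
actor_loss = -tf.reduce_mean(tf.minimum(surr1, surr2))
print(actor_loss.numpy())
```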

Sorry for the late reply. What you mentioned might be caused by a numerical issue in `tf.minimum`, if I understood correctly. Could you please print out an example case and...

Hi, I've cleaned the code and updated the log. Thanks!

Hi, I would expect end-to-end training with RLzoo algorithms on RLBench to be hard in practice. As you said, it seems RLBench provides the reward value of either...

Hi guys, I tried to replicate the problem you met, but it doesn't happen on my side. I used the PPO-Clip algorithm on the *ReachTarget* environment in RLBench, and the robot is...

Hi, did you use the default hyper-parameters provided in RLzoo? If so, we will look into this problem.

Hi, it supports dict states, but you need a wrapper for your env. Please take a look at the `FlattenDictWrapper` (./common/env_wrappers.py) used for the robotics envs.
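To illustrate the idea, here is a minimal sketch of such a wrapper, assuming a Gym-style env API; this is not the actual implementation in ./common/env_wrappers.py. It concatenates the chosen dict fields into one flat vector so algorithms expecting Box observations can run:

```python
import numpy as np

# Illustrative sketch only -- not the repo's FlattenDictWrapper.
class FlattenDictWrapper:
    def __init__(self, env, dict_keys):
        self.env = env
        self.dict_keys = dict_keys  # e.g. ['observation', 'desired_goal']

    def _flatten(self, obs_dict):
        # Concatenate the selected dict fields into one flat vector.
        return np.concatenate([np.ravel(obs_dict[k]) for k in self.dict_keys])

    def reset(self):
        return self._flatten(self.env.reset())

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return self._flatten(obs), reward, done, info
```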

This problem can be fixed on the ElegantRL side by changing [this line](https://github.com/AI4Finance-Foundation/ElegantRL/blob/4ae8351f88965cc64ee5ac56d80c847e45e8215d/elegantrl/agents/AgentBase.py#L282) to:

```python
traj_list1 = list(map(list, zip(*traj_list)))  # state, reward, done, action, noise
```

However, there are...
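As a toy illustration of the transpose idiom above (hypothetical per-step tuples, not ElegantRL's real buffer contents), `zip(*traj_list)` regroups step-wise tuples into one list per field:

```python
# Each tuple is one step: (state, reward, done, action, noise).
traj_list = [
    ('s0', 1.0, False, 'a0', 0.1),
    ('s1', 0.5, False, 'a1', 0.2),
    ('s2', 0.0, True,  'a2', 0.3),
]
# Transpose: list of steps -> list per field.
traj_list1 = list(map(list, zip(*traj_list)))
states, rewards, dones, actions, noises = traj_list1
print(states)   # ['s0', 's1', 's2']
print(rewards)  # [1.0, 0.5, 0.0]
```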

Just replace `states, rewards, masks, actions = [torch.cat(item, dim=0) for item in traj_items]` with `[states, rewards, masks, actions] = traj_list`, since the samples are already shaped.
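A minimal sketch of why the direct unpacking works (the tensor shapes here are made up for illustration): after the transpose fix, each entry of `traj_list` is already one stacked tensor per field, so no further `torch.cat` is needed:

```python
import torch

# Hypothetical shapes: one stacked tensor per field after the fix.
traj_list = [
    torch.zeros(128, 4),  # states
    torch.zeros(128, 1),  # rewards
    torch.zeros(128, 1),  # masks
    torch.zeros(128, 2),  # actions
]
# Direct unpacking replaces the torch.cat call.
[states, rewards, masks, actions] = traj_list
print(states.shape)  # torch.Size([128, 4])
```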