
Results using RLBench as the environment

Open mirkomorati opened this issue 4 years ago • 6 comments

Hi, first of all let me say that I really appreciate the work done in this repo. I would like to know if you have had success training any algorithm using RLBench as the environment. I'm currently trying to train the DDPG algorithm on the ReachTarget task, using all the observations available with state_type='vision'. As suggested in issue #6, I modified the default params for DDPG, lowering max_steps and increasing train_episodes, but I can't seem to get any results. My setup looks roughly like the sketch below. Any feedback is much appreciated.
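Roughly, the setup is something like this (a sketch following RLzoo's usual build_env / call_default_params pattern; the exact learn_params key names max_steps and train_episodes are assumed from the defaults):

from rlzoo.common.env_wrappers import *
from rlzoo.common.utils import *
from rlzoo.algorithms import *

# Build the ReachTarget task with full visual observations.
env = build_env('ReachTarget', 'rlbench', state_type='vision')

# Load the default DDPG hyperparameters, then adjust them as suggested in issue #6.
alg_params, learn_params = call_default_params(env, 'rlbench', 'DDPG')
learn_params['max_steps'] = 100        # shorter episodes (assumed key name)
learn_params['train_episodes'] = 1000  # more episodes (assumed key name)

alg = DDPG(**alg_params)
alg.learn(env=env, mode='train', render=False, **learn_params)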

Mirko

Edit: I noticed that RLBench doesn't seem to provide a "usable" reward metric, am I wrong? All the episode rewards are either 0.000 or 1.000. Any insight on this problem?

mirkomorati avatar May 07 '20 14:05 mirkomorati

Hi, I would expect end-to-end training with RLzoo algorithms on RLBench to be hard in practice. As you said, RLBench appears to provide a reward of either 1. or 0. as a signal of task success. I wouldn't say it's not a 'usable' reward metric; it's just too sparse for an RL algorithm to learn from. So unless you have a very efficient RL algorithm and some luck in exploration, it may take an extremely long time to learn a good policy.

Potential ways of addressing this would be to start from a dense reward metric for RLBench, I guess, or to use reward shaping (e.g. paper here) and other auxiliary techniques.
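For instance, a shaping wrapper could add a negative distance-to-target term on top of the sparse success signal. A rough sketch (the observation indices below are placeholders, since the real layout depends on how RLBench packs the state):

import numpy as np

# Placeholder indices into the flat state vector -- the actual layout
# depends on the RLBench observation configuration.
GRIPPER_POS = slice(0, 3)   # assumed x, y, z of the gripper
TARGET_POS = slice(3, 6)    # assumed x, y, z of the target

class DenseReachReward:
    """Wraps an env and adds a dense negative-distance term to the sparse reward."""

    def __init__(self, env, dist_weight=1.0):
        self.env = env
        self.dist_weight = dist_weight

    def __getattr__(self, name):
        # Delegate everything else (spaces, render, close, ...) to the wrapped env.
        return getattr(self.env, name)

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, sparse_r, done, info = self.env.step(action)
        obs_vec = np.asarray(obs).ravel()
        dist = np.linalg.norm(obs_vec[GRIPPER_POS] - obs_vec[TARGET_POS])
        return obs, sparse_r - self.dist_weight * dist, done, info

You would then train on DenseReachReward(env) instead of env directly.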

As for results from our side, we will ideally try to provide some successful policies, but it may take a while.

Zihan

quantumiracle avatar May 07 '20 16:05 quantumiracle

I have run a similar test in RLBench. I found that the first 5 episodes are normal and the computation runs on the GPU as expected. But after that, the computation is extremely slow and GPU usage drops from 30% to almost 0%.

The output in the terminal:

Episode: 1/100 | Episode Reward: 0.0000 | Running Time: 20.8774
Episode: 2/100 | Episode Reward: 0.0000 | Running Time: 39.9556
Episode: 3/100 | Episode Reward: 0.0000 | Running Time: 70.8135
Episode: 4/100 | Episode Reward: 0.0000 | Running Time: 112.0266
Episode: 5/100 | Episode Reward: 0.0000 | Running Time: 168.1843

I turned on the V-REP GUI and found that the robot arm explores around during the first 5 episodes and then stops exploring after that...

Any suggestions on how to debug why the GPU computation almost stops after that? @quantumiracle

ancorasir avatar May 26 '20 02:05 ancorasir

I have a similar problem using the CPU, starting around the 7th episode.

mirkomorati avatar May 26 '20 15:05 mirkomorati

Hi guys,

I tried to replicate the problem you described, but it doesn't happen on my side. I used the PPO-Clip algorithm on the ReachTarget environment in RLBench, and the robot is still moving around after 50 episodes without any drop in GPU usage.

The code I used is as follows:

from rlzoo.common.env_wrappers import *
from rlzoo.common.utils import *
from rlzoo.algorithms import *

EnvName = 'ReachTarget'
EnvType = 'rlbench'
env = build_env(EnvName, EnvType, state_type='state')  # low-dimensional state observations

AlgName = 'PPO'
alg_params, learn_params = call_default_params(env, EnvType, AlgName)  # default hyperparameters
alg = eval(AlgName + '(**alg_params)')  # instantiate the algorithm class by its name
alg.learn(env=env, mode='train', render=True, **learn_params)
alg.learn(env=env, mode='test', render=True, **learn_params)

The package versions:

  • CoppeliaSim==4.0.0
  • PyRep==1.1
  • RLBench==1.0.6
  • tensorflow-gpu==2.0.1
  • Python 3.6

Could you please check your package versions and update them if they are not consistent with what I used? If the problem still exists, please specify which algorithm and environment name you are testing. One way to narrow down where the time goes is to time the simulator step separately from the learning updates, as in the sketch below.
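A small timing wrapper could look like this (just a sketch around any env with a gym-style step/reset interface):

import time

class StepTimer:
    """Wraps an env and prints the average time spent inside env.step() per episode."""

    def __init__(self, env):
        self.env = env
        self.step_time = 0.0
        self.step_count = 0

    def __getattr__(self, name):
        # Delegate everything else to the wrapped env.
        return getattr(self.env, name)

    def reset(self):
        if self.step_count:
            print('avg env.step time: %.4f s over %d steps'
                  % (self.step_time / self.step_count, self.step_count))
        self.step_time, self.step_count = 0.0, 0
        return self.env.reset()

    def step(self, action):
        t0 = time.time()
        result = self.env.step(action)
        self.step_time += time.time() - t0
        self.step_count += 1
        return result

If env.step() time stays roughly constant while episodes keep getting slower, the slowdown is on the learning side rather than in the simulator.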

Thanks

quantumiracle avatar May 27 '20 03:05 quantumiracle

I'm testing the ReachTarget task with the DDPG algorithm and the vision state type. Using only the robot state doesn't produce any performance drop. I have tensorflow-gpu==2.1.0, but I'm running on the CPU.

I profiled a run of the training stage for 100 episodes (100 max steps each), and this is the result.

[Screenshot: profiler output, 2020-05-29 19:00:31]
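For anyone who wants to reproduce this kind of profile, Python's built-in cProfile can be wrapped around the training call, roughly like this (alg, env and learn_params are assumed to be set up as in the script above):

import cProfile
import pstats

profiler = cProfile.Profile()
profiler.enable()
alg.learn(env=env, mode='train', render=False, **learn_params)  # the call being profiled
profiler.disable()

# Print the 20 most time-consuming functions by cumulative time.
pstats.Stats(profiler).sort_stats('cumulative').print_stats(20)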

mirkomorati avatar May 30 '20 15:05 mirkomorati