robosuite-benchmark
Trained Rollout Scripts exhibit random behavior
As part of visualizing rollouts from the trained agents, I ran the rollout.py script with several of the provided log directories, for example:
python scripts/rollout.py --load_dir runs/Door-Panda-OSC-POSE-SEED129/Door_Panda_OSC_POSE_SEED129_2020_09_21_20_07_20_0000--s-0/ --horizon 200 --camera frontview
However, for this and every other run that I tested, the robot agent appears to exhibit only random behavior.
Based on the documentation, I expected these to be well-trained agents. The documentation states:
We provide a rollout script for executing and visualizing rollouts using a trained agent model.
Am I making a mistake or missing something here? Thank you
@rojas70, it's been a while :) but did you figure this out? I'm having the same issue and am trying to work out what's wrong. Here is what I run; note that I didn't actually install robosuite-benchmark or rlkit:
import torch
import numpy as np
import robosuite
from robosuite.controllers import load_controller_config
from robosuite.wrappers import GymWrapper

has_renderer = True
N_episodes = 1

controller_config = load_controller_config(default_controller="OSC_POSE")

# create environment instance
env_s = robosuite.make(
    env_name="Lift",
    robots="Panda",
    gripper_types="default",
    controller_configs=controller_config,  # arm is controlled using OSC
    has_renderer=has_renderer,
    has_offscreen_renderer=False,  # no off-screen rendering
    control_freq=20,  # 20 Hz control for applied actions
    horizon=500,
    use_object_obs=True,
    use_camera_obs=False,
    reward_shaping=True,
)

# Make sure we only pass in the proprio and object obs (no images)
keys = ["object-state"]
for idx in range(len(env_s.robots)):
    keys.append(f"robot{idx}_proprio-state")

# Wrap environment so it's compatible with the Gym API
env = GymWrapper(env_s, keys=keys)

# Load the trained rlkit snapshot on CPU and pull out the evaluation policy
data = torch.load('Lift_Panda_OSC_POSE_SEED17_2020_09_13_00_26_56_0000--s-0/params.pkl',
                  map_location=torch.device("cpu"))
policy = data['evaluation/policy']
policy.reset()

for i_episode in range(N_episodes):
    obs = env.reset()
    ret = 0.
    done = False
    i = 0
    while not done:
        action, agent_info = policy.get_action(obs)  # use observation to decide on an action
        obs, reward, done, info = env.step(action)   # take action in the environment
        ret += reward
        i += 1
        if has_renderer:
            env.render()  # render on display
    print("rollout completed with return {}".format(ret))

env.close()
env_s.close()
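One more thing worth checking, although this is only a guess on my part: if the object pulled out of the snapshot is the raw stochastic SAC policy rather than a deterministic evaluation wrapper, get_action() samples from the action distribution, which can look like random behavior. rlkit provides a MakeDeterministic wrapper that returns the mean action instead; a minimal sketch (assuming the snapshot may store the unwrapped policy):

from rlkit.torch.sac.policies import MakeDeterministic

# Assumption: data['evaluation/policy'] might be the raw stochastic policy.
# Only wrap it if it is not already deterministic.
if not isinstance(policy, MakeDeterministic):
    policy = MakeDeterministic(policy)
action, agent_info = policy.get_action(obs)  # mean action, no sampling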
Hi @rojas70 @sguysc, were you able to figure out the issue? Thanks.
@ivanangjx, I don't think I ever did. I ended up training my own controller with RL. But also, don't forget that you need to set a specific seed so that the simulation itself is deterministic; that might be the problem:
np.random.seed(seed)
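For reference, here is a minimal sketch of what I mean (assuming the seed encoded in the run folder name, e.g. SEED17, is the one that was used for training):

import numpy as np
import torch

SEED = 17  # assumption: taken from the SEED17 token in the run folder name
np.random.seed(SEED)     # robosuite's placement sampling draws from numpy's global RNG
torch.manual_seed(SEED)  # also seed torch in case the policy has stochastic components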
Hi @sguysc, I followed your code and manually set the seed according to the folder name, but the agent still exhibits random behavior. Do you have any idea why? Thanks