HighwayEnv
SAC + HER for "highway-fast-v0" or "highway-v0" environment application
How can I use SAC + HER with the "highway-fast-v0" or "highway-v0" environment? My script is:
import gym
import highway_env
from gym.wrappers import RecordVideo
from stable_baselines3 import HerReplayBuffer, SAC

TRAIN = True

if __name__ == '__main__':
    # Create the environment
    env = gym.make("highway-fast-v0")
    env.configure({
        "action": {
            "type": "ContinuousAction"
        }
    })
    obs = env.reset()
    her_kwargs = dict(n_sampled_goal=4, goal_selection_strategy='future', online_sampling=True, max_episode_length=100)
    # Create the model
    model = SAC('MultiInputPolicy', env, replay_buffer_class=HerReplayBuffer,
                replay_buffer_kwargs=her_kwargs, verbose=1, buffer_size=int(1e6),
                learning_rate=1e-3,
                gamma=0.95, batch_size=1024, tau=0.05,
                policy_kwargs=dict(net_arch=[512, 512, 512]),
                tensorboard_log="highway_her/")
    # Train the model
    if TRAIN:
        model.learn(total_timesteps=int(2e3))
        model.save("highway_her/model")
        del model
    # Run the trained model and record video
    model = SAC.load("highway_her/model", env=env)
    env = RecordVideo(env, video_folder="highway_her/videos", episode_trigger=lambda e: True)
    env.unwrapped.set_record_video_wrapper(env)
    env.configure({"simulation_frequency": 15})  # Higher FPS for rendering
    for videos in range(10):
        done = False
        obs = env.reset()
        print("!!!!!!!!!!!")
        while not done:
            # Predict the action
            action, _states = model.predict(obs, deterministic=True)
            # Step the environment and get the reward
            obs, reward, done, info = env.step(action)
            # Render
            env.render()
    env.close()
However, when I execute this script, I get the following error:
assert isinstance(self.obs_shape, dict), "DictReplayBuffer must be used with Dict obs space only"
AssertionError: DictReplayBuffer must be used with Dict obs space only
Could you help me with this?
Hi, HER is a replay mechanism specific to goal-based environments (like parking-v0), where the objective is to reach a goal location. This is why it requires a dict observation: the observation has to provide both the state and the goal.
But highway-v0 is not goal-based, so no such observation is defined for this environment.
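To illustrate why HER needs the goal inside the observation, here is a minimal pure-Python sketch of goal relabeling. It is not the Stable-Baselines3 implementation: the helper names (`is_goal_based`, `sparse_reward`, `relabel_with_final_goal`), the toy 2D positions, and the tolerance are all made up for illustration. For simplicity it uses the "final" relabeling strategy rather than the "future" strategy from the script above. The key names `observation`, `achieved_goal` and `desired_goal` follow the usual goal-environment convention.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
GoalObs = Dict[str, Point]

# Keys that a goal-based (dict) observation is expected to carry.
GOAL_KEYS = ("observation", "achieved_goal", "desired_goal")


def is_goal_based(obs) -> bool:
    """Return True if obs is a dict with the three goal-related keys."""
    return isinstance(obs, dict) and all(key in obs for key in GOAL_KEYS)


def sparse_reward(achieved: Point, desired: Point, tol: float = 0.5) -> float:
    """Hypothetical sparse reward: 0 when within tol of the goal, else -1."""
    dist = sum((a - d) ** 2 for a, d in zip(achieved, desired)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def relabel_with_final_goal(episode: List[GoalObs]) -> List[Tuple[GoalObs, float]]:
    """'final' strategy: treat the last achieved goal as if it had been the target.

    Each step stores the goal achieved after that step. The original
    episode is left untouched; relabeled copies are returned.
    """
    new_goal = episode[-1]["achieved_goal"]
    relabeled = []
    for step in episode:
        new_step = dict(step, desired_goal=new_goal)  # swap in the new goal
        reward = sparse_reward(step["achieved_goal"], new_goal)
        relabeled.append((new_step, reward))
    return relabeled


# A toy two-step episode that never reaches its real goal (5, 5).
episode = [
    {"observation": (0.0, 0.0), "achieved_goal": (1.0, 0.0), "desired_goal": (5.0, 5.0)},
    {"observation": (1.0, 0.0), "achieved_goal": (2.0, 1.0), "desired_goal": (5.0, 5.0)},
]
relabeled = relabel_with_final_goal(episode)

print(is_goal_based(episode[0]))    # dict observation with goal keys
print(is_goal_based([0.0, 1.0]))    # plain array-like observation, no goal
print([reward for _, reward in relabeled])
```

Relabeling only works because each stored step exposes `achieved_goal` and `desired_goal` separately from the state. A plain array observation has nothing to relabel, which is what the `DictReplayBuffer` assertion is guarding against. For highway-v0, the simplest option is probably to drop `replay_buffer_class` and `replay_buffer_kwargs` and train SAC with its default replay buffer and a non-dict policy.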