SAC + HER for "highway-fast-v0" or "highway-v0" environment application

Open Phd-Ma opened this issue 2 years ago • 1 comments

How can I use SAC + HER with the "highway-fast-v0" or "highway-v0" environment? My script is:

import gym
import highway_env
from gym.wrappers import RecordVideo
from stable_baselines3 import HerReplayBuffer, SAC

TRAIN = True

if __name__ == '__main__':
    # Create the environment
    env = gym.make("highway-fast-v0")

    env.configure({
        "action": {
            "type": "ContinuousAction"
        }
    })
    obs = env.reset()
    her_kwargs = dict(n_sampled_goal=4, goal_selection_strategy='future', online_sampling=True, max_episode_length=100)
    # Create the model
    model = SAC('MultiInputPolicy', env, replay_buffer_class=HerReplayBuffer,
                replay_buffer_kwargs=her_kwargs, verbose=1, buffer_size=int(1e6),
                learning_rate=1e-3,
                gamma=0.95, batch_size=1024, tau=0.05,
                policy_kwargs=dict(net_arch=[512, 512, 512]),
                tensorboard_log="highway_her/")

    # Train the model
    if TRAIN:
        model.learn(total_timesteps=int(2e3))
        model.save("highway_her/model")
        del model

    # Run the trained model and record video
    model = SAC.load("highway_her/model", env=env)
    env = RecordVideo(env, video_folder="highway_her/videos", episode_trigger=lambda e: True)
    env.unwrapped.set_record_video_wrapper(env)
    env.configure({"simulation_frequency": 15})  # Higher FPS for rendering
    for videos in range(10):
        done = False
        obs = env.reset()
        print("!!!!!!!!!!!")
        while not done:
            # Predict the next action
            action, _states = model.predict(obs, deterministic=True)
            # Step the environment and get the reward
            obs, reward, done, info = env.step(action)
            # Render
            env.render()
    env.close()

When I execute this script, I get the following error:

assert isinstance(self.obs_shape, dict), "DictReplayBuffer must be used with Dict obs space only"
AssertionError: DictReplayBuffer must be used with Dict obs space only

Could you help me with this?

Phd-Ma, Apr 27 '22 11:04

Hi, HER is a replay mechanism that is specific to goal-based environments (like parking-v0), where the objective is to reach a goal location. This is why it requires a dict observation: the observation must provide both the state and the goal.
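
To illustrate why HER needs both the state and the goal, here is a minimal pure-Python sketch of goal relabeling with a "future"-style strategy. It is not the stable_baselines3 implementation: the `Transition` class, the `compute_reward` function, its sparse 0 / -1 reward, and the toy 2D positions are all assumptions made up for this example.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

Goal = Tuple[float, float]


@dataclass
class Transition:
    observation: Tuple[float, float]  # state seen by the agent
    achieved_goal: Goal               # goal actually reached after the step
    desired_goal: Goal                # goal the agent was asked to reach
    reward: float


def compute_reward(achieved: Goal, desired: Goal, tol: float = 1e-6) -> float:
    """Hypothetical sparse reward: 0 if the goal is reached, -1 otherwise."""
    dist = ((achieved[0] - desired[0]) ** 2 + (achieved[1] - desired[1]) ** 2) ** 0.5
    return 0.0 if dist <= tol else -1.0


def her_relabel_future(
    episode: List[Transition],
    n_sampled_goal: int = 4,
    rng: Optional[random.Random] = None,
) -> List[Transition]:
    """Create extra transitions whose desired goal is a goal achieved at the
    same or a later step of the same episode, with the reward recomputed."""
    rng = rng or random.Random(0)
    relabeled = []
    for t, tr in enumerate(episode):
        for _ in range(n_sampled_goal):
            # Pick a step from t to the end of the episode (assumption: t included).
            future = episode[rng.randrange(t, len(episode))]
            new_goal = future.achieved_goal
            relabeled.append(
                Transition(
                    observation=tr.observation,
                    achieved_goal=tr.achieved_goal,
                    desired_goal=new_goal,
                    reward=compute_reward(tr.achieved_goal, new_goal),
                )
            )
    return relabeled


# Toy episode: the agent moves along x but never reaches the goal at (10, 0).
target = (10.0, 0.0)
positions = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
episode = [
    Transition(observation=p, achieved_goal=p, desired_goal=target,
               reward=compute_reward(p, target))
    for p in positions
]

relabeled = her_relabel_future(episode, n_sampled_goal=4)
print(len(relabeled))  # 3 steps * 4 sampled goals = 12
```

Every original transition here has reward -1, while some relabeled ones succeed because their new goal is one that was actually achieved. This substitution is only possible when each observation exposes an achieved goal and a desired goal, which is what the dict observation provides.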

But highway-v0 is not goal-based, so no such observation is defined for this environment.
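
One way to catch this before building the model is to inspect the observation space. The helper below is a hedged sketch: `is_goal_conditioned` is a hypothetical name, and it assumes that a goal-based environment exposes a dict-like space with the keys `observation`, `achieved_goal` and `desired_goal`, read from the space's `spaces` attribute as `gym.spaces.Dict` does. The stand-in objects are fake spaces used only so that the example runs without gym installed.

```python
from types import SimpleNamespace
from typing import Any

# Keys assumed to be present in a goal-conditioned dict observation space.
GOAL_KEYS = ("observation", "achieved_goal", "desired_goal")


def is_goal_conditioned(observation_space: Any) -> bool:
    """Return True if the space looks like a goal-conditioned dict space."""
    sub_spaces = getattr(observation_space, "spaces", None)
    # Non-dict spaces either lack `.spaces` or hold a tuple rather than a mapping.
    if not hasattr(sub_spaces, "keys"):
        return False
    return all(key in sub_spaces for key in GOAL_KEYS)


# Stand-ins for real spaces (hypothetical, for illustration only).
goal_like_space = SimpleNamespace(
    spaces={"observation": None, "achieved_goal": None, "desired_goal": None}
)
flat_space = SimpleNamespace(shape=(5, 5))  # array-like space without sub-spaces

print(is_goal_conditioned(goal_like_space))  # True
print(is_goal_conditioned(flat_space))       # False
```

With a real environment the call would be `is_goal_conditioned(env.observation_space)`. When it returns False, dropping `replay_buffer_class=HerReplayBuffer` and `replay_buffer_kwargs`, and using a policy that matches the observation type, avoids the assertion.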

eleurent, Apr 30 '22 16:04