clifter_highway

Autonomous driving decision making tasks and environments

Try it online!

Environments

Highway

In this task, the ego vehicle drives on a multilane highway populated with other vehicles. The agent's objective is to drive at high speed on the right side of the road while avoiding collisions with other vehicles.

environ = gym.make("highway-v0")

Merge

In this task, the ego vehicle approaches a junction with incoming vehicles on the access ramp. The agent's objective is now to maintain a high speed while making room for the incoming vehicles so that they can safely merge into the traffic.

environ = gym.make("merge-v0")

Roundabout

In this task, the ego vehicle approaches a roundabout with flowing traffic. It follows its planned route automatically, but has to handle lane changes and longitudinal control to pass the roundabout as fast as possible while avoiding collisions.

environ = gym.make("roundabout-v0")

Parking

A goal-conditioned continuous control task in which the ego vehicle must park in a given space with the appropriate heading.

environ = gym.make("parking-v0")

Intersection

An intersection negotiation task in which the ego vehicle faces dense traffic.

Racetrack

A continuous control task involving lane-keeping and obstacle avoidance.

Example agents

Deep Q-Network

This model-free value-based reinforcement learning agent performs Q-learning with function approximation, using a neural network to represent the state-action value function Q.
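As a rough illustration of the update behind this kind of agent, the sketch below applies one semi-gradient Q-learning step to a linear approximator standing in for the neural network. The function names, feature vectors, learning rate, and discount factor are all hypothetical and are not taken from the agent's actual implementation.

```python
def q_values(weights, features):
    """Linear stand-in for the Q-network: one weight vector per action."""
    return [sum(w * x for w, x in zip(w_a, features)) for w_a in weights]


def q_learning_step(weights, s, a, r, s_next, done, gamma=0.9, lr=0.1):
    """Apply one semi-gradient Q-learning update in place and return the TD error."""
    # Bootstrapped target: r + gamma * max_a' Q(s', a'), or just r at episode end.
    target = r if done else r + gamma * max(q_values(weights, s_next))
    td_error = target - q_values(weights, s)[a]
    # For a linear Q, the gradient w.r.t. the chosen action's weights is the feature vector.
    weights[a] = [w + lr * td_error * x for w, x in zip(weights[a], s)]
    return td_error


# Made-up sizes: two actions, two features, all weights initialised to zero.
weights = [[0.0, 0.0], [0.0, 0.0]]
td = q_learning_step(weights, s=[1.0, 0.0], a=1, r=1.0, s_next=[0.0, 1.0], done=False)
```

A deep Q-network replaces the linear map with a neural network and typically adds a replay buffer and a target network, which this sketch leaves out.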

Deep Deterministic Policy Gradient

This model-free policy-based reinforcement learning agent is optimized directly by gradient ascent. It uses Hindsight Experience Replay to efficiently learn how to solve a goal-conditioned task.
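The idea behind Hindsight Experience Replay can be sketched as follows: transitions from a failed episode are stored a second time with the goal replaced by a state that was actually reached, so that some of them carry a success signal. The dictionary keys, the sparse reward, and the tolerance below are assumptions made for illustration, not the environment's real reward function.

```python
def sparse_reward(achieved, desired, tol=0.05):
    """Hypothetical sparse reward: 0 when the goal is reached within tol, else -1."""
    dist = sum((a - d) ** 2 for a, d in zip(achieved, desired)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def relabel_with_final_goal(episode):
    """Return copies of the transitions whose goal is the state reached at the end."""
    new_goal = episode[-1]["achieved_next"]
    return [
        {**t, "goal": new_goal, "reward": sparse_reward(t["achieved_next"], new_goal)}
        for t in episode
    ]


# A failed two-step episode: the original goal (1.0, 1.0) is never reached.
episode = [
    {"obs": (0.0, 0.0), "action": 0, "achieved_next": (0.5, 0.0),
     "goal": (1.0, 1.0), "reward": -1.0},
    {"obs": (0.5, 0.0), "action": 1, "achieved_next": (0.5, 0.5),
     "goal": (1.0, 1.0), "reward": -1.0},
]
# The replay buffer holds both the original and the relabeled transitions.
replay_buffer = episode + relabel_with_final_goal(episode)
```

This uses the "final" relabeling strategy only; other strategies sample the substitute goal from later steps of the same episode.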

Value Iteration

Value Iteration is only compatible with finite discrete MDPs, so the environment is first approximated by a finite-mdp environment using env.to_finite_mdp(). This simplified state representation describes the nearby traffic in terms of predicted Time-To-Collision (TTC) on each lane of the road. The transition model is simplistic and assumes that each vehicle will keep driving at a constant speed without changing lanes. This model bias can be a source of mistakes.
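To make the constant-speed assumption concrete, here is a minimal sketch of a Time-To-Collision computation for a single vehicle ahead in the same lane. The function name and arguments are hypothetical, vehicle lengths are ignored, and this is not the code used by env.to_finite_mdp().

```python
import math


def time_to_collision(ego_position, ego_speed, other_position, other_speed):
    """TTC in seconds with a vehicle ahead in the same lane, assuming constant speeds.

    Positions are longitudinal coordinates in metres, speeds are in m/s.
    Returns math.inf when the other vehicle is behind or the gap is not closing.
    """
    gap = other_position - ego_position
    closing_speed = ego_speed - other_speed
    if gap < 0 or closing_speed <= 0:
        return math.inf
    return gap / closing_speed


# Ego at 30 m/s, 50 m behind a vehicle driving at 20 m/s.
ttc = time_to_collision(ego_position=0.0, ego_speed=30.0,
                        other_position=50.0, other_speed=20.0)
```

If the vehicle ahead brakes or changes lanes, the true time to collision differs from this prediction, which is the model bias mentioned above.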

The agent then performs Value Iteration to compute the corresponding optimal state-value function.
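A minimal sketch of Value Iteration on a small finite MDP is given below. It assumes deterministic transitions stored as nested lists, and the three-state chain is a made-up example rather than the MDP produced by env.to_finite_mdp().

```python
def value_iteration(transition, reward, n_states, n_actions, gamma=0.9, tol=1e-8):
    """Compute the optimal state values of a deterministic finite MDP.

    transition[s][a] is the next state, reward[s][a] the immediate reward.
    """
    values = [0.0] * n_states
    while True:
        # Bellman optimality backup applied to every state.
        new_values = [
            max(reward[s][a] + gamma * values[transition[s][a]] for a in range(n_actions))
            for s in range(n_states)
        ]
        # Stop once the largest change falls below the tolerance.
        if max(abs(n - v) for n, v in zip(new_values, values)) < tol:
            return new_values
        values = new_values


# Made-up 3-state chain: action 0 stays, action 1 moves right; state 2 is absorbing.
transition = [[0, 1], [1, 2], [2, 2]]
reward = [[0.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
values = value_iteration(transition, reward, n_states=3, n_actions=2)
```

A greedy policy is then obtained by picking, in each state, the action that maximises the same backup expression.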

Monte-Carlo Tree Search

This agent leverages transition and reward models to perform a stochastic tree search (Coulom, 2006) for the optimal trajectory. No particular assumption is required on the state representation or transition model.
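The search loop can be sketched as below, using the UCT variant (UCB1 selection with random rollouts) as one possible instance of Monte-Carlo Tree Search. The sketch assumes a deterministic generative model `model(state, action) -> (next_state, reward)`; the toy model, the class and function names, and all hyperparameters are hypothetical and do not reflect the agent's actual implementation.

```python
import math
import random


class Node:
    """Per-action statistics at one node of the search tree."""

    def __init__(self, actions):
        self.count = {a: 0 for a in actions}
        self.total = {a: 0.0 for a in actions}
        self.children = {}


def rollout(model, state, depth, actions, gamma, rng):
    """Estimate a return by following uniformly random actions."""
    ret, discount = 0.0, 1.0
    for _ in range(depth):
        state, reward = model(state, rng.choice(actions))
        ret += discount * reward
        discount *= gamma
    return ret


def simulate(node, model, state, depth, actions, gamma, c, rng):
    """Run one simulation from `node` and return the sampled discounted return."""
    if depth == 0:
        return 0.0
    untried = [a for a in actions if node.count[a] == 0]
    if untried:
        # Expansion: try a new action and evaluate it with a random rollout.
        action = rng.choice(untried)
        node.children[action] = Node(actions)
        next_state, reward = model(state, action)
        ret = reward + gamma * rollout(model, next_state, depth - 1, actions, gamma, rng)
    else:
        # Selection: pick the action maximising the UCB1 score.
        n_total = sum(node.count.values())
        action = max(actions, key=lambda a: node.total[a] / node.count[a]
                     + c * math.sqrt(math.log(n_total) / node.count[a]))
        next_state, reward = model(state, action)
        ret = reward + gamma * simulate(node.children[action], model, next_state,
                                        depth - 1, actions, gamma, c, rng)
    # Backpropagation: update the statistics of the action taken at this node.
    node.count[action] += 1
    node.total[action] += ret
    return ret


def plan(model, state, actions, n_simulations=400, depth=4, gamma=0.9, c=1.0, seed=0):
    """Return the most visited root action after n_simulations simulations."""
    rng = random.Random(seed)
    root = Node(actions)
    for _ in range(n_simulations):
        simulate(root, model, state, depth, actions, gamma, c, rng)
    return max(actions, key=lambda a: root.count[a])


def toy_model(position, action):
    """Made-up model: moving right (+1) is rewarded, moving left (-1) is not."""
    return position + action, (1.0 if action == 1 else 0.0)


best_action = plan(toy_model, state=0, actions=(-1, 1))
```

Because children are indexed by action only, this sketch would need to be adapted (for example by keying children on sampled next states) before being used with a stochastic transition model.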

More information:

observation
actions
rewards
multi-agent setting
