
An environment for decision-making in autonomous driving

clifter_highway


A collection of environments for decision-making tasks in autonomous driving

Try it online in Google Colab!


Environments

Highway

In this task, the ego vehicle is driving on a multilane highway populated with other vehicles. The agent's objective is to drive at a high speed on the right side of the road while avoiding collisions with other vehicles.

import gym
environ = gym.make("highway-v0")
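The objective above can be read as a shaped reward that combines speed, lane position and collisions. The sketch below is a hypothetical illustration, not this project's reward function: the weights, the rewarded speed range of 20 to 30 m/s, and the convention that lane 0 is the leftmost lane are all assumptions.

```python
def highway_reward(speed, lane_index, num_lanes, crashed,
                   speed_range=(20.0, 30.0),
                   w_speed=0.4, w_right=0.1, w_collision=-1.0):
    """Hypothetical shaped reward for fast, right-lane, collision-free driving.

    All weights and the speed range are illustrative assumptions.
    """
    lo, hi = speed_range
    # Map speed linearly onto [0, 1] over the rewarded range, then clip.
    speed_term = min(max((speed - lo) / (hi - lo), 0.0), 1.0)
    # Assumed lane indexing: 0 is the leftmost lane, num_lanes - 1 the rightmost.
    right_term = lane_index / max(num_lanes - 1, 1)
    collision_term = 1.0 if crashed else 0.0
    return w_speed * speed_term + w_right * right_term + w_collision * collision_term
```

With these weights, driving at 30 m/s in the rightmost of four lanes yields 0.5, and a collision subtracts a full unit, so crashing always outweighs the speed and lane bonuses.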


Merge

In this task, the ego vehicle approaches a junction with incoming vehicles on the access ramp. The agent's objective is now to maintain a high speed while leaving enough space for the incoming vehicles so that they can safely merge into the traffic.

environ = gym.make("merge-v0")
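One way to reason about "leaving some space" is to compare when the ego vehicle and a ramp vehicle would each reach the merge point. The rule below is a hypothetical hand-written heuristic for illustration only; the function name, the returned action labels and the 2-second minimum time gap are assumptions, and a trained agent is not restricted to this logic.

```python
def merge_decision(ego_dist, ego_speed, merger_dist, merger_speed, min_time_gap=2.0):
    """Hypothetical gap heuristic: return 'slower', 'faster' or 'idle' for the ego vehicle.

    ego_dist / merger_dist: distance (m) each vehicle still has to travel to the
    merge point; speeds are in m/s. min_time_gap (s) is an assumed safety margin.
    """
    # Time each vehicle needs to reach the merge point at its current speed.
    t_ego = ego_dist / max(ego_speed, 1e-6)
    t_merger = merger_dist / max(merger_speed, 1e-6)
    if abs(t_ego - t_merger) >= min_time_gap:
        return "idle"  # arrival times are far enough apart: no conflict
    # Conflict: let the ramp vehicle go first if it arrives earlier,
    # otherwise pass ahead of it while keeping a high speed.
    return "slower" if t_merger <= t_ego else "faster"
```

For example, an ego vehicle 100 m away at 25 m/s (4 s) and a ramp vehicle 50 m away at 20 m/s (2.5 s) are only 1.5 s apart, so the heuristic slows the ego vehicle down.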


Roundabout

In this task, the ego vehicle approaches a roundabout with flowing traffic. It follows its planned route automatically, but still has to handle lane changes and longitudinal control to pass the roundabout as fast as possible while avoiding collisions.

environ = gym.make("roundabout-v0")
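Since routing is automatic, the agent's decisions reduce to lane changes and speed adjustments. The sketch below models them as a small discrete action set; the action names, their integer indices, the 5 m/s speed step and the speed limits are hypothetical choices for illustration and may not match this project's action space.

```python
from enum import IntEnum


class MetaAction(IntEnum):
    """Hypothetical discrete actions: lateral (lane) and longitudinal (speed) control."""
    LANE_LEFT = 0
    IDLE = 1
    LANE_RIGHT = 2
    FASTER = 3
    SLOWER = 4


def apply_action(action, lane, target_speed, num_lanes,
                 speed_step=5.0, speed_limits=(0.0, 30.0)):
    """Return the new (lane, target_speed) after one action.

    Lane 0 is assumed to be the leftmost lane; speeds are in m/s.
    """
    if action == MetaAction.LANE_LEFT:
        lane = max(lane - 1, 0)
    elif action == MetaAction.LANE_RIGHT:
        lane = min(lane + 1, num_lanes - 1)
    elif action == MetaAction.FASTER:
        target_speed = min(target_speed + speed_step, speed_limits[1])
    elif action == MetaAction.SLOWER:
        target_speed = max(target_speed - speed_step, speed_limits[0])
    # IDLE keeps both the current lane and the current target speed.
    return lane, target_speed
```

A low-level controller would then track the returned lane and target speed; keeping that controller out of the agent's action space is what makes the decision problem small and discrete.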


Parking

A goal-conditioned continuous control task in which the ego vehicle must park in a given space with the appropriate heading.

environ = gym.make("parking-v0")
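In a goal-conditioned task the reward is typically a function of the distance between the achieved state and the desired goal. The sketch below assumes a goal vector of [x, y, vx, vy, cos(heading), sin(heading)] and a negative weighted distance raised to a power p; the layout, weights and exponent are assumptions for illustration, not this environment's confirmed reward.

```python
import math


def goal(x, y, heading, vx=0.0, vy=0.0):
    """Assumed goal layout; heading is encoded as (cos, sin) to avoid angle wrap-around."""
    return [x, y, vx, vy, math.cos(heading), math.sin(heading)]


def parking_reward(achieved, desired,
                   weights=(1.0, 0.3, 0.0, 0.0, 0.02, 0.02), p=0.5):
    """Negative weighted distance between achieved and desired goals (illustrative)."""
    weighted = sum(w * abs(a - d) for w, a, d in zip(weights, achieved, desired))
    # p < 1 makes the reward steeper near the goal, sharpening final positioning.
    return -(weighted ** p)
```

For example, being 4 m off in x with the correct heading gives a reward of -2.0, while parking in the right spot but facing the opposite way gives about -0.2, so the heading term matters only once the position is close.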


Intersection

An intersection negotiation task with dense traffic for the ego vehicle.


Racetrack

A continuous control task involving lane-keeping and obstacle avoidance.
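The lane-keeping part of this task can be illustrated with a classical baseline: a proportional controller that steers against the lateral offset and the heading error. This is a hypothetical reference controller, not the learned policy, and it ignores obstacle avoidance; the gains, the 30 degree steering limit and the sign conventions are assumptions.

```python
import math


def lane_keeping_steering(lateral_offset, heading_error,
                          k_lateral=0.3, k_heading=1.0,
                          max_steering=math.radians(30)):
    """Proportional steering command (rad) toward the lane center.

    Assumed conventions: lateral_offset (m) is positive left of the lane center,
    heading_error (rad) is vehicle heading minus lane heading, and positive
    steering turns left. Gains and the steering limit are illustrative.
    """
    steering = -k_lateral * lateral_offset - k_heading * heading_error
    # Saturate to the assumed physical steering range.
    return max(-max_steering, min(max_steering, steering))
```

A learned continuous-control agent outputs a similar steering signal, but can also deviate from the lane center to avoid obstacles, which a fixed controller like this cannot.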


Example agents

Deep Q-Network


This model-free value-based reinforcement learning agent performs Q-learning with function approximation, using a neural network to represent the state-action value function Q.
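The core of Q-learning with function approximation is regressing Q(s, a) toward the bootstrapped target r + gamma * max over a' of Q(s', a'). The sketch below shows only that target and loss computation on plain lists; the neural network (and the separate target network commonly used to produce next-state values) is replaced by precomputed numbers, so the function names and inputs are illustrative assumptions.

```python
def td_targets(rewards, next_q_values, dones, gamma=0.99):
    """Q-learning targets y = r + gamma * max_a' Q(s', a'); no bootstrap at episode end."""
    targets = []
    for r, q_next, done in zip(rewards, next_q_values, dones):
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(r + bootstrap)
    return targets


def squared_td_loss(q_taken, targets):
    """Mean squared error between Q(s, a) of the taken actions and the targets."""
    return sum((q - y) ** 2 for q, y in zip(q_taken, targets)) / len(targets)


# Two transitions: the second one ends the episode, so it is not bootstrapped.
targets = td_targets([1.0, 0.5], [[0.0, 2.0], [3.0, 1.0]], [False, True], gamma=0.5)
loss = squared_td_loss([1.0, 0.5], targets)
```

In a full agent, the loss would be minimized by gradient descent on the network parameters producing q_taken, while the targets are treated as constants.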

Deep Deterministic Policy Gradient


This model-free policy-based reinforcement learning agent is optimized directly by gradient ascent. It uses Hindsight Experience Replay to efficiently learn how to solve a goal-conditioned task.
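Hindsight Experience Replay turns failed episodes into useful training data by pretending that a goal actually reached later in the episode was the intended one. The sketch below implements the "future" relabelling strategy on a list of transitions; the episode dictionary keys, the reward_fn signature and k = 4 extra goals per transition are assumptions for illustration, and the DDPG actor-critic update itself is not shown.

```python
import random


def her_relabel(episode, reward_fn, k=4, rng=None):
    """Augment an episode with 'future'-strategy hindsight goals.

    episode: list of dicts with keys obs, action, achieved_goal (goal achieved
    after the step) and desired_goal. reward_fn(achieved_goal, goal) recomputes
    the reward for a substituted goal. Returns (obs, action, goal, reward) tuples.
    """
    if rng is None:
        rng = random.Random(0)
    transitions = []
    for t, step in enumerate(episode):
        # The original transition, with the goal that was actually pursued.
        transitions.append((step["obs"], step["action"], step["desired_goal"],
                            reward_fn(step["achieved_goal"], step["desired_goal"])))
        # k relabelled copies whose goal was achieved at step t or later.
        for _ in range(k):
            future = rng.randint(t, len(episode) - 1)
            new_goal = episode[future]["achieved_goal"]
            transitions.append((step["obs"], step["action"], new_goal,
                                reward_fn(step["achieved_goal"], new_goal)))
    return transitions
```

With a sparse reward such as 0 on success and -1 otherwise, the original transitions of a failed episode all carry -1, while some relabelled copies carry 0, giving the critic a learning signal it would otherwise never see.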


Value Iteration


Value Iteration is only compatible with finite discrete MDPs, so the environment is first approximated by a finite-mdp environment using env.to_finite_mdp(). This simplified state representation describes the nearby traffic in terms of the predicted Time-To-Collision (TTC) on each lane of the road. The transition model is simplistic: it assumes that each vehicle keeps driving at a constant speed without changing lanes. This model bias can be a source of mistakes.
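Under the constant-speed assumption, the TTC with a vehicle ahead is simply the gap divided by the closing speed. The helpers below are hypothetical and only illustrate that computation per lane; they are not the internals of env.to_finite_mdp(), and the (lane, gap, speed) input format is an assumption.

```python
import math


def time_to_collision(gap, ego_speed, other_speed):
    """TTC (s) with a vehicle `gap` metres ahead in the same lane, at constant speeds."""
    if gap <= 0:
        return 0.0  # already overlapping
    closing_speed = ego_speed - other_speed
    if closing_speed <= 0:
        return math.inf  # the gap is not shrinking
    return gap / closing_speed


def lane_ttc(ego_speed, vehicles, num_lanes):
    """Smallest TTC per lane; vehicles are (lane, gap, speed) tuples for traffic ahead."""
    ttc = [math.inf] * num_lanes
    for lane, gap, speed in vehicles:
        ttc[lane] = min(ttc[lane], time_to_collision(gap, ego_speed, speed))
    return ttc
```

For an ego vehicle at 30 m/s, a car 50 m ahead at 20 m/s gives a TTC of 5 s, while a faster car ahead never causes a collision under this model (infinite TTC). Discretizing these per-lane values into time bins yields a finite state space.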

The agent then runs Value Iteration to compute the corresponding optimal state-value function.
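Value Iteration repeatedly applies the Bellman optimality update V(s) = max over a of [R(s, a) + gamma * V(s')] until the values stop changing. The sketch below assumes a deterministic finite MDP given as index tables, matching the deterministic transition model described above; the table format, tolerance and discount are illustrative assumptions rather than this project's code.

```python
def value_iteration(transitions, rewards, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Compute the optimal state values of a finite deterministic MDP.

    transitions[s][a] -> index of the next state; rewards[s][a] -> immediate reward.
    Returns (values, greedy_policy).
    """
    n = len(transitions)
    values = [0.0] * n
    for _ in range(max_iter):
        # Bellman optimality backup applied to every state at once.
        new_values = [
            max(rewards[s][a] + gamma * values[transitions[s][a]]
                for a in range(len(transitions[s])))
            for s in range(n)
        ]
        delta = max(abs(v - w) for v, w in zip(values, new_values))
        values = new_values
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = [
        max(range(len(transitions[s])),
            key=lambda a: rewards[s][a] + gamma * values[transitions[s][a]])
        for s in range(n)
    ]
    return values, policy


# Two states: staying in state 1 pays 1 per step; state 0 must first move to state 1.
values, policy = value_iteration([[0, 1], [1, 0]], [[0.0, 0.0], [1.0, 0.0]])
```

Here V(1) converges to 1 / (1 - 0.9) = 10 and V(0) to 0.9 * 10 = 9, so the greedy policy moves from state 0 to state 1 and then stays there.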

Monte-Carlo Tree Search

This agent leverages transition and reward models to perform a stochastic tree search (Coulom, 2006) for the optimal trajectory. No particular assumption is required on the state representation or the transition model.
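Because the search only needs to sample next states and rewards, it can be written against any generative model. The sketch below is a minimal UCT variant (UCB1 selection, one expansion per iteration, random rollouts), offered as an illustration rather than this project's agent; the model(state, action, rng) signature, the toy chain_model and all hyperparameters are assumptions. It keeps one child per action, so different stochastic outcomes share a node (an open-loop simplification).

```python
import math
import random


class Node:
    """Search-tree node; statistics are kept per outgoing action."""

    def __init__(self):
        self.visits = 0
        self.children = {}       # action -> Node (one child per action: open-loop)
        self.action_visits = {}  # action -> times the action was tried here
        self.action_return = {}  # action -> sum of sampled discounted returns


def uct_search(model, root_state, actions, n_iterations=500, depth=10,
               gamma=0.9, c=1.4, seed=0):
    """Return the root action with the most visits after UCT search.

    model(state, action, rng) -> (next_state, reward) may be stochastic.
    """
    rng = random.Random(seed)
    root = Node()

    def rollout(state, d):
        # Estimate the value of a new leaf with a uniformly random policy.
        ret, discount = 0.0, 1.0
        for _ in range(d):
            state, reward = model(state, rng.choice(actions), rng)
            ret += discount * reward
            discount *= gamma
        return ret

    def ucb(node, a):
        mean = node.action_return[a] / node.action_visits[a]
        return mean + c * math.sqrt(math.log(node.visits) / node.action_visits[a])

    def simulate(node, state, d):
        if d == 0:
            return 0.0
        untried = [a for a in actions if a not in node.children]
        if untried:
            # Expansion: add one untried action and evaluate it by a rollout.
            action = rng.choice(untried)
            node.children[action] = Node()
            node.action_visits[action] = 0
            node.action_return[action] = 0.0
        else:
            # Selection: pick the action with the highest UCB1 score.
            action = max(actions, key=lambda a: ucb(node, a))
        next_state, reward = model(state, action, rng)
        if untried:
            future = rollout(next_state, d - 1)
        else:
            future = simulate(node.children[action], next_state, d - 1)
        ret = reward + gamma * future
        # Backpropagation: update the statistics along the visited path.
        node.visits += 1
        node.action_visits[action] += 1
        node.action_return[action] += ret
        return ret

    for _ in range(n_iterations):
        simulate(root, root_state, depth)
    return max(actions, key=lambda a: root.action_visits.get(a, 0))


def chain_model(state, action, rng, goal=5, slip=0.1):
    """Toy 1-D road: move by `action`, but the move is reversed with probability `slip`."""
    if rng.random() < slip:
        action = -action
    next_state = min(max(state + action, 0), goal)
    return next_state, next_state / goal  # dense reward for progress


best = uct_search(chain_model, root_state=0, actions=[-1, 1])
```

The root decision is taken from visit counts rather than mean returns, a common choice because counts are less sensitive to a few lucky rollouts.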


More information:

  • observation
  • actions
  • rewards
  • multi-agent setting