pyreason-gym
PyReason Gym 🏋
An OpenAI gym wrapper for PyReason to use in a reinforcement learning Grid World setting.

Table of Contents
- Getting Started
- The Setting
- The Actions
- The Objective
- Rewards
- Installation
- Usage
- Actions
- Observations
- Render Modes
- Other Options
- Contributing
- Bibtex
- License
- Contact
Getting Started
This is an OpenAI Gym environment for reinforcement learning in a grid world setting using PyReason as a simulator.
The Setting
- There are two teams: Red and Blue
- There are two bases: Red Base and Blue Base
- There are a certain number of agents in each team
The Actions
There are 9 actions an agent can take:
- Move Up
- Move Down
- Move Left
- Move Right
- Shoot Up
- Shoot Down
- Shoot Left
- Shoot Right
- Do Nothing
The Objective
The objective of the game is to kill all enemy agents, that is, to bring their health to 0. The game terminates (signals done=True) when this happens. To change when the game is over, modify the is_done() function in grid_world.py.
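As a rough illustration of such a termination check, the sketch below works on a dictionary shaped like the sample observation in the Observations section (health stored as a one-element list). The helper names team_eliminated and game_over are hypothetical; the actual is_done() in grid_world.py may take different arguments and work differently.

```python
def team_eliminated(observation, team):
    """Return True if every agent on `team` has health 0 or less."""
    # Assumption: health is a one-element list, as in the sample observation.
    return all(agent['health'][0] <= 0 for agent in observation[team])


def game_over(observation):
    """Hypothetical termination rule: done once either team is wiped out."""
    return (team_eliminated(observation, 'red_team')
            or team_eliminated(observation, 'blue_team'))
```

A different objective, such as reaching the enemy base, would replace the body of game_over with a check on agent positions instead of health.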
Rewards
The reward function is currently not defined: a reward of 0 is given at each step. You can modify this in the _get_rew function in grid_world.py.
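One possible shape for a custom reward is sketched below: it rewards damage dealt to the enemy team and penalizes damage taken between two consecutive observations. This is only an assumption about what a useful reward might look like. The function names are hypothetical, the signature of _get_rew in grid_world.py is not shown here, and the observation layout is taken from the sample in the Observations section.

```python
def total_health(observation, team):
    """Sum the health of all agents on `team`."""
    # Assumption: health is a one-element list, as in the sample observation.
    return sum(agent['health'][0] for agent in observation[team])


def health_delta_reward(prev_obs, obs, team='red_team'):
    """Hypothetical reward: enemy health lost minus own health lost."""
    enemy = 'blue_team' if team == 'red_team' else 'red_team'
    damage_dealt = total_health(prev_obs, enemy) - total_health(obs, enemy)
    damage_taken = total_health(prev_obs, team) - total_health(obs, team)
    return damage_dealt - damage_taken
```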
Installation
Make sure pyreason==1.5.1 has been installed using the instructions found here
Clone the repository, and install:
git clone https://github.com/lab-v2/pyreason-gym
pip install -e pyreason-gym
NOTE: Do not install this package using setup.py; this will not work. Use the instructions above to install.
Usage
To run the environment and get a feel for things, run the test.py file, which performs random actions in the grid world for 50 steps.
python test.py
This Grid World scenario needs a graph in GraphML format to run. A graph file has already been generated in the graphs folder. However, if you wish to change certain parameters such as
- Number of agents per team
- Start locations of the agents
- Obstacle locations in the grid
- The Grid World size (height, width)
- The locations of the Red and Blue bases
You will need to re-generate the graph file using the generate_graph.py script with the appropriate parameters changed. This will generate the graph in the location where PyReason expects to find it. NOTE: This is optional if you just want to try out the package; you can use the graph file already provided.
This is an OpenAI Gym custom environment. More on OpenAI Gym:
The interface is just like a normal Gym environment. To create an environment and start using it, insert the following into your Python script. Make sure you've installed this package first.
import gym
import pyreason_gym
env = gym.make('PyReasonGridWorld-v0')
# Reset the environment
obs, _ = env.reset()
# Take a random action and get observation, rewards, done signal etc.
# This will sample a random action from the action space of the environment
action = env.action_space.sample()
obs, rew, done, _, _ = env.step(action)
# Keep using `env.step(action)` and `env.reset()` to get observations and run the grid world game.
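The calls above can be wrapped into a simple episode loop. The following is a minimal sketch, assuming env.reset() returns two values and env.step() returns five, as in the snippet above. The helper name run_episode is hypothetical and not part of pyreason_gym.

```python
def run_episode(env, max_steps=50):
    """Run one episode with random actions.

    Stops when the environment signals done or after `max_steps` steps.
    Returns the last observation and the list of per-step rewards.
    """
    obs, _ = env.reset()
    rewards = []
    for _step in range(max_steps):
        # Sample a random action from the environment's action space
        action = env.action_space.sample()
        obs, rew, done, _, _ = env.step(action)
        rewards.append(rew)
        if done:
            break
    return obs, rewards


# Example usage (requires pyreason_gym to be installed):
# import gym
# import pyreason_gym
# env = gym.make('PyReasonGridWorld-v0')
# final_obs, rewards = run_episode(env, max_steps=50)
```

Rewards are collected in a list rather than summed, so the sketch makes no assumption about the reward type.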
A tutorial on how to interact with gym environments can be found here
Actions
The action space is currently a list for each team with discrete numbers representing each action:
- Move Up is represented by 0
- Move Down is represented by 1
- Move Left is represented by 2
- Move Right is represented by 3
- Shoot Up is represented by 4
- Shoot Down is represented by 5
- Shoot Left is represented by 6
- Shoot Right is represented by 7
- Do Nothing is represented by 8
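To avoid scattering bare integers through your code, the codes listed above can be given names. The Action enum below is a convenience sketch and is not part of pyreason_gym; it assumes the environment accepts plain integers in the per-team action lists.

```python
from enum import IntEnum


class Action(IntEnum):
    """Discrete action codes as listed above (hypothetical helper)."""
    MOVE_UP = 0
    MOVE_DOWN = 1
    MOVE_LEFT = 2
    MOVE_RIGHT = 3
    SHOOT_UP = 4
    SHOOT_DOWN = 5
    SHOOT_LEFT = 6
    SHOOT_RIGHT = 7
    DO_NOTHING = 8


# Convert to plain ints before placing the codes in the per-team lists
action = {
    'red_team': [int(Action.MOVE_UP)],
    'blue_team': [int(Action.MOVE_LEFT)],
}
```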
A sample action with 1 agent per team is of the form:
# Sample action. The list will increase with the number of agents per team
action = {
'red_team': [0],
'blue_team': [2]
}
# Send the action to the environment
obs, rew, done, _, _ = env.step(action)
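When actions come from a policy rather than being typed by hand, it can help to validate them before calling env.step. The sketch below assumes one action code per agent in each team list and codes in the range 0-8. make_action is a hypothetical helper, not part of pyreason_gym.

```python
NUM_ACTIONS = 9  # action codes 0-8, as listed above


def make_action(red_actions, blue_actions, num_agents_per_team=1):
    """Build an action dictionary after checking list lengths and code ranges."""
    for team, actions in (('red_team', red_actions), ('blue_team', blue_actions)):
        # Assumption: each team list holds exactly one code per agent
        if len(actions) != num_agents_per_team:
            raise ValueError(f'{team}: expected {num_agents_per_team} actions, got {len(actions)}')
        for code in actions:
            if not 0 <= code < NUM_ACTIONS:
                raise ValueError(f'{team}: invalid action code {code}')
    return {'red_team': list(red_actions), 'blue_team': list(blue_actions)}
```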
Observations
Observations contain information about each player's position in the grid ([x,y]) and their health, as well as blue and red bullet information, including the position of each bullet in the grid ([x,y]) and its direction.
A sample observation with 1 agent per team is a dictionary of the form:
observation = {
'red_team': [{'pos': [1,3], 'health': [1]}],
'blue_team': [{'pos': [7,2], 'health': [1]}],
'red_bullets': [{'pos': [2,3], 'dir': 1}, {'pos': [5,3], 'dir': 3}],
'blue_bullets': [{'pos': [7,1], 'dir': 2}]
}
Information about agent positions, health, bullet positions and direction can be extracted from this observation space.
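For example, the helpers below pull living agent positions and bullet data out of the sample observation shown above. The function names are hypothetical, and the meaning of each dir value is not assumed here; it is passed through unchanged.

```python
def alive_positions(observation, team):
    """Return (x, y) positions of agents on `team` whose health is above 0."""
    return [tuple(agent['pos'])
            for agent in observation[team]
            if agent['health'][0] > 0]


def bullets(observation, team):
    """Return (position, direction) pairs for the bullets fired by `team`."""
    key = 'red_bullets' if team == 'red_team' else 'blue_bullets'
    return [(tuple(b['pos']), b['dir']) for b in observation[key]]


observation = {
    'red_team': [{'pos': [1, 3], 'health': [1]}],
    'blue_team': [{'pos': [7, 2], 'health': [1]}],
    'red_bullets': [{'pos': [2, 3], 'dir': 1}, {'pos': [5, 3], 'dir': 3}],
    'blue_bullets': [{'pos': [7, 1], 'dir': 2}],
}
red_alive = alive_positions(observation, 'red_team')
red_shots = bullets(observation, 'red_team')
```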
Render Modes
There are a few render modes supported:
- human - Creates a PyGame visualization of the grid world and actions
- None - No rendering, interaction only through actions and observations
- rgb_array - An RGB array of the screen that would have been displayed using render_mode='human'. This can be used alongside CNNs etc.
These can be used when creating the environment:
env = gym.make('PyReasonGridWorld-v0', render_mode='human')
# Or
env = gym.make('PyReasonGridWorld-v0', render_mode=None)
# Or
env = gym.make('PyReasonGridWorld-v0', render_mode='rgb_array')
If you're using render_mode='rgb_array', you have to call env.render(observation) after each env.step() call to get the RGB data, where observation is the first value returned by env.step().
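A minimal sketch of collecting frames this way is shown below. It assumes env.render(observation) returns the RGB data for the current step, as described above, and that env.step() returns five values. collect_frames is a hypothetical helper, not part of pyreason_gym.

```python
def collect_frames(env, num_steps=10):
    """Step with random actions and gather one rendered frame per step."""
    frames = []
    env.reset()
    for _step in range(num_steps):
        observation, rew, done, _, _ = env.step(env.action_space.sample())
        # Pass the latest observation to render() to get the RGB data
        frames.append(env.render(observation))
        if done:
            break
    return frames


# Example usage (requires pyreason_gym to be installed):
# import gym
# import pyreason_gym
# env = gym.make('PyReasonGridWorld-v0', render_mode='rgb_array')
# frames = collect_frames(env, num_steps=10)
```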
Other Options
If you've generated the graph using the generate_graph.py script with a custom grid size and custom number of agents per team, you can pass these parameters to the grid world while creating the environment:
env = gym.make('PyReasonGridWorld-v0', grid_size=8, num_agents_per_team=1)
Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Bibtex
If you used this software in your work, please cite our paper:
@inproceedings{aditya_pyreason_2023,
title = {{PyReason}: Software for Open World Temporal Logic},
booktitle = {{AAAI} Spring Symposium},
author = {Aditya, Dyuman and Mukherji, Kaustuv and Balasubramanian, Srikar and Chaudhary, Abhiraj and Shakarian, Paulo},
year = {2023}}
License
This repository is licensed under BSD-3-Clause
Contact
Dyuman Aditya - [email protected]
Kaustuv Mukherji - [email protected]
Paulo Shakarian - [email protected]