[WIP] New feature: runner to launch experiments over multiple virtual environments or multiple guix containers
In this PR I add code meant to simplify running experiments over multiple virtual environments. The main tool for this is nox, which I wrap with decorators: you just need to wrap a function with the @with_venv decorator to execute it in a separate virtual environment. With some work, this could become an alternative to rlberry.experiments.
Example:
```python
from rlberry.manager import with_venv, run_xp


@with_venv(import_libs=["numpy", "mushroom_rl"])
def run_mushroom():
    """
    Simple script to solve a simple chain with Q-Learning.
    """
    import numpy as np
    from mushroom_rl.algorithms.value import QLearning
    from mushroom_rl.core import Core, Logger
    from mushroom_rl.environments import generate_simple_chain
    from mushroom_rl.policy import EpsGreedy
    from mushroom_rl.utils.parameters import Parameter
    from mushroom_rl.utils.dataset import compute_J

    np.random.seed()

    logger = Logger(QLearning.__name__, results_dir=None)
    logger.strong_line()
    logger.info('Experiment Algorithm: ' + QLearning.__name__)

    # MDP
    mdp = generate_simple_chain(
        state_n=5, goal_states=[2], prob=.8, rew=1, gamma=.9
    )

    # Policy
    epsilon = Parameter(value=.15)
    pi = EpsGreedy(epsilon=epsilon)

    # Agent
    learning_rate = Parameter(value=.2)
    algorithm_params = dict(learning_rate=learning_rate)
    agent = QLearning(mdp.info, pi, **algorithm_params)

    # Core
    core = Core(agent, mdp)

    # Initial policy evaluation
    dataset = core.evaluate(n_steps=1000)
    J = np.mean(compute_J(dataset, mdp.info.gamma))
    logger.info(f'J start: {J}')

    # Train
    core.learn(n_steps=10000, n_steps_per_fit=1)

    # Final policy evaluation
    dataset = core.evaluate(n_steps=1000)
    J = np.mean(compute_J(dataset, mdp.info.gamma))
    logger.info(f'J final: {J}')


@with_venv(import_libs=["stable-baselines3"], python_ver="3.9")
def run_sb():
    import gymnasium as gym
    from stable_baselines3 import A2C

    env = gym.make("CartPole-v1")

    model = A2C("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=1_500)

    vec_env = model.get_env()
    obs = vec_env.reset()
    cum_reward = 0
    for i in range(1000):
        action, _state = model.predict(obs, deterministic=True)
        obs, reward, done, info = vec_env.step(action)
        cum_reward += reward
    print(cum_reward)


if __name__ == "__main__":
    run_xp()
```
The first time this is run, the virtual environments are created in the directory containing the script; subsequent calls reuse them. The environment I launch from only needs to have rlberry installed, not stable-baselines3 or mushroom-rl. For now, I use this to run code from these libraries without trying to interface further with rlberry. The virtual environments can contain different libraries and can be run with different Python executables (as in the example).
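For intuition, here is a minimal hand-written noxfile sketch of what the decorators set up for the example above. This is an illustration assuming nox's standard session API, not the code @with_venv/run_xp actually generate:

```python
# noxfile.py -- hypothetical hand-written equivalent of the example above;
# the code actually generated by @with_venv/run_xp may differ.
import nox


@nox.session(reuse_venv=True)
def run_mushroom(session):
    # Install the libraries listed in import_libs into the session's venv.
    session.install("numpy")
    session.install("mushroom_rl")
    # Run the decorated function's body, dumped to a standalone script.
    session.run("python", "run_mushroom.py")


@nox.session(python="3.9", reuse_venv=True)  # python_ver selects the interpreter
def run_sb(session):
    session.install("stable-baselines3")
    session.run("python", "run_sb.py")
```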
Running the example script then produces:
```
OpenGL.platform.ctypesloader > Loaded libGL.so => libGL.so.1 <CDLL 'libGL.so.1', handle 55f02bae78c0 at 0x7f6343005610>
OpenGL.acceleratesupport > No OpenGL_accelerate module loaded: No module named 'OpenGL_accelerate'
OpenGL.platform.ctypesloader > Loaded libGLU.so => libGLU.so.1 <CDLL 'libGLU.so.1', handle 55f02bafacb0 at 0x7f6340fa9f10>
numexpr.utils > Note: NumExpr detected 16 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
numexpr.utils > NumExpr defaulting to 8 threads.
nox > Running session run_mushroom
nox > Re-using existing virtual environment at $HOME/rlberry/examples/rlberry_venvs/run_mushroom.
nox > python -m pip install numpy
nox > python -m pip install mushroom_rl
nox > python /tmp/tmp2f2yhau5/run_mushroom.py
22/10/2023 14:23:01 [INFO] ###################################################################################################
22/10/2023 14:23:01 [INFO] Experiment Algorithm: QLearning
22/10/2023 14:23:01 [INFO] J start: 1.4276799047757556
22/10/2023 14:23:02 [INFO] J final: 3.044715108158618
nox > Session run_mushroom was successful.
nox > Running session run_sb
nox > Re-using existing virtual environment at $HOME/rlberry/examples/rlberry_venvs/run_sb.
nox > python -m pip install stable-baselines3
nox > python /tmp/tmp2f2yhau5/run_sb.py
Using cuda device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 24.3     |
|    ep_rew_mean        | 24.3     |
| time/                 |          |
|    fps                | 372      |
|    iterations         | 100      |
|    time_elapsed       | 1        |
|    total_timesteps    | 500      |
| train/                |          |
|    entropy_loss       | -0.667   |
|    explained_variance | -0.114   |
|    learning_rate      | 0.0007   |
|    n_updates          | 99       |
|    policy_loss        | 1.5      |
|    value_loss         | 7.06     |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 33.3     |
|    ep_rew_mean        | 33.3     |
| time/                 |          |
|    fps                | 465      |
|    iterations         | 200      |
|    time_elapsed       | 2        |
|    total_timesteps    | 1000     |
| train/                |          |
|    entropy_loss       | -0.631   |
|    explained_variance | 0.0161   |
|    learning_rate      | 0.0007   |
|    n_updates          | 199      |
|    policy_loss        | 1.41     |
|    value_loss         | 8.66     |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 40.3     |
|    ep_rew_mean        | 40.3     |
| time/                 |          |
|    fps                | 507      |
|    iterations         | 300      |
|    time_elapsed       | 2        |
|    total_timesteps    | 1500     |
| train/                |          |
|    entropy_loss       | -0.611   |
|    explained_variance | -0.0168  |
|    learning_rate      | 0.0007   |
|    n_updates          | 299      |
|    policy_loss        | -0.437   |
|    value_loss         | 101      |
------------------------------------
[100.]
nox > Session run_sb was successful.
nox > Ran multiple sessions:
nox > * run_mushroom: success
nox > * run_sb: success
```
@mmcenta: this is what I had in mind when I said that rlberry could handle virtual environments.
This is a proof of concept: it works, but it is very preliminary.
Now included: a decorator for running things in a guix container (guix is a very powerful package manager). This has the advantage of dumping a channel file (think of it as a commit for guix) that can be reused later to reconstruct the container, giving almost perfect reproducibility (in particular, guix will take care of keeping the same C libraries, like cuda and the other torch backends); a reproduction sketch follows the example below.
Example:
```python
from rlberry.manager import with_guix, run_guix_xp


@with_guix(import_libs=["stable-baselines3"])
def run_sb():
    import gymnasium as gym
    from stable_baselines3 import A2C

    env = gym.make("CartPole-v1")

    model = A2C("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=1_500)

    vec_env = model.get_env()
    obs = vec_env.reset()
    cum_reward = 0
    for i in range(1000):
        action, _state = model.predict(obs, deterministic=True)
        obs, reward, done, info = vec_env.step(action)
        cum_reward += reward
    print(cum_reward)


if __name__ == "__main__":
    run_guix_xp(keep_build_dir=True)
```
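To illustrate the channel file mentioned above: it is what guix itself emits with `guix describe -f channels`, and rebuilding the container from it later would look roughly like the sketch below. The file name and the `python-stable-baselines3` package name are hypothetical (whether stable-baselines3 is packaged for guix is an assumption on my part):

```sh
# Hypothetical sketch: rebuild an equivalent container later from the dumped
# channel file, which pins guix and all packages to exact commits.
guix time-machine -C channels.scm -- \
    shell --container python python-stable-baselines3 -- \
    python3 run_sb.py
```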
For now, let's take it one step at a time: I have removed everything guix-related, so this PR is only for the venvs. I will do guix in a separate PR.
Hello, The build of the doc failed. Look up the reason here: https://github.com/rlberry-py/rlberry/actions/workflows/preview.yml
Hello, The build of the doc succeeded. The documentation preview is available here: https://rlberry-py.github.io/rlberry/preview_pr