[WIP] New feature: runner to launch experiments over multiple virtual environments or multiple guix containers
In this PR I add code meant to simplify running experiments over multiple virtual environments. The main tool for this is nox, which I wrap with decorators: you just need to wrap a function with the @with_venv decorator to execute it in a separate virtual environment. With some work, this could become an alternative to rlberry.experiments.
Example:
```python
from rlberry.manager import with_venv, run_xp


@with_venv(import_libs=["numpy", "mushroom_rl"])
def run_mushroom():
    """
    Simple script to solve a simple chain with Q-Learning.
    """
    import numpy as np
    from mushroom_rl.algorithms.value import QLearning
    from mushroom_rl.core import Core, Logger
    from mushroom_rl.environments import generate_simple_chain
    from mushroom_rl.policy import EpsGreedy
    from mushroom_rl.utils.parameters import Parameter
    from mushroom_rl.utils.dataset import compute_J

    np.random.seed()

    logger = Logger(QLearning.__name__, results_dir=None)
    logger.strong_line()
    logger.info('Experiment Algorithm: ' + QLearning.__name__)

    # MDP
    mdp = generate_simple_chain(
        state_n=5, goal_states=[2], prob=.8, rew=1, gamma=.9
    )

    # Policy
    epsilon = Parameter(value=.15)
    pi = EpsGreedy(epsilon=epsilon)

    # Agent
    learning_rate = Parameter(value=.2)
    algorithm_params = dict(learning_rate=learning_rate)
    agent = QLearning(mdp.info, pi, **algorithm_params)

    # Core
    core = Core(agent, mdp)

    # Initial policy evaluation
    dataset = core.evaluate(n_steps=1000)
    J = np.mean(compute_J(dataset, mdp.info.gamma))
    logger.info(f'J start: {J}')

    # Train
    core.learn(n_steps=10000, n_steps_per_fit=1)

    # Final policy evaluation
    dataset = core.evaluate(n_steps=1000)
    J = np.mean(compute_J(dataset, mdp.info.gamma))
    logger.info(f'J final: {J}')


@with_venv(import_libs=["stable-baselines3"], python_ver="3.9")
def run_sb():
    import gymnasium as gym
    from stable_baselines3 import A2C

    env = gym.make("CartPole-v1")

    model = A2C("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=1_500)

    vec_env = model.get_env()
    obs = vec_env.reset()
    cum_reward = 0
    for i in range(1000):
        action, _state = model.predict(obs, deterministic=True)
        obs, reward, done, info = vec_env.step(action)
        cum_reward += reward
    print(cum_reward)


if __name__ == "__main__":
    run_xp()
```
The first time this is run, the virtual environments are created in the directory containing the script; subsequent calls reuse them. The environment I launch from only needs to have rlberry installed, not stable-baselines3 or mushroom-rl. For now, I use this to run code from these libraries without trying to interface further with rlberry. The virtual environments can contain different libraries and can be run with different Python executables (as in the example).
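For intuition, here is a minimal hand-written noxfile sketch of what the decorators set up for the example above. This is an illustration assuming nox's standard session API, not the code @with_venv/run_xp actually generate:

```python
# noxfile.py -- hypothetical hand-written equivalent of the example above;
# the code actually generated by @with_venv/run_xp may differ.
import nox


@nox.session(reuse_venv=True)
def run_mushroom(session):
    # Install the libraries listed in import_libs into the session's venv.
    session.install("numpy")
    session.install("mushroom_rl")
    # Run the decorated function's body, dumped to a standalone script.
    session.run("python", "run_mushroom.py")


@nox.session(python="3.9", reuse_venv=True)  # python_ver selects the interpreter
def run_sb(session):
    session.install("stable-baselines3")
    session.run("python", "run_sb.py")
```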
Running the example script then produces:
```
OpenGL.platform.ctypesloader > Loaded libGL.so => libGL.so.1 <CDLL 'libGL.so.1', handle 55f02bae78c0 at 0x7f6343005610>
OpenGL.acceleratesupport > No OpenGL_accelerate module loaded: No module named 'OpenGL_accelerate'
OpenGL.platform.ctypesloader > Loaded libGLU.so => libGLU.so.1 <CDLL 'libGLU.so.1', handle 55f02bafacb0 at 0x7f6340fa9f10>
numexpr.utils > Note: NumExpr detected 16 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
numexpr.utils > NumExpr defaulting to 8 threads.
nox > Running session run_mushroom
nox > Re-using existing virtual environment at $HOME/rlberry/examples/rlberry_venvs/run_mushroom.
nox > python -m pip install numpy
nox > python -m pip install mushroom_rl
nox > python /tmp/tmp2f2yhau5/run_mushroom.py
22/10/2023 14:23:01 [INFO] ###################################################################################################
22/10/2023 14:23:01 [INFO] Experiment Algorithm: QLearning
22/10/2023 14:23:01 [INFO] J start: 1.4276799047757556
22/10/2023 14:23:02 [INFO] J final: 3.044715108158618
nox > Session run_mushroom was successful.
nox > Running session run_sb
nox > Re-using existing virtual environment at $HOME/rlberry/examples/rlberry_venvs/run_sb.
nox > python -m pip install stable-baselines3
nox > python /tmp/tmp2f2yhau5/run_sb.py
Using cuda device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 24.3     |
|    ep_rew_mean        | 24.3     |
| time/                 |          |
|    fps                | 372      |
|    iterations         | 100      |
|    time_elapsed       | 1        |
|    total_timesteps    | 500      |
| train/                |          |
|    entropy_loss       | -0.667   |
|    explained_variance | -0.114   |
|    learning_rate      | 0.0007   |
|    n_updates          | 99       |
|    policy_loss        | 1.5      |
|    value_loss         | 7.06     |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 33.3     |
|    ep_rew_mean        | 33.3     |
| time/                 |          |
|    fps                | 465      |
|    iterations         | 200      |
|    time_elapsed       | 2        |
|    total_timesteps    | 1000     |
| train/                |          |
|    entropy_loss       | -0.631   |
|    explained_variance | 0.0161   |
|    learning_rate      | 0.0007   |
|    n_updates          | 199      |
|    policy_loss        | 1.41     |
|    value_loss         | 8.66     |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 40.3     |
|    ep_rew_mean        | 40.3     |
| time/                 |          |
|    fps                | 507      |
|    iterations         | 300      |
|    time_elapsed       | 2        |
|    total_timesteps    | 1500     |
| train/                |          |
|    entropy_loss       | -0.611   |
|    explained_variance | -0.0168  |
|    learning_rate      | 0.0007   |
|    n_updates          | 299      |
|    policy_loss        | -0.437   |
|    value_loss         | 101      |
------------------------------------
[100.]
nox > Session run_sb was successful.
nox > Ran multiple sessions:
nox > * run_mushroom: success
nox > * run_sb: success
```
@mmcenta: this is what I had in mind when I said that rlberry could handle virtual environments.
This is a proof of concept: it works, but it is very preliminary.
Now included: a decorator for running things in a guix container (guix is a very powerful package manager). This has the advantage of dumping a channel file (think of it as a commit for guix) that can be reused later to reconstruct the container, giving almost perfect reproducibility (in particular, guix will take care of keeping the same C libraries, like cuda and the other torch backends); a reproduction sketch follows the example below.
Example:
```python
from rlberry.manager import with_guix, run_guix_xp


@with_guix(import_libs=["stable-baselines3"])
def run_sb():
    import gymnasium as gym
    from stable_baselines3 import A2C

    env = gym.make("CartPole-v1")

    model = A2C("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=1_500)

    vec_env = model.get_env()
    obs = vec_env.reset()
    cum_reward = 0
    for i in range(1000):
        action, _state = model.predict(obs, deterministic=True)
        obs, reward, done, info = vec_env.step(action)
        cum_reward += reward
    print(cum_reward)


if __name__ == "__main__":
    run_guix_xp(keep_build_dir=True)
```
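To illustrate the channel file mentioned above: it is what guix itself emits with `guix describe -f channels`, and rebuilding the container from it later would look roughly like the sketch below. The file name and the `python-stable-baselines3` package name are hypothetical (whether stable-baselines3 is packaged for guix is an assumption on my part):

```sh
# Hypothetical sketch: rebuild an equivalent container later from the dumped
# channel file, which pins guix and all packages to exact commits.
guix time-machine -C channels.scm -- \
    shell --container python python-stable-baselines3 -- \
    python3 run_sb.py
```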
For now, let's take it one step at a time: I have removed everything guix-related, so this PR is only for the venvs. I will do guix in a separate PR.
Hello, The build of the doc failed. Look up the reason here: https://github.com/rlberry-py/rlberry/actions/workflows/preview.yml
Hello, The build of the doc succeeded. The documentation preview is available here: https://rlberry-py.github.io/rlberry/preview_pr