rlberry
User guide
I propose we do a user guide for rlberry. The outline of which would be something like this:
- Installation
- Basic Usage
- Quick Start RL
- Quick Start Deep RL
- Set up of an experiment
- Agent Manager, agent, environment.
- Training phase, evaluation phase
- Logging
- Parallelization how to
- Running an experiment
- Train an agent
- Evaluate agents
- Tune hyperparameters
- Plot relevant statistics
- Saving and Loading
- Save and Load of agent
- Save and Load of managers
- Writers
- Save and Load of data for plots
- Make your own agent or environment
- Interaction with Gymnasium
- Using environment from gymnasium
- Using agents from Stable Baselines
- Deep RL agents
- Neural network utils
- Interactions with torch
- Seeding
- Using Bandits in rlberry
Feel free to suggest any change to this outline. Once we all agree to the outline, we can distribute the work among us.
And I suggest we use rundoc or something similar to verify that the code in the user guide actually runs and exits with code 0.
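As a minimal sketch of the idea (not rundoc itself — the function name and regex here are hypothetical), one could extract fenced Python blocks from a markdown file and check that each one exits with code 0:

```python
import re
import subprocess
import sys
import tempfile


def check_markdown_code(md_text):
    """Run every fenced python block in md_text; True iff all exit with code 0."""
    blocks = re.findall(r"`{3}python\n(.*?)`{3}", md_text, flags=re.DOTALL)
    for block in blocks:
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(block)
            path = f.name
        # Run the extracted block in a fresh interpreter, as rundoc would.
        result = subprocess.run([sys.executable, path])
        if result.returncode != 0:
            return False
    return True


# A tiny markdown document with one embedded code block
# (fences built programmatically to avoid nesting issues here).
doc = "\n".join([
    "Some guide text.",
    "`" * 3 + "python",
    'print("hello from the user guide")',
    "`" * 3,
])
print(check_markdown_code(doc))  # True: the embedded block exits with code 0
```

A real setup would also want to report *which* block failed; rundoc and similar tools handle that.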
I think this should go into the long tests, because the user guide will contain some code that trains agents, which would be too heavy for Azure.
An example of a user guide section, from PR #276: https://rlberry--276.org.readthedocs.build/en/276/basics/comparison.html
We can try Jupytext to edit markdown in jupyter.
I'm adding notes concerning Philippe's remarks (check your mailbox):
- The user guide should tell "how rlberry should be used". Example: experiments should be reproducible, and we should make sure that all the examples we give are reproducible.
- Example of where the documentation could be clearer:
  ``eval([eval_horizon, n_simulations, gamma])``:
  "Monte-Carlo policy evaluation [1] of an agent to estimate the value at the initial state."
  - What do we evaluate? Do we evaluate the initial state, or a policy/trained agent?
- Define the 3 arguments
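To make the ambiguity concrete, here is a hedged sketch of what such an `eval` plausibly computes (this is an illustration of the three arguments, not rlberry's actual implementation; `mc_eval` and its environment callbacks are made up for this example):

```python
import numpy as np


def mc_eval(policy, env_reset, env_step, eval_horizon=100, n_simulations=10, gamma=1.0):
    """Monte-Carlo evaluation of a (trained) policy: run n_simulations
    rollouts of at most eval_horizon steps from the initial state and
    average the discounted returns (discount factor gamma)."""
    returns = np.zeros(n_simulations)
    for sim in range(n_simulations):
        state = env_reset()
        total, discount = 0.0, 1.0
        for _ in range(eval_horizon):
            state, reward, done = env_step(state, policy(state))
            total += discount * reward
            discount *= gamma
            if done:
                break
        returns[sim] = total
    return returns.mean()


# Toy deterministic environment: reward 1 at every step, never terminates.
value = mc_eval(
    policy=lambda s: 0,
    env_reset=lambda: 0,
    env_step=lambda s, a: (s + 1, 1.0, False),
    eval_horizon=5,
    n_simulations=3,
    gamma=1.0,
)
print(value)  # 5.0: five undiscounted rewards of 1 per rollout
```

Under this reading, the answer to the question above would be: we evaluate the *policy* (the trained agent), and the estimate happens to be taken at the initial state. The docstring should say so explicitly.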
- How do we seed an agent? A call to reseed(), or some other way? The description of reseed() is very unclear to me: do we provide a sequence of numbers, or a single number/seed?
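Part of the confusion is probably NumPy's `SeedSequence` terminology ("sequence" does not mean a list of seeds). Assuming rlberry's seeding builds on that mechanism, here is a sketch: a single integer goes in, and independent child streams come out by spawning:

```python
import numpy as np

# One integer seed in: the single entry point for reproducibility.
root = np.random.SeedSequence(123)

# Independent child sequences are spawned from it, e.g. one for the
# agent and one for the environment, so their streams do not overlap.
agent_seq, env_seq = root.spawn(2)
agent_rng = np.random.default_rng(agent_seq)

# Re-seeding from the same integer reproduces the exact same stream.
agent_rng_again = np.random.default_rng(np.random.SeedSequence(123).spawn(2)[0])
print(agent_rng.integers(0, 100, size=3))
print(agent_rng_again.integers(0, 100, size=3))  # identical to the line above
```

If reseed() works like this, the docstring should say "a single integer seed, from which independent generators are spawned", rather than leaving "sequence" ambiguous.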
- kwargs should be explained, their attributes listed in all different cases. (See #334)
- Regarding the save() method, what does "Overwrite the 'save' function to manage CPU vs GPU save/load in torch agent" mean? Does it save the rlberry agent or just its Q-network? The Q-network(s) in the case of DDQN? ... Same thing for load(). Moreover, we don't care that it overloads some other method (see #341); we want to know what it does.
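For reference, the "CPU vs GPU" part most likely refers to torch's `map_location` mechanism. A hedged sketch of the standard pattern (this is plain torch, not rlberry's save()/load(); the network is a stand-in):

```python
import os
import tempfile

import torch

# Stand-in for an agent's Q-network; a real agent may hold several.
qnet = torch.nn.Linear(4, 2)

path = os.path.join(tempfile.mkdtemp(), "qnet.pt")
torch.save(qnet.state_dict(), path)  # saves only the network weights

# map_location="cpu" makes a checkpoint written on a GPU machine
# loadable on a CPU-only machine: tensors are remapped at load time.
state = torch.load(path, map_location="cpu")
qnet_restored = torch.nn.Linear(4, 2)
qnet_restored.load_state_dict(state)
```

Whether the agent's save() persists the whole agent or just these state dicts is exactly what the docstring should state.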
- Include all the arguments in the docstring
- Why is the default value indicated for some arguments and not for all?
- More details about how to evaluate an agent during training.
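The pattern to document is presumably periodic evaluation inside the training loop. A toy sketch of that shape (the training step, evaluation, and interval here are all made up; nothing is rlberry API):

```python
import numpy as np

rng = np.random.default_rng(0)


def train_one_step(param):
    # Toy "training": nudge a scalar parameter toward its optimum 1.0.
    return param + 0.1 * (1.0 - param)


def evaluate(param, n_simulations=5):
    # Toy "Monte-Carlo evaluation": mean return grows with the parameter.
    return float(np.mean(param + 0.01 * rng.standard_normal(n_simulations)))


param, eval_interval, history = 0.0, 10, []
for step in range(1, 51):
    param = train_one_step(param)
    if step % eval_interval == 0:  # evaluate every eval_interval steps
        history.append((step, evaluate(param)))

for step, value in history:
    print(f"step {step:2d}: estimated value = {value:.3f}")
```

The user guide would then explain which rlberry hooks (writer, evaluation arguments, etc.) implement this loop, and how often evaluation runs by default.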
Basically, we should go over each function/method and improve the documentation where needed, so that everything is documented and explicit.