rlberry
User guide
I propose we do a user guide for rlberry. The outline of which would be something like this:
- Installation
- Basic Usage
- Quick Start RL
- Quick Start Deep RL
- Set up of an experiment
- Agent Manager, agent, environment.
- Training phase, evaluation phase
- Logging
- Parallelization how to
- Running an experiment
- Train an agent
- Evaluate agents
- Tune hyperparameters
- Plot relevant statistics
- Saving and Loading
- Save and Load of agent
- Save and Load of managers
- Writers
- Save and Load of data for plots
- Make your own agent or environment
- Interaction with Gymnasium
- Using environment from gymnasium
- Using agents from Stable Baselines
- Deep RL agents
- Neural network utils
- Interactions with torch
- Seeding
- Using Bandits in rlberry
Feel free to suggest any change to this outline. Once we all agree to the outline, we can distribute the work among us.
And I suggest we use rundoc or something similar to verify that the code in the user guide actually runs and exits with code 0.
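As a minimal sketch of the idea (not rundoc itself — the function name and regex here are hypothetical), one could extract fenced Python blocks from a markdown file and check that each one exits with code 0:

```python
import re
import subprocess
import sys
import tempfile


def check_markdown_code(md_text):
    """Run every fenced python block in md_text; True iff all exit with code 0."""
    blocks = re.findall(r"`{3}python\n(.*?)`{3}", md_text, flags=re.DOTALL)
    for block in blocks:
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(block)
            path = f.name
        # Run the extracted block in a fresh interpreter, as rundoc would.
        result = subprocess.run([sys.executable, path])
        if result.returncode != 0:
            return False
    return True


# A tiny markdown document with one embedded code block
# (fences built programmatically to avoid nesting issues here).
doc = "\n".join([
    "Some guide text.",
    "`" * 3 + "python",
    'print("hello from the user guide")',
    "`" * 3,
])
print(check_markdown_code(doc))  # True: the embedded block exits with code 0
```

A real setup would also want to report *which* block failed; rundoc and similar tools handle that.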
I think this should go into the long tests, because the user guide will contain some code that trains agents, which would be too heavy for Azure.
An example of a user guide section, from PR #276: https://rlberry--276.org.readthedocs.build/en/276/basics/comparison.html
We can try Jupytext to edit markdown in jupyter.
I'm adding notes concerning Philippe's remarks (check your mailbox):
- The user guide should tell "how rlberry should be used". Example: experiments should be reproducible, and we should make sure that all the examples we give are reproducible.
- Example of where the documentation could be clearer:
  ``eval([eval_horizon, n_simulations, gamma])``:
  "Monte-Carlo policy evaluation [1] of an agent to estimate the value at the initial state."
  - What do we evaluate? Do we evaluate the initial state, or a policy/trained agent?
- Define the 3 arguments
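To make the ambiguity concrete, here is a hedged sketch of what such an `eval` plausibly computes (this is an illustration of the three arguments, not rlberry's actual implementation; `mc_eval` and its environment callbacks are made up for this example):

```python
import numpy as np


def mc_eval(policy, env_reset, env_step, eval_horizon=100, n_simulations=10, gamma=1.0):
    """Monte-Carlo evaluation of a (trained) policy: run n_simulations
    rollouts of at most eval_horizon steps from the initial state and
    average the discounted returns (discount factor gamma)."""
    returns = np.zeros(n_simulations)
    for sim in range(n_simulations):
        state = env_reset()
        total, discount = 0.0, 1.0
        for _ in range(eval_horizon):
            state, reward, done = env_step(state, policy(state))
            total += discount * reward
            discount *= gamma
            if done:
                break
        returns[sim] = total
    return returns.mean()


# Toy deterministic environment: reward 1 at every step, never terminates.
value = mc_eval(
    policy=lambda s: 0,
    env_reset=lambda: 0,
    env_step=lambda s, a: (s + 1, 1.0, False),
    eval_horizon=5,
    n_simulations=3,
    gamma=1.0,
)
print(value)  # 5.0: five undiscounted rewards of 1 per rollout
```

Under this reading, the answer to the question above would be: we evaluate the *policy* (the trained agent), and the estimate happens to be taken at the initial state. The docstring should say so explicitly.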
- How do we seed an agent? A call to reseed(), or some other way? The description of reseed() is very unclear to me: do we provide a sequence of numbers, or a single number/seed?
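Part of the confusion is probably NumPy's `SeedSequence` terminology ("sequence" does not mean a list of seeds). Assuming rlberry's seeding builds on that mechanism, here is a sketch: a single integer goes in, and independent child streams come out by spawning:

```python
import numpy as np

# One integer seed in: the single entry point for reproducibility.
root = np.random.SeedSequence(123)

# Independent child sequences are spawned from it, e.g. one for the
# agent and one for the environment, so their streams do not overlap.
agent_seq, env_seq = root.spawn(2)
agent_rng = np.random.default_rng(agent_seq)

# Re-seeding from the same integer reproduces the exact same stream.
agent_rng_again = np.random.default_rng(np.random.SeedSequence(123).spawn(2)[0])
print(agent_rng.integers(0, 100, size=3))
print(agent_rng_again.integers(0, 100, size=3))  # identical to the line above
```

If reseed() works like this, the docstring should say "a single integer seed, from which independent generators are spawned", rather than leaving "sequence" ambiguous.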
- kwargs should be explained, their attributes listed in all different cases. (See #334)
- Regarding the save() method, what does "Overwrite the 'save' function to manage CPU vs GPU save/load in torch agent" mean? Does it save the rlberry agent or just its Q-network? The Q-network(s) in the case of DDQN? ... Same thing for load(). Moreover, we don't care that it overloads some other method (see #341); we want to know what it does.
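For reference, the "CPU vs GPU" part most likely refers to torch's `map_location` mechanism. A hedged sketch of the standard pattern (this is plain torch, not rlberry's save()/load(); the network is a stand-in):

```python
import os
import tempfile

import torch

# Stand-in for an agent's Q-network; a real agent may hold several.
qnet = torch.nn.Linear(4, 2)

path = os.path.join(tempfile.mkdtemp(), "qnet.pt")
torch.save(qnet.state_dict(), path)  # saves only the network weights

# map_location="cpu" makes a checkpoint written on a GPU machine
# loadable on a CPU-only machine: tensors are remapped at load time.
state = torch.load(path, map_location="cpu")
qnet_restored = torch.nn.Linear(4, 2)
qnet_restored.load_state_dict(state)
```

Whether the agent's save() persists the whole agent or just these state dicts is exactly what the docstring should state.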
- Include all the arguments in the docstring
- Why is the default value indicated for some arguments and not for all?
- More details about how to evaluate an agent during training.
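The pattern to document is presumably periodic evaluation inside the training loop. A toy sketch of that shape (the training step, evaluation, and interval here are all made up; nothing is rlberry API):

```python
import numpy as np

rng = np.random.default_rng(0)


def train_one_step(param):
    # Toy "training": nudge a scalar parameter toward its optimum 1.0.
    return param + 0.1 * (1.0 - param)


def evaluate(param, n_simulations=5):
    # Toy "Monte-Carlo evaluation": mean return grows with the parameter.
    return float(np.mean(param + 0.01 * rng.standard_normal(n_simulations)))


param, eval_interval, history = 0.0, 10, []
for step in range(1, 51):
    param = train_one_step(param)
    if step % eval_interval == 0:  # evaluate every eval_interval steps
        history.append((step, evaluate(param)))

for step, value in history:
    print(f"step {step:2d}: estimated value = {value:.3f}")
```

The user guide would then explain which rlberry hooks (writer, evaluation arguments, etc.) implement this loop, and how often evaluation runs by default.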
Basically, we should go over each function/method and improve the documentation where needed, so that everything is documented and explicit.