rlpyt
Documentation now available (questions about *documentation* here)
Hi!
Documentation is now available!
https://rlpyt.readthedocs.io
Feel free to post in this issue for minor clarifications / comments, or start a new issue if it's something bigger.
Hope this helps!!
Thanks for the heavy work!
Thanks @astooke. Just a quick comment, the Gym and Environments data doesn't seem to be available on the website. For example, I see documentation here: https://github.com/astooke/rlpyt/blob/master/rlpyt/envs/atari/atari_env.py but the corresponding docs page seems to be missing: https://rlpyt.readthedocs.io/en/latest/pages/env.html# Not the biggest deal, as it is easy to understand the open-source code in this particular case, but it would help for the website to have the complete documentation.
Thanks @astooke, is there an example for a simple Gym environment like CartPole or MountainCar? I found examples for Atari and MuJoCo, but I think it would help to also provide an example with simple environments like the classic control tasks in Gym.
@ekorudiawan Agreed, very eager to see that; even just one example would be good.
Hi Alex,
First thank you for adding all the documentation! It's tremendously helpful. I had one question about rlpyt.agents.pg.categorical.CategoricalPgAgent.
In the CategoricalPgAgent documentation, you write "[CategoricalPgAgent] has a different interface to the model (model here outputs discrete probabilities rather than means and log_stds)." But in the source code, it looks like the model should output a tuple of (pi, value), where pi is a vector of discrete action probabilities and value is a vector of values associated with each.
Which is correct? Thanks!
Just a quick comment, the Gym and Environments data doesn't seem to be available on the website.
@DanielTakeshi Good catch, thanks! Fixed now (had to mock import atari_py and maybe gym).
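In case anyone else hits the same docs-build issue, the standard Sphinx way to do that is the autodoc mock-imports setting in conf.py (the exact list rlpyt's conf.py uses may differ; this is just a sketch):

# docs/conf.py -- tell sphinx-autodoc to fake these imports so the API pages
# build even when the packages aren't installed on the docs build machine.
autodoc_mock_imports = ["atari_py", "gym"]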
@bpiv400 Oops sorry for the confusion, I've just updated the wording in the documentation to clarify that the model should output (pi, value), like in the code.
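For illustration, a minimal standalone model with that interface could look roughly like this (just a sketch, not the library's built-in model; the class name and sizes are made up, and real rlpyt models also handle leading time/batch dimensions, which I've omitted here):

import torch.nn as nn
import torch.nn.functional as F

class TinyCategoricalPgModel(nn.Module):
    """Sketch of a model for CategoricalPgAgent: forward() returns (pi, value)."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.pi_head = nn.Linear(hidden, n_actions)
        self.v_head = nn.Linear(hidden, 1)

    def forward(self, observation, prev_action=None, prev_reward=None):
        x = self.body(observation.float())
        pi = F.softmax(self.pi_head(x), dim=-1)  # discrete action probabilities
        value = self.v_head(x).squeeze(-1)       # state-value estimate
        return pi, value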
is there an example for a simple Gym environment like CartPole or MountainCar?
@ekorudiawan Good idea, no such example exists yet...adding it to the to-do list to make one. Let me know if you beat me to it ;)
Hi,
Thanks for the documentation. After reading through, I'm still confused about how batch_T / batch_size / replay_ratio interact. Specifically, I'm trying to recreate the following training loop: the idea is that there are train_step / batch_size / gradient_step parameters, and the pseudocode would be:
for step in range(total_num_steps):
    env.step()  # add transition tuple to the replay buffer
    if step % train_step == 0:
        for i in range(gradient_step):
            batch = buffer.sample(batch_size)
            update_model(batch)
Is it possible to replicate this behavior or come close to it in rlpyt? For now I am trying to approximate train_step=100, batch_size=128, gradient_step=100 with batch_T=100, batch_size=128, replay_ratio=1.28.
@bycn OK yes, let's see... If you use batch_T=100, then the environments will do 100 time-steps in between training, so that's like train_step=100, if you use batch_B=1, which is the number of parallel environments (you could use any batch_T * batch_B = 100). But your replay ratio needs to be 128, because it's data_consumed / data_generated, which in your case is gradient_step * batch_size / train_step. Does that make sense?
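Spelling out that arithmetic with the numbers from your example (plain Python, nothing rlpyt-specific):

train_step = 100      # env steps between training phases
batch_size = 128      # samples per gradient step
gradient_step = 100   # gradient steps per training phase

batch_T, batch_B = 100, 1  # any split with batch_T * batch_B == 100 gives the same data rate

# replay_ratio = data_consumed / data_generated per iteration
replay_ratio = (gradient_step * batch_size) / (batch_T * batch_B)
print(replay_ratio)   # 128.0, not 1.28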
Hi,
I am experimenting with examples/example_5.py
, trying to reproduce results of the original Mnih et al. Nature paper from 2015.
I have started with the following parameters (not touching the source for now): python example_5.py --cuda_idx=0 --n_parallel=20 --game="space_invaders"
Where can I find the trained model? I see lines like dqn_space_invaders_0 itr #181199 saving snapshot... in the log, but where are these snapshots being saved to?
@jetsnguns Look at data/local and you should see example_5 as a sub-directory. Snapshots are hopefully saved there, but only if you have set them to be saved. You may want to add the "snapshot_mode" argument to the logger. Please see my comment here: https://github.com/astooke/rlpyt/issues/66
Just a quick note that the results from the Mnih et al paper were done with serial sampling code, and examples/example_5.py uses the GPU sampler with more than one parallel environment by default, which will introduce slightly different behavior. If you are looking to exactly reproduce Mnih et al, I recommend examples/example_1.py, but you'll also need to slightly adjust the exploration schedule, make the images 84x84 instead of 104x80, and many other minor changes. It's a bit cumbersome. That's why papers really need open source code. :)
@DanielTakeshi,
Thanks for a prompt reply!
Yes, I have looked at that folder, but only the debug.log, params.json, and progress.csv files are present. I've also looked at the comments you mentioned and will proceed to try that out.
Perhaps it would be less confusing to remove messages like:
2020-05-16 05:40:54.588538 | dqn_space_invaders_0 itr #181199 saving snapshot...
2020-05-16 05:40:54.588735 | dqn_space_invaders_0 itr #181199 saved
that I see in the log when nothing is actually saved.
Hi,
I'm experimenting with a custom env in rlpyt. I intend to use different data for training and testing (the env shows novel states during testing/evaluation vs. training).
I have been using example_1 as a stepping stone, so far so good, but I'm not sure how to achieve this last bit: testing.
After running runner.train() (as in example_1, inside the logger), I think I should use runner.evaluate_agent() inside a loop to evaluate the agent several times (or maybe use eval_max_steps=NumEvals as a SerialSampler argument?).
But I'm a bit (more) lost on how to "send" the test signal to my environment from here, as only the SerialSampler 'knows' where the environment class is, and I cannot find a way to use it to send some arguments to the env in this phase.
In short (I don't know if I'm being clear), I need to know how:
- to test/eval my agent
- to send a "test" argument to my custom env on testing phase
Thank you!
hi @jetsnguns, thanks for your question, and you're right, the agent parameters are not being saved by default. as @DanielTakeshi pointed out, you can add the snapshot_mode argument to the logger_context as:
with logger_context(log_dir, run_ID, name, config, snapshot_mode="last"):
you should see a params.pkl (different from params.json) in that same folder.
to see the snapshot modes available in the logger: https://github.com/astooke/rlpyt/blob/668290d1ca94e9d193388a599d4f719bc3a23fba/rlpyt/utils/logging/logger.py#L332
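for context, the tail end of a script like examples/example_5.py would then look roughly like this (a sketch; runner, game, and run_ID are built earlier in the example script, and the exact folder layout under data/local may differ):

from rlpyt.utils.logging.context import logger_context

# runner, game, and run_ID are constructed earlier in the example script.
config = dict(game=game)
name = "dqn_" + game
log_dir = "example_5"
with logger_context(log_dir, run_ID, name, config, snapshot_mode="last"):
    runner.train()
# afterwards, params.pkl should appear next to debug.log / progress.csv
# in that run's folder under data/local/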
@LecJackS Hi, sorry for the unclear title, I meant for this thread to be for questions about the documentation... let me clarify that in the title... mind copying your question over to a new issue and I'll reply there?