
Documentation now available (questions about *documentation* here)

Open astooke opened this issue 4 years ago • 16 comments

Hi!

Documentation is now available!

https://rlpyt.readthedocs.io

Feel free to post in this issue for minor clarifications / comments, or start a new issue if it's something bigger.

Hope this helps!!

astooke avatar Jan 27 '20 21:01 astooke

Thanks for the hard work!

codelast avatar Jan 30 '20 16:01 codelast

Thanks @astooke! Just a quick comment, the Gym and Environments data doesn't seem to be available on the website. For example, I see documentation here: https://github.com/astooke/rlpyt/blob/master/rlpyt/envs/atari/atari_env.py but the corresponding docs page seems to be missing: https://rlpyt.readthedocs.io/en/latest/pages/env.html# Not the biggest deal, since the open source code is easy to understand in this particular case, but it would help for the website to have the complete documentation.

DanielTakeshi avatar Feb 13 '20 18:02 DanielTakeshi

Thanks @astooke, is there an example for a simple gym environment like cart pole or mountain car? I found examples for Atari and MuJoCo, but I think it would also help to provide an example with a simple environment, like the classic control tasks in Gym.

ekorudiawan avatar Feb 18 '20 04:02 ekorudiawan

@ekorudiawan Agreed, very eager to see that; even just one example would be great.

codelast avatar Feb 19 '20 13:02 codelast

Hi Alex,

First thank you for adding all the documentation! It's tremendously helpful. I had one question about rlpyt.agents.pg.categorical.CategoricalPgAgent.

In the CategoricalPgAgent documentation, you write "[CategoricalPgAgent] has a different interface to the model (model here outputs discrete probabilities rather than means and log_stds)." But in the source code, it looks like the model should output a tuple of (pi, value), where pi is a vector of discrete action probabilities and value is the associated value estimate.

Which is correct? Thanks!

bpiv400 avatar Feb 19 '20 14:02 bpiv400

Just a quick comment, the Gym and Environments data doesn't seem to be available on the website.

@DanielTakeshi Good catch, thanks! Fixed now (had to mock import atari_py and maybe gym).
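For anyone curious, the mocking is the standard Sphinx autodoc mechanism; in a docs conf.py it looks something like the following sketch (the exact module list here is illustrative, not copied from the repo):

# docs/conf.py (sketch): mock heavy/optional imports so the API pages
# build on readthedocs without the packages installed.
autodoc_mock_imports = ["atari_py", "gym"]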

astooke avatar Feb 20 '20 01:02 astooke

@bpiv400 Oops sorry for the confusion, I've just updated the wording in the documentation to clarify that the model should output (pi, value), like in the code.
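For illustration, here is a minimal standalone sketch of a model with that (pi, value) interface. This is plain PyTorch and not one of rlpyt's model classes; the real models also receive prev_action and prev_reward and handle leading time/batch dimensions, which is glossed over here.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCategoricalPgModel(nn.Module):
    """Toy policy-value model whose forward() returns (pi, value)."""

    def __init__(self, obs_dim, n_actions, hidden_size=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden_size), nn.Tanh())
        self.pi_head = nn.Linear(hidden_size, n_actions)
        self.value_head = nn.Linear(hidden_size, 1)

    def forward(self, observation, prev_action=None, prev_reward=None):
        h = self.body(observation.float())
        pi = F.softmax(self.pi_head(h), dim=-1)   # discrete action probabilities
        value = self.value_head(h).squeeze(-1)    # state-value estimate
        return pi, value

model = TinyCategoricalPgModel(obs_dim=4, n_actions=2)
pi, value = model(torch.zeros(8, 4))  # pi: shape [8, 2], value: shape [8]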

astooke avatar Feb 20 '20 01:02 astooke

is there an example for a simple gym environment like cart pole or mountain car?

@ekorudiawan Good idea, no such example exists yet...adding it to the to-do list to make one. Let me know if you beat me to it ;)

astooke avatar Feb 20 '20 01:02 astooke

Hi,

Thanks for the documentation. After reading through it, I'm still confused about how batch_T / batch_size / replay_ratio interact. Specifically, I'm trying to recreate a training loop defined in terms of train_step / batch_size / gradient_step; the pseudocode would be:

for step in range(total_num_steps):
    transition = env.step(action)      # step the environment
    buffer.add(transition)             # add the tuple to the replay buffer
    if step % train_step == 0:
        for _ in range(gradient_step):
            batch = buffer.sample(batch_size)  # sample a batch of size batch_size
            update(model, batch)               # update the model with the batch

Is it possible to replicate this behavior, or come close to it, in rlpyt? For now I am trying to approximate train_step=100, batch_size=128, gradient_step=100 with batch_T=100, batch_size=128, replay_ratio=1.28.

bycn avatar Mar 16 '20 00:03 bycn

@bycn OK, let's see... if you use batch_T=100 with batch_B=1 (batch_B is the number of parallel environments), then the environments take 100 time-steps in between training, so that's like train_step=100 (any combination with batch_T * batch_B = 100 works). But your replay ratio needs to be 128, not 1.28, because it's data_consumed / data_generated, which in your case is gradient_step * batch_size / train_step. Does that make sense?
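Spelled out as arithmetic, using the names from your pseudocode rather than anything inside rlpyt:

batch_T, batch_B = 100, 1          # env time-steps per sampler iteration, parallel envs
gradient_step, batch_size = 100, 128

samples_generated = batch_T * batch_B                 # 100 env steps collected per iteration
samples_consumed = gradient_step * batch_size         # 100 * 128 = 12800 samples trained on
replay_ratio = samples_consumed / samples_generated   # 12800 / 100 = 128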

astooke avatar Mar 21 '20 01:03 astooke

Hi, I am experimenting with examples/example_5.py, trying to reproduce the results of the original Mnih et al. Nature paper from 2015. I have started with the following parameters (not touching the source for now):

python example_5.py --cuda_idx=0 --n_parallel=20 --game="space_invaders"

Where can I find the trained model? I see lines like dqn_space_invaders_0 itr #181199 saving snapshot... in the log, but where are these snapshots being saved to?

jetsnguns avatar May 16 '20 16:05 jetsnguns

@jetsnguns

Look at data/local and you should see example_5 as a sub-directory. Snapshots should be saved there, but only if you have set them to be saved. You may want to add the "snapshot_mode" argument to the logger. Please see my comment here: https://github.com/astooke/rlpyt/issues/66

Just a quick note that the results from the Mnih et al. paper were obtained with serial sampling code, and examples/example_5.py uses the GPU sampler with more than one parallel environment by default, so the behavior will differ slightly. If you are looking to exactly reproduce Mnih et al., I recommend examples/example_1.py, but you'll also need to slightly adjust the exploration schedule, make the images 84x84 instead of 104x80, and make many other minor changes. It's a bit cumbersome. That's why papers really need open source code. :)

DanielTakeshi avatar May 16 '20 17:05 DanielTakeshi

@DanielTakeshi, Thanks for the prompt reply! Yes, I have looked at that folder, but only debug.log, params.json, and progress.csv are present. I've also looked at the comments you mentioned and will try that out. Perhaps it would be less confusing to remove messages like:

2020-05-16 05:40:54.588538  | dqn_space_invaders_0 itr #181199 saving snapshot...
2020-05-16 05:40:54.588735  | dqn_space_invaders_0 itr #181199 saved

that I see in the log when nothing is actually saved.

jetsnguns avatar May 16 '20 18:05 jetsnguns

Hi,

I'm experimenting with a custom env in rlpyt. I intend to use different data for training and testing (the env shows novel states during testing/evaluation vs. training).

I have been using example_1 as a stepping stone, and so far so good, but I'm not sure how to achieve this last bit: testing.

After running runner.train() (as in example_1, inside the logger), I think I should use runner.evaluate_agent() inside a loop to evaluate the agent several times (or maybe use eval_max_steps=NumEvals as a SerialSampler argument?).

But I'm a bit (more) lost on how to "send" the test signal to my environment from here, since only the SerialSampler 'knows' where the environment class is, and I can't find a way to pass arguments to the env in this phase.

In short (I don't know if I'm being clear), I need to know how:

  1. to test/eval my agent
  2. to send a "test" argument to my custom env on testing phase

Thank you!

LecJackS avatar May 20 '20 22:05 LecJackS

Hi @jetsnguns, thanks for your question, and you're right, the agent parameters are not being saved by default. As @DanielTakeshi pointed out, you can add the snapshot_mode argument to logger_context like this:

with logger_context(log_dir, run_ID, name, config, snapshot_mode="last"):

You should then see a params.pkl (different from params.json) in that same folder.

To see the snapshot modes available in the logger: https://github.com/astooke/rlpyt/blob/668290d1ca94e9d193388a599d4f719bc3a23fba/rlpyt/utils/logging/logger.py#L332
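Once the params.pkl exists, it can be read back with torch.load (snapshots are written with torch.save). A rough sketch, where the path is a placeholder and the keys shown are what the minibatch runners typically save, so check snapshot.keys() on your own file:

import torch

snapshot_path = "path/to/your/run/params.pkl"  # placeholder: somewhere under data/local/
snapshot = torch.load(snapshot_path)
print(snapshot.keys())  # e.g. itr, cum_steps, agent_state_dict, optimizer_state_dict
agent_state_dict = snapshot["agent_state_dict"]  # restore into an agent/model as needed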

astooke avatar May 21 '20 01:05 astooke

@LecJackS Hi, sorry for the unclear title. I meant for this thread to be for questions about the documentation... let me clarify that in the title. Mind copying your question over to a new issue, and I'll reply there?

astooke avatar May 21 '20 01:05 astooke