random-network-distillation-pytorch

README asset

jcwleo opened this issue 6 years ago • 18 comments

[image: README result asset]

jcwleo avatar Nov 17 '18 17:11 jcwleo

[screenshot: 2018-11-20 11:13:35]

jcwleo avatar Nov 20 '18 02:11 jcwleo

[image: README result asset]

jcwleo avatar Nov 20 '18 12:11 jcwleo

[image: README result asset]

jcwleo avatar Jan 05 '19 02:01 jcwleo

https://github.com/jcwleo/random-network-distillation-pytorch/blob/master/config.conf Is this the final config used to get results similar to the images above?

I see the last pull request is about normalization; would UseNorm = True improve reward_per_epi or the speed of convergence? And what about UseNoisyNet: when is it better to use it?

kslazarev avatar Jan 11 '19 00:01 kslazarev

@kslazarev Hi, I used that config, except that NumEnv is 128 and MaxStepPerEpisode is 4500. In the paper, the authors did not use advantage normalization or NoisyNet, so I disabled those options.
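
For reference, a minimal sketch of those overrides (only the keys mentioned in this thread are shown; the real config.conf contains more settings, and its exact layout may differ):

```ini
; sketch of the deviations from the committed config.conf discussed above
NumEnv = 128              ; paper-scale number of parallel envs
MaxStepPerEpisode = 4500
UseNorm = False           ; advantage normalization off, as in the paper
UseNoisyNet = False       ; NoisyNet off, as in the paper
```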

jcwleo avatar Jan 11 '19 01:01 jcwleo

Result with the config in the master branch:

[screenshot: MontezumaRevengeNoFrameskip-v4 training curves, 2019-01-11 15:05:01 GMT+03:00]

Right now I've set NumEnv = 128 and MaxStepPerEpisode = 4500. I'll attach results once I reach 1200-2000 updates.

kslazarev avatar Jan 11 '19 12:01 kslazarev

@jcwleo I see a difference in the x-axis scale between the reward_per_epi and reward_per_rollout plots. In your MontezumaRevengeNoFrameskip-v4 image they are 1.200k and 12.00k (a 10x ratio), but in my in-progress image they are 200 and 600 (a 3x ratio). Do I need to change another option in the config? [screenshot: 2019-01-11 21:58:49 GMT+03:00]

kslazarev avatar Jan 11 '19 19:01 kslazarev

Or does the x-axis scale (global_update and sample_episode) depend on how long the agent survives, so that on later updates the ratio between the two axes will settle to the same value?

kslazarev avatar Jan 11 '19 19:01 kslazarev

@kslazarev per_rollout and per_epi are not on the same scale. per_rollout counts one global update per tick (each entry into agent.train_model()), while per_epi counts one finished episode of a single parallel env. If one episode's total steps are 1024 and NumStep (the rollout size) is 128, the two x-axis scales differ by a factor of 8.
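
To make that concrete, a tiny sketch; the numbers are the hypothetical ones from the example above, not measured values:

```python
# One rollout = NumStep steps per env, followed by one agent.train_model()
# call, i.e. one tick on the per_rollout (global_update) axis.
num_step = 128        # rollout size (NumStep in config.conf)
episode_steps = 1024  # length of one episode in a single parallel env

# Over this one episode, the rollout counter advances episode_steps / num_step
# times while the episode counter advances once, so the two x-axes differ
# by this factor:
print(episode_steps // num_step)  # -> 8
```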

jcwleo avatar Jan 11 '19 23:01 jcwleo

@jcwleo Yes, correct. I have a few other small questions about the code. What's the appropriate way to ask them: each question as a new issue, or should I keep asking in this one?

kslazarev avatar Jan 11 '19 23:01 kslazarev

@kslazarev Please create a separate issue for each question. :)

jcwleo avatar Jan 12 '19 04:01 jcwleo

NumEnv = 128 and MaxStepPerEpisode = 4500: [screenshot: 2019-01-13 07:40:16 GMT+03:00]

Looks similar to the README. I stopped the NumEnv = 128 run because it started hitting swap.

kslazarev avatar Jan 13 '19 13:01 kslazarev

Hello, can you tell me how many GPUs you used and how long it took to see this result?

xiaioding avatar Apr 17 '23 08:04 xiaioding

Hello. Not fast. I don't remember exactly; 1 or 2 NVIDIA 1080 Ti cards.

kslazarev avatar Apr 17 '23 08:04 kslazarev

@kslazarev Excuse me, I'm using one 3090 with 2 envs. After running for more than 2 hours, the reward is still 0. Is this normal? I didn't load a pre-trained model.

xiaioding avatar Apr 17 '23 09:04 xiaioding

It was 3 years ago. I can't really help; I don't remember what could have caused that problem.

kslazarev avatar Apr 17 '23 09:04 kslazarev

@kslazarev Ok, thanks

xiaioding avatar Apr 17 '23 09:04 xiaioding

@kslazarev Thank you for answering on my behalf.

jcwleo avatar Apr 17 '23 15:04 jcwleo