random-network-distillation-pytorch

README asset

jcwleo opened this issue 6 years ago • 18 comments

[image: README result asset]

jcwleo avatar Nov 17 '18 17:11 jcwleo

[screenshot: 2018-11-20 11:13:35]

jcwleo avatar Nov 20 '18 02:11 jcwleo

[image: README result asset]

jcwleo avatar Nov 20 '18 12:11 jcwleo

[image: README result asset]

jcwleo avatar Jan 05 '19 02:01 jcwleo

https://github.com/jcwleo/random-network-distillation-pytorch/blob/master/config.conf Is this the final config used to get results similar to the images above?

I see the last pull request is about normalization; would UseNorm = True improve reward_per_epi or the speed of convergence? And what about UseNoisyNet: when is it better to use it?

kslazarev avatar Jan 11 '19 00:01 kslazarev

@kslazarev Hi, I used that config, except that NumEnv is 128 and MaxStepPerEpisode is 4500. In the paper, the authors did not use advantage normalization or NoisyNet, so I disabled those options.
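
For reference, a minimal sketch of those overrides (only the keys mentioned in this thread are shown; the real config.conf contains more settings, and its exact layout may differ):

```ini
; sketch of the deviations from the committed config.conf discussed above
NumEnv = 128              ; paper-scale number of parallel envs
MaxStepPerEpisode = 4500
UseNorm = False           ; advantage normalization off, as in the paper
UseNoisyNet = False       ; NoisyNet off, as in the paper
```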

jcwleo avatar Jan 11 '19 01:01 jcwleo

Result with the config in the master branch:

[screenshot: MontezumaRevengeNoFrameskip-v4 training curves, 2019-01-11 15:05:01 GMT+03:00]

Right now I've set NumEnv = 128 and MaxStepPerEpisode = 4500. I'll attach results once I reach 1200-2000 updates.

kslazarev avatar Jan 11 '19 12:01 kslazarev

@jcwleo I see a difference in the x-axis scale between the reward_per_epi and reward_per_rollout plots. In your MontezumaRevengeNoFrameskip-v4 image they are 1.200k and 12.00k (a 10x ratio), but in my in-progress image they are 200 and 600 (a 3x ratio). Do I need to change another option in the config? [screenshot: 2019-01-11 21:58:49 GMT+03:00]

kslazarev avatar Jan 11 '19 19:01 kslazarev

Or does the x-axis scale (global_update and sample_episode) depend on how long the agent survives, so that on later updates the ratio between the two axes will settle to the same value?

kslazarev avatar Jan 11 '19 19:01 kslazarev

@kslazarev per_rollout and per_epi are not on the same scale. per_rollout counts one global update per tick (each entry into agent.train_model()), while per_epi counts one finished episode of a single parallel env. If one episode's total steps are 1024 and NumStep (the rollout size) is 128, the two x-axis scales differ by a factor of 8.
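
To make that concrete, a tiny sketch; the numbers are the hypothetical ones from the example above, not measured values:

```python
# One rollout = NumStep steps per env, followed by one agent.train_model()
# call, i.e. one tick on the per_rollout (global_update) axis.
num_step = 128        # rollout size (NumStep in config.conf)
episode_steps = 1024  # length of one episode in a single parallel env

# Over this one episode, the rollout counter advances episode_steps / num_step
# times while the episode counter advances once, so the two x-axes differ
# by this factor:
print(episode_steps // num_step)  # -> 8
```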

jcwleo avatar Jan 11 '19 23:01 jcwleo

@jcwleo Yes, correct. I have a few other small questions about the code. What's the appropriate way to ask them: each question as a new issue, or should I keep asking in this one?

kslazarev avatar Jan 11 '19 23:01 kslazarev

@kslazarev Please create a separate issue for each question. :)

jcwleo avatar Jan 12 '19 04:01 jcwleo

NumEnv = 128 and MaxStepPerEpisode = 4500: [screenshot: 2019-01-13 07:40:16 GMT+03:00]

Looks similar to the README. I stopped the NumEnv = 128 run because it started hitting swap.

kslazarev avatar Jan 13 '19 13:01 kslazarev

Hello, can you tell me how many GPUs you used and how long it took to see this result?

xiaioding avatar Apr 17 '23 08:04 xiaioding

Hello. Not fast. I don't remember exactly; 1 or 2 NVIDIA 1080 Ti cards.

kslazarev avatar Apr 17 '23 08:04 kslazarev

@kslazarev Excuse me, I'm using one 3090 with 2 envs. After running for more than 2 hours, the reward is still 0. Is this normal? I didn't load a pre-trained model.

xiaioding avatar Apr 17 '23 09:04 xiaioding

It was 3 years ago. I can't really help; I don't remember what could have caused that problem.

kslazarev avatar Apr 17 '23 09:04 kslazarev

@kslazarev Ok, thanks

xiaioding avatar Apr 17 '23 09:04 xiaioding

@kslazarev Thank you for answering on my behalf.

jcwleo avatar Apr 17 '23 15:04 jcwleo