
The model does not converge for breakout

Open yungangwu opened this issue 2 years ago • 13 comments

Search before asking

  • [X] I have searched the MuZero issues and found no similar feature requests.

Description

I trained MuZero on Breakout with the hyperparameters given in the code, but after 450,000 steps the reward was still 0 and showed no sign of convergence. Have the hyperparameters in the code been validated? Thank you!

Additional context

No response

yungangwu avatar Oct 20 '22 02:10 yungangwu

Same issue here, but for all envs.

JohnPPP avatar Oct 20 '22 06:10 JohnPPP

Have you tried any other parameter settings, for example batch_size set to 1024? Does the model converge under any hyperparameter settings? @JohnPPP

yungangwu avatar Oct 20 '22 06:10 yungangwu

Tried a bunch of hyperparameters on a bunch of games. Just wasted my time. Perhaps others can show me how this can work...

JohnPPP avatar Oct 20 '22 08:10 JohnPPP

gg. I ran into the same problem and did a lot of experiments, but nothing changed. I don't know whether there is a mistake in the code. @JohnPPP

yungangwu avatar Oct 20 '22 08:10 yungangwu

Yeah, there probably is.

JohnPPP avatar Oct 20 '22 11:10 JohnPPP

Did the reward stay zero the entire time, or did it occasionally get some reward? I have it working on cartpole, but not on Atari. That said, it still gets a reward of 2 or 3 occasionally in breakout, indicating that it is behaving randomly.
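A rough way to check whether scores like these are distinguishable from a random policy, assuming you have logged per-episode rewards for both the trained agent and a uniformly random agent. The function name `looks_random` and the numbers below are made up for illustration; they are not part of muzero-general and are not measured results.

```python
from statistics import mean, stdev


def looks_random(agent_rewards, random_rewards, z_threshold=2.0):
    """Return True if the agent's mean episode reward is within
    z_threshold standard errors of the random-policy mean."""
    n_a, n_r = len(agent_rewards), len(random_rewards)
    # Standard error of the difference between the two sample means.
    se = (stdev(agent_rewards) ** 2 / n_a + stdev(random_rewards) ** 2 / n_r) ** 0.5
    diff = mean(agent_rewards) - mean(random_rewards)
    if se == 0:
        # No variance in either sample: compare the means directly.
        return diff == 0
    return abs(diff) / se < z_threshold


# Placeholder episode rewards for illustration only.
random_rewards = [0, 2, 1, 3, 0, 1, 2, 0, 1, 2]
agent_rewards = [0, 0, 2, 3, 0, 1, 0, 2, 0, 1]
print(looks_random(agent_rewards, random_rewards))
```

This is only a crude two-sample comparison. With few episodes it will rarely tell the two apart, so it is worth evaluating over many episodes before concluding anything.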

dillonmsandhu avatar Oct 31 '22 20:10 dillonmsandhu

I ran into the same problem. I tuned the hyperparameters for a long time, but I couldn't get good results in my environment.

zsn2021 avatar Dec 31 '22 15:12 zsn2021

Yes, I have this problem too. I also tried another implementation, muzero-pytorch, on Gomoku, but after tuning it for a long time I still didn't get good results.

yungangwu avatar Dec 31 '22 15:12 yungangwu

Could the cause be that several networks have to be learned at once, which makes the resulting decisions fail? If you like, share your contact information and we can talk privately.

zsn2021 avatar Dec 31 '22 15:12 zsn2021

Yes, that's my guess too: it probably has three networks chained in series that have to be optimized together, so training has to be done very carefully to converge. As for contact information, I use WeChat. Do you know this app?

yungangwu avatar Dec 31 '22 15:12 yungangwu

You can add me on WeChat: 13162062294

zsn2021 avatar Dec 31 '22 15:12 zsn2021