muzero-general
The model does not converge for breakout
Search before asking
- [X] I have searched the MuZero issues and found no similar feature requests.
Description
I trained MuZero on Breakout with the hyperparameters given in the code, but after 450,000 steps the reward was still 0 and training showed no sign of convergence. So I would like to ask: have the hyperparameters in the code been validated? Thank you!
Additional context
No response
Same issue here, but for all envs.
In reply to yungangwu's original post of Thursday, 20/10/2022, 03:58 (https://github.com/werner-duvaud/muzero-general/issues/211).
Have you tried any other hyperparameter settings? For example, with batch_size set to 1024, does the model converge under any combination of the other hyperparameters? @JohnPPP
Tried a bunch of hyperparameters on a bunch of games. Just wasted my time. Perhaps others can show me how to make this work...
gg. I ran into the same problem and did a lot of experiments, but nothing worked. I don't know whether there is a mistake in the code. @JohnPPP
Yeah, there probably is.
Did the reward stay zero the entire time, or did it occasionally get some reward? I have it working on cartpole, but not on Atari. That said, it still gets a reward of 2 or 3 occasionally in breakout, indicating that it is behaving randomly.
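One way to check whether an occasional reward of 2 or 3 is just random behavior is to compare the agent's per-episode returns with those of a uniformly random policy on the same game. A minimal sketch, assuming you have already collected both lists of returns yourself; the function name, the threshold of 2 standard errors, and the numbers in the usage lines are hypothetical placeholders, not measurements from this repository:

```python
import statistics


def not_better_than_random(agent_returns, random_returns, z_threshold=2.0):
    """Return True if the agent's mean return is NOT clearly above the random baseline.

    Uses a simple one-sided z-score on the difference of means
    (hypothetical helper, not part of muzero-general).
    """
    diff = statistics.mean(agent_returns) - statistics.mean(random_returns)
    # Standard error of the difference between the two sample means.
    se = (
        statistics.variance(agent_returns) / len(agent_returns)
        + statistics.variance(random_returns) / len(random_returns)
    ) ** 0.5
    if se == 0:
        # Both samples are constant: fall back to a direct comparison.
        return diff <= 0
    return diff / se < z_threshold


# Placeholder returns for illustration only.
random_policy = [1, 0, 2, 0, 3, 0, 1, 1]
stuck_agent = [0, 2, 3, 0, 1, 0, 2, 0]
learning_agent = [20, 25, 30, 22, 28, 24, 26, 21]

print(not_better_than_random(stuck_agent, random_policy))
print(not_better_than_random(learning_agent, random_policy))
```

If the check still returns True after hundreds of thousands of training steps, the agent has probably not learned anything beyond the random baseline.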
I also encountered the same problem. I tuned the hyperparameters for a long time, but I couldn't get good results in my environment.
Yes, I have this problem too. I also experimented with another implementation, muzero-pytorch, on Gomoku, but even after tuning for a long time I didn't get good results.
Is it possible that several networks have to be learned at the same time, and that this is what makes training fail? If you like, you can share your contact information and we can talk privately.
Yes, that's my guess too: it probably has three networks chained in series that need to be optimized together, so it takes very careful training to converge. As for contact information, I use the WeChat app. Do you know this app?
You can add me on WeChat; my contact number is 13162062294.
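To make the guess above about three networks in series more concrete: MuZero trains a representation function, a dynamics function, and a prediction function jointly by unrolling the model for several steps and summing the losses. The toy sketch below uses scalar stand-ins and squared error purely for simplicity; all names and weights are hypothetical, the policy head is left out, and the actual losses in the repository may differ:

```python
def representation(obs, w):
    # h: observation -> hidden state
    return w["h"] * obs


def dynamics(state, action, w):
    # g: (hidden state, action) -> (next hidden state, predicted reward)
    next_state = w["g_s"] * state + action
    reward = w["g_r"] * state
    return next_state, reward


def prediction(state, w):
    # f: hidden state -> predicted value (policy head omitted in this toy)
    return w["f"] * state


def unrolled_loss(obs, actions, target_rewards, target_values, w):
    """Sum squared errors over the unrolled steps.

    Every term depends on h, and every term after the first also depends on g,
    so an error in one function propagates into all later predictions.
    """
    state = representation(obs, w)
    loss = (prediction(state, w) - target_values[0]) ** 2
    for k, action in enumerate(actions):
        state, reward = dynamics(state, action, w)
        loss += (reward - target_rewards[k]) ** 2
        loss += (prediction(state, w) - target_values[k + 1]) ** 2
    return loss


weights = {"h": 1.0, "g_s": 1.0, "g_r": 1.0, "f": 1.0}
print(unrolled_loss(1.0, [0.0, 0.0], [1.0, 1.0], [1.0, 1.0, 1.0], weights))

# Perturbing only the representation weight changes every term of the loss.
weights["h"] = 2.0
print(unrolled_loss(1.0, [0.0, 0.0], [1.0, 1.0], [1.0, 1.0, 1.0], weights))
```

Because all three functions share one loss, a badly scaled learning rate or loss weight for any one of them can stall the others, which would be consistent with the sensitivity to hyperparameters reported in this thread.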