codingliuyg
Sorry for replying so late; I've been working on the code these days. Thank you so much for your advice. After following it, the code works fine. There are two more quick...
Hello, for point 1, what I mean is that my importance ratio has shape (128, 17) when the env is 'Humanoid-v2' and the mini-batch size is 128. Then should I convert the...
Hello novatig, I still have a few questions I'm not clear about. 1. As shown in figure 1, the parameters (beta, c_max, far policy rate, lr of policy net) appear to vary normally. But as shown in...
Thank you very much for your reply! 1. My immediate question is: **In order to ensure a 10% far policy ratio, my beta value is maintained at 0.1 to 0.2**. My understanding...
Hello, 1. Can you show me the DDPG policy structure of your network? Is it the same as the one below? 2. Is my calculation of the ratio ρ correct? And compare it directly with c_max....
> I think even if the entire seq is cached, it should still be scheduled, and the input at this time only needs to be the last token. @GentleCold You're...