Edan Toledo

Results 81 comments of Edan Toledo

hmm, i'm not sure i fully understand, but this jumps out to me as a potential autoreset api issue. If you turn off navix autoreset and use the jumanji autoreset...

ah i think i get it, so this is because the 'value' being bootstrapped on is actually the value of the first observation of the new episode? if that's...
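To make the question above concrete, here is a minimal sketch of how a one-step target changes when the bootstrap value comes from the reset observation rather than the true final observation of a truncated episode. All numbers and the `td_target` helper are hypothetical, made up for illustration; this is not Stoix, navix or jumanji code.

```python
# Hypothetical numbers, purely for illustration.
gamma = 0.99
reward = 1.0
v_final_obs = 5.0  # critic value of the true last observation of the truncated episode
v_reset_obs = 0.5  # critic value of the first observation of the next episode


def td_target(reward, gamma, bootstrap_value, terminated):
    """One-step target: bootstrap unless the episode genuinely terminated."""
    return reward + gamma * (0.0 if terminated else bootstrap_value)


# Truncation (e.g. a time limit) is not termination, so we still bootstrap.
# Bootstrapping from the true final observation:
correct = td_target(reward, gamma, v_final_obs, terminated=False)

# If the environment has already auto-reset, the stored "next observation"
# belongs to the new episode, and the target silently uses its value instead:
leaky = td_target(reward, gamma, v_reset_obs, terminated=False)
```

The gap between `correct` and `leaky` is the error that ends up in the advantage estimate for the last step of a truncated episode.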

i actually think i fixed this in a private version of stoix for a paper and then it never got transferred...

so something came up, i actually think the problem will involve more changes than i'd hoped but i'll try to get back to it later today. If you wanna give it...

just to put this here so i don't forget. This would be an example of constructing the baseline vs bootstrap values. baseline values go from 0...k-1 and bootstrap values go...
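A rough sketch of what keeping the two value arrays separate could look like in a GAE computation. This is an assumption about the general idea, not the code in the linked branch: the function name `gae`, its arguments, and the choice that `bootstrap_values[t]` holds the critic value of the true next observation at step `t` (before any autoreset) are all hypothetical.

```python
def gae(rewards, baseline_values, bootstrap_values, terminated, episode_end,
        gamma=0.99, lam=0.95):
    """GAE over a k-step rollout with separate baseline and bootstrap values.

    baseline_values[t]  : V(obs_t), subtracted as the baseline (t = 0..k-1).
    bootstrap_values[t] : V of the true next observation after step t
                          (assumed here to be taken before any autoreset).
    terminated[t]       : True if step t ended the episode by termination.
    episode_end[t]      : True if step t ended the episode at all
                          (termination or truncation).
    """
    k = len(rewards)
    advantages = [0.0] * k
    running = 0.0
    for t in reversed(range(k)):
        # Bootstrap on truncation, but not on genuine termination.
        not_terminated = 0.0 if terminated[t] else 1.0
        # Never carry advantages across an episode boundary.
        not_end = 0.0 if episode_end[t] else 1.0
        delta = (rewards[t]
                 + gamma * not_terminated * bootstrap_values[t]
                 - baseline_values[t])
        running = delta + gamma * lam * not_end * running
        advantages[t] = running
    return advantages


# Toy rollout: step 1 is truncated, step 2 terminates.
adv = gae(
    rewards=[1.0, 1.0, 1.0],
    baseline_values=[0.5, 0.5, 0.5],
    bootstrap_values=[0.5, 2.0, 0.5],
    terminated=[False, False, True],
    episode_end=[False, True, True],
    gamma=1.0,
    lam=1.0,
)
```

In the toy rollout the truncated step still bootstraps from its own final observation's value (2.0), while the terminated step does not bootstrap at all.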

https://github.com/EdanToledo/Stoix/tree/fix/gae_calc i added the change here but i haven't had time to check it thoroughly

Testing the main branch against that branch on ant, halfcheetah, humanoid and hopper, with 10 seeds per env, gave these results. Doing the RLiable eval so this can...

unfortunately brax is quite different from the mujoco envs that everyone used to use. Additionally, even different brax versions and different physics backends change the results significantly. All this...

I don't think cartpole or breakout have truncation, and i have reproduced those two results independently

Also see results here: https://arxiv.org/pdf/2411.00666. If you go all the way to the appendix, there are results for each task, and these results were generated using stoix's PPO. Although these...