Harry Shin

Results 4 issues of Harry Shin

Since maze2d-umaze-v1 uses a sparse reward type where a distance less than 0.5 yields a reward of 1, I was wondering how the above is possible. The first timeout field...

Hi, I'm trying to generate the ant maze dataset using the generation script but getting a "No module named 'locomotion.ant'" error when loading the policy `load_policy('ant_hierarch_pol.pkl')`. I installed the locomotion...

Hello, I want to reproduce the lm evaluation harness results reported in the blog. Since the prompts need to be formatted with the user, assistant, system, end tokens, the evaluation...

The blog post says the alpha and beta version of StarChat have not been aligned to human preferences with techniques like RLHF, so they can produce problematic outputs (especially when...