Harry Shin issues

Results 4 issues of


                                            Harry Shin

Maze2d rewards seem odd?

Since maze2d-umaze-v1 uses a sparse reward type where a distance less than 0.5 yields a reward of 1, I was wondering how the above is possible. The first timeout field...

Issues with generating ant maze dataset

Hi, I'm trying to generate the ant maze dataset using the generation script but getting a "No module named 'locomotion.ant'" error when loading the policy `load_policy('ant_hierarch_pol.pkl')`. I installed the locomotion...

Is there a script for evaluating against eleutherAI’s language model evaluation harness?

Hello, I want to reproduce the lm evaluation harness results reported in the blog. Since the prompts need to be formatted with the user, assistant, system, end tokens, the evaluation...

Failure Modes?

The blog post says the alpha and beta version of StarChat have not been aligned to human preferences with techniques like RLHF, so they can produce problematic outputs (especially when...