Usama comments


add draw_svg evaluation

Closing the PR due to inactivity; please reopen if you get a chance to address comments.

Add points-on-line eval

Closing the PR due to inactivity; please reopen if you get a chance to address comments.

[evals] added historical facts examples and list of incorrect ones

Closing the PR due to inactivity; please feel free to reopen if you get a chance to address the comments.

add thesis retrieval eva

Closing the PR due to inactivity; please reopen if you get a chance to address comments.

invert-string eval

Sorry for the confusion. I mean the description in the `.yaml` file because once this PR is merged, the only way to get any information about this eval will be...

invert-string eval

You need to merge the `master` branch into your branch to resolve workflow-related issues. Kindly update your branch with the latest master branch.

Added counting eval

Thanks for opening this PR. Character-level reasoning and operations are a well-known failure mode of the model due to a common underlying issue in LLMs. In its current form, this...

Word Search (0.0075 acc, 0.30 f1)

Thank you for opening this PR. We're not accepting evals that have custom code implementations at this moment (but we are accepting custom model-graded evals). If possible, could you rewrite...

Word Search (0.0075 acc, 0.30 f1)

Closing the PR due to inactivity; please reopen if you get a chance to address comments.

Add Recursive Functions eval

Thanks for opening this PR. It is hard for the model to produce the output of such a complex piece of code zero-shot, without a chance to reason...