luqman-openai comments

Results: 26 comments by luqman-openai

binary count

Thanks for opening this PR. Character-level reasoning and counting is a well-known failure mode of the model due to a common underlying issue in LLMs. In its current form,...

Projection Distances (0% Accuracy)

Thanks for opening this PR. Arithmetic and other complex calculations are hard for the model to do zero-shot, without a chance to reason through the steps or to use tools...

Thanks for the contribution. The implementation of `evals.elsuite.2truths1lie:Truths2Lie1` seems to be missing from the PR. We're not accepting Evals that have custom code implementations at this moment (but we...

eval for idiom usage in a sentence

Closing the PR due to inactivity and incomplete files such as `idioms/few_shot.jsonl`. Please reopen if you get a chance to complete this PR.

Logical reasoning eval | Accuracy 0%

Closing for now. Please feel free to reopen if you get a chance to address the comments.

Temperature Conversion from Celsius to Fahrenheit (Fails on very high numbers)

Thank you for opening this PR. We're not accepting Evals that have custom code implementations at this moment (but we are accepting custom model-graded evals). If possible, could you rewrite...

Mass <-> Weight Conversions With Given Planets

Closing for now. Please feel free to reopen if you get a chance to address the comments.

Eval: Added Repeating Consonants Eval

Thanks for opening this PR. Character-level reasoning and counting is a well-known failure mode of the model due to a common underlying issue in LLMs. In its current form,...

Fibonacci word selection character count total

Closing for now. Please feel free to reopen if you get a chance to address the comments.

[Eval] Wordcount-Multilingual

Closing for now. Please feel free to reopen if you get a chance to address the comments.