Usama comments

Results: 82 comments of Usama

Common sense logical riddles

For this eval, GPT-3.5 doesn't perform well as a grader, so GPT-4 should be used as the grader model.

Common sense logical riddles

You should see GPT-4 API access enabled in your account in the next few days.

Poker Hands Analysis Eval (19.8% accuracy)

> Thank you for your feedback @usama-openai. I have reverted the changes in the evals/cli/oaievalset.py file as requested. I would appreciate it if you could review this and let me know...

Poker Hands Analysis Eval (19.8% accuracy)

You should see GPT-4 API access enabled in your account in the next few days.

[Eval] Evaluation of abstract causal reasoning capabilities of language model

You should see GPT-4 API access enabled in your account in the next few days.

Eval: ASL Classifiers

You should see GPT-4 API access enabled in your account in the next few days.

Add 2 backgammon evals

> Those are all unique plays - without any permutations, which will make this a really long "`Includes`" list. You don't need to add all the moves to the list....

Thanks for implementing the requested changes. This PR is almost good. If you have no issues, can you place the generation script in the `evals/registry/data/backgammon/` directory? That'll make it easy...

Eval: add Dutch lexicon - loanwords and rare words

Thanks for implementing the requested changes. I'm getting the following error while evaluating this PR.
```
b'File "/content/evals/evals/eval.py", line 149, in get_samples'
b'return get_jsonl(self.samples_jsonl)'
b'File "/content/evals/evals/data.py", line 114, in get_jsonl'...
```

Eval: add Dutch lexicon - loanwords and rare words

I'm getting the following error now while evaluating this PR:
```
b'File "/usr/lib/python3.10/json/__init__.py", line 346, in loads'
b'return _default_decoder.decode(s)'
b'File "/usr/lib/python3.10/json/decoder.py", line 337, in decode'
b'obj, end = self.raw_decode(s, idx=_w(s,...
```
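
The traceback above is truncated, so the actual cause isn't visible, but it ends inside `json.loads`, which suggests at least one line of the samples file may not parse as JSON. A minimal sketch of a line-by-line check that could help locate such a line follows; the helper name `find_bad_jsonl_lines` and the sample records are hypothetical and are not part of the evals codebase.

```python
import json
from typing import Iterable, List, Tuple


def find_bad_jsonl_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, error_message) for every line that is not valid JSON.

    Hypothetical helper for illustration; blank lines are reported as well,
    because json.loads rejects an empty string.
    """
    problems = []
    for line_number, line in enumerate(lines, start=1):
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            # exc.msg is the decoder's short description of the problem.
            problems.append((line_number, exc.msg))
    return problems


# Illustrative records only: the second one is missing a value after "ideal".
sample_lines = [
    '{"input": "a", "ideal": "b"}\n',
    '{"input": "c", "ideal": }\n',
]
for line_number, message in find_bad_jsonl_lines(sample_lines):
    print(f"line {line_number}: {message}")
```

To check a file on disk, an open file object can be passed directly, for example `with open(path, encoding="utf-8") as f: find_bad_jsonl_lines(f)`, where `path` is whatever samples file the eval points to.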