LRudL comments


Eval-running often hangs on last sample

I also have this issue. It is not caused by rate limits: it happens even when running datasets that are definitely below the tokens-per-minute and requests-per-minute rate...

Using different models in evaluating model-graded eval and in generating the completion

I recently struggled to get this to work too, so I can share what I found. This is currently implemented in the GitHub version of this repo (but not the...

Using different models in evaluating model-graded eval and in generating the completion

Regarding #1418: A new PR is not necessary for setting the evaluating model (though the feature really should be documented), since the [full relevant lines](https://github.com/openai/evals/blob/7400b0ee3934d64ff6efd9d4ec04be631625c014/evals/elsuite/modelgraded/classify.py#L29C1-L29C1) are: ``` # treat last...
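The convention the linked lines appear to implement (the last completion function is treated as the grader) can be sketched as below. This is a minimal illustration assuming that reading of the truncated quote; `pick_grader` is a hypothetical helper, not part of the `openai/evals` API, and plain strings stand in for completion function objects.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def pick_grader(completion_fns: Sequence[T]) -> tuple[list[T], T]:
    """Hypothetical helper: split completion functions into (generators, grader).

    Assumes the convention that the LAST entry does the grading. With a single
    entry, that same function both generates the completion and grades it.
    """
    if not completion_fns:
        raise ValueError("at least one completion function is required")
    grader = completion_fns[-1]
    # Only drop the grader from the generator list when there is more than one entry.
    generators = list(completion_fns[:-1]) if len(completion_fns) > 1 else list(completion_fns)
    return generators, grader
```

For example, `pick_grader(["modelA", "modelB"])` returns `(["modelA"], "modelB")`, while `pick_grader(["modelA"])` returns `(["modelA"], "modelA")`, i.e. one model does both jobs.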

Using different models in evaluating model-graded eval and in generating the completion

If you want to run the eval with modelA and do the grading with modelB, you can pass the string "modelA,modelB" as the name of the completer.
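A minimal sketch of how a comma-separated completer string maps onto the two roles, assuming the "last name grades" convention: everything before the last name generates completions, and the last name grades them. `parse_completer_arg` is a hypothetical helper for illustration, not a function in `openai/evals`, and stripping whitespace around names is an assumption of this sketch.

```python
def parse_completer_arg(arg: str) -> tuple[list[str], str]:
    """Hypothetical: split "modelA,modelB" into (generating models, grading model)."""
    # Split on commas, trimming whitespace and skipping empty entries (assumption).
    names = [name.strip() for name in arg.split(",") if name.strip()]
    if not names:
        raise ValueError("completer string is empty")
    grader = names[-1]
    # With a single name, that model both generates and grades.
    generators = names[:-1] or names
    return generators, grader


generators, grader = parse_completer_arg("modelA,modelB")
print(f"generate with {generators}, grade with {grader}")
# prints: generate with ['modelA'], grade with modelB
```

With the standard `oaieval` command-line tool, this would presumably correspond to an invocation such as `oaieval modelA,modelB <eval_name>`, where `<eval_name>` stands for whichever model-graded eval you are running.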