Douglas Monsky comments

Results: 11 comments of Douglas Monsky

How to eval output with ideal_answer directly without having to define the completion_fn ?

Hey @liuyaox, I'm not entirely sure if I've grasped your question accurately, but I'll endeavor to provide the best assistance possible. I am assuming this is intended for your personal...

add eval against machiavellianistic attitudes

Full disclosure: although I am an active contributor, I am not directly affiliated with OpenAI. My advice is based on my experience and is not authoritative. Hello @Huge, In...

Poker Hands Analysis Eval (19.8% accuracy)

Whoops, not done yet. Edit: I've completed the necessary changes as discussed. Please let me know if you require any further adjustments. I have two other pending pull requests (...

Poker Hands Analysis Eval (19.8% accuracy)

> Thanks for implementing the requested changes. Kindly revert changes in the `evals/cli/oaievalset.py` file. If you want to push any changes other than the `eval` submission, do it in a...

EvalSet for 2D Maze Solving Performance Across Multiple Difficulties

@usama-openai, I have made the requested changes to this pull request as well as to PR #730. If these changes are sufficient, I will proceed with the modifications requested for...

EvalSet for 2D Maze Solving Performance Across Multiple Difficulties

@usama-openai, my apologies. I missed that this was modified similarly to the other request at an earlier stage.

EvalSet for 2D Maze Solving Performance Across Multiple Difficulties

@usama-openai thank you for the opportunity to work on these improvements. I've incorporated your requests and have seen significant progress, but I'd like to discuss a few points. 1 -...

EvalSet for 2D Maze Solving Performance Across Multiple Difficulties

> Providing the model a chance to produce a chain of thought or to reason is analogous to providing it a reasonable opportunity to solve complex questions, which is necessary...

EvalSet for 2D Maze Solving Performance Across Multiple Difficulties

> The prompt instructions in your dataset seem good enough and don't need any further improvement. But there are some issues with the ideal answers. The ideal answers for `mazes-singlemove-3x3`,...

[Resolves Issue #1228] Improve ModelGraded Evals Formatting for Increased GPT Compliance

Hey @jwang47, I appreciate your feedback and comprehend the importance of maintaining reproducibility in our existing evaluations. However, before you decide on my proposal, I want to ensure that we...