
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

Results: 428 evals issues

### Describe the feature or improvement you're requesting Adding this conversational UI would enable people to 'talk' directly with the backend and would allow API requests to be carried out more effectively....

### Describe the bug There is a default 40s timeout for completion functions, set by EVALS_THREAD_TIMEOUT. In eval.py, when evaluating a sample times out, it is retried. However, it appears...

bug
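
For anyone hitting this timeout before a fix lands, a possible workaround is to raise the limit via the environment. A minimal sketch, assuming `EVALS_THREAD_TIMEOUT` is read from the environment at startup and using a hypothetical model/eval pair for illustration:

```shell
# Raise the per-sample completion timeout (default 40s per the report above)
# for a single run; the model and eval names here are placeholders.
EVALS_THREAD_TIMEOUT=120 oaieval gpt-3.5-turbo test-match
```

This is a config fragment, not a fix for the retry behavior described in the report.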

# Thank you for contributing an eval! ♥️ 🚨 Please make sure your PR follows these guidelines; failure to follow the guidelines below will result in the PR being closed...

### Describe the feature or improvement you're requesting In many production scenarios, it is important to do cost-benefit analysis, and it would be great if the `oaieval` command could also return...

### Describe the feature or improvement you're requesting I already have the output (generated from the LLM) and ideal_answers in my jsonl file. For example: ``` {'input': 'what is...

### Add support for function call. I would like to `eval` based on prompts that utilize `function_call`. From what I have seen in the code, it's not possible at the moment....

### Describe the feature or improvement you're requesting There have been numerous improvements and fixes to the evals framework itself over the past few months, but these haven't been released...

### Describe the feature or improvement you're requesting It would be helpful for the next iteration of the generative pre-trained model to learn how to identify any and all...

### Generic question about the accuracy score and boostrap_std metric When I run an eval, I get the following report. `{'accuracy': 0.6, 'boostrap_std': 0.1423900220076777}` How do I decide whether the accuracy is...
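
One common way to read a reported standard deviation like this is as the spread of the accuracy estimate, which gives a rough confidence interval under a normal approximation. A minimal sketch using the numbers quoted above (the framework's exact bootstrap resampling procedure is not shown here):

```python
# Rough 95% confidence interval for the reported accuracy, using a
# normal approximation on the bootstrap standard deviation.
accuracy = 0.6
bootstrap_std = 0.1423900220076777

low = accuracy - 1.96 * bootstrap_std
high = accuracy + 1.96 * bootstrap_std
print(f"accuracy = {accuracy:.2f}, 95% CI roughly ({low:.2f}, {high:.2f})")
```

An interval this wide usually means too few samples were evaluated to distinguish the score from chance; running the eval on more samples narrows it.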

### Describe the feature or improvement you're requesting Hi. I was wondering if there are any websites where I can share and see others' evaluation results. Should I run every...