
The LLM Evaluation Framework

Results: 49 deepeval issues, sorted by recently updated

**Describe the bug** It throws an error showing _Error loading test run from disk: [Errno 2] No such file or directory: 'temp_test_run_data.json'_ ![image](https://github.com/confident-ai/deepeval/assets/142291246/6b83e4e9-dfae-45e3-b1c7-9d1ccd200bd2) **To Reproduce** ![image](https://github.com/confident-ai/deepeval/assets/142291246/86f6cf14-6810-4e21-9298-e471e0943424) Steps to reproduce the behavior:...

Hi, I'm working with the LLMTestCase example: ```python test_case = LLMTestCase( input="What if these shoes don't fit?", # Substitute this with the actual output from your LLM tool actual_output="We offer...
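For reference, a minimal, self-contained version of that kind of example might look like the sketch below; the metric choice and threshold are assumptions for illustration, not part of the original issue.

```python
from deepeval import evaluate
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

# Build a single test case: `input` is the user query, `actual_output` is
# what your LLM application returned for that query.
test_case = LLMTestCase(
    input="What if these shoes don't fit?",
    # Substitute this with the actual output from your LLM tool
    actual_output="We offer a 30-day full refund at no extra cost.",
)

# Score the test case with a metric (threshold chosen arbitrarily here).
metric = AnswerRelevancyMetric(threshold=0.7)
evaluate([test_case], [metric])
```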

**❗BEFORE YOU BEGIN❗** Are you on discord? 🤗 We'd love for you to ask questions on Discord instead: https://discord.com/invite/a3K9c8GRGt **Is your feature request related to a problem? Please describe.** I...

**Is your feature request related to a problem? Please describe.** Hi, I was wondering what the considerations are behind choosing a "stateful" metrics UX. Using the hallucination metric...
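As context for the "stateful" UX being discussed, a deepeval metric instance holds its result after measuring. A rough sketch (the example input and context strings are made up):

```python
from deepeval.test_case import LLMTestCase
from deepeval.metrics import HallucinationMetric

test_case = LLMTestCase(
    input="Where is the Eiffel Tower?",
    actual_output="The Eiffel Tower is in Berlin.",
    # HallucinationMetric compares actual_output against the provided context.
    context=["The Eiffel Tower is located in Paris, France."],
)

metric = HallucinationMetric(threshold=0.5)
metric.measure(test_case)

# "Stateful": the score and reason live on the metric instance itself,
# so reusing the same instance across test cases overwrites them.
print(metric.score, metric.reason)
```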

Currently, the metrics for each test case are run concurrently, but the test cases within a test run are not.
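A hedged workaround sketch (not deepeval internals): fanning test cases out across threads with the standard library, constructing a fresh metric per case since metric instances keep their score as state.

```python
from concurrent.futures import ThreadPoolExecutor

from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase


def run_case(test_case: LLMTestCase) -> float:
    # Fresh metric per case, because deepeval metrics store their result as state.
    metric = AnswerRelevancyMetric(threshold=0.7)
    metric.measure(test_case)
    return metric.score


def run_cases_concurrently(test_cases: list[LLMTestCase], max_workers: int = 4) -> list[float]:
    # Run test cases in parallel threads instead of one after another.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_case, test_cases))
```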

When running a bulk dataset from the command line with `deepeval test run src.py -n 3`, the JSON file that gets generated contains partial entries, and some test cases are lost even when...
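For context, a typical pytest-style test file executed via `deepeval test run` might look like the sketch below; the file name, test data, and metric are assumptions for illustration.

```python
# src.py -- executed with: deepeval test run src.py -n 3
import pytest

from deepeval import assert_test
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

test_cases = [
    LLMTestCase(input="What if these shoes don't fit?", actual_output="We offer a 30-day refund."),
    LLMTestCase(input="How long does shipping take?", actual_output="Shipping takes 3-5 business days."),
]


@pytest.mark.parametrize("test_case", test_cases)
def test_bulk(test_case: LLMTestCase):
    # Each parametrized case is asserted independently; -n 3 shards the
    # cases across three parallel pytest workers.
    assert_test(test_case, [AnswerRelevancyMetric(threshold=0.7)])
```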

**Is your feature request related to a problem? Please describe.** There's some multithreading logic inside some of deepeval's metrics. Some accept a multithreading flag to make that optional. **Describe the...

Objective: I have many test cases (query, response, context trios) in a pandas dataframe. I create an LLM test case per trio; however, the outcome of bulk testing locally...
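A hedged sketch of that setup, assuming a dataframe with `query`, `response`, and `context` columns (the column names and sample rows are made up):

```python
import pandas as pd

from deepeval import evaluate
from deepeval.test_case import LLMTestCase
from deepeval.metrics import HallucinationMetric

df = pd.DataFrame({
    "query": ["What is the return policy?"],
    "response": ["You can return items within 30 days."],
    "context": [["All purchases can be returned within 30 days of delivery."]],
})

# One LLMTestCase per query/response/context trio.
test_cases = [
    LLMTestCase(
        input=row["query"],
        actual_output=row["response"],
        context=row["context"],
    )
    for _, row in df.iterrows()
]

evaluate(test_cases, [HallucinationMetric(threshold=0.5)])
```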

## Integration of LM-Eval Harness Solves issue: #332 This PR integrates all the features of [lm-eval-harness](https://github.com/EleutherAI/lm-evaluation-harness), which helps conduct evaluations on 200+ tasks (for example: MMLU, BigBench, Babi, Winogrande,...

When I create an evaluation dataset using EvaluationDataset and I use it with LLMTestCase, I get this: ``` File "/Users/abhijeet.pise/miniconda3/envs/evaluation/lib/python3.12/site-packages/deepeval/evaluate.py", line 174, in evaluate test_results = execute_test(test_cases, metrics, True) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^...
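For reference, a minimal EvaluationDataset round trip might look roughly like the sketch below; the test case contents and metric are illustrative assumptions, not taken from the issue's code.

```python
from deepeval import evaluate
from deepeval.dataset import EvaluationDataset
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

# Collect test cases into a dataset so they can be evaluated together.
dataset = EvaluationDataset(test_cases=[
    LLMTestCase(
        input="What if these shoes don't fit?",
        actual_output="We offer a 30-day full refund at no extra cost.",
    ),
])

# Evaluate every test case in the dataset against the chosen metrics.
evaluate(dataset.test_cases, [AnswerRelevancyMetric(threshold=0.7)])
```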