
The LLM Evaluation Framework

Results: 49 deepeval issues, sorted by recently updated

**Describe the bug** It throws an error showing _Error loading test run from disk: [Errno 2] No such file or directory: 'temp_test_run_data.json'_ ![image](https://github.com/confident-ai/deepeval/assets/142291246/6b83e4e9-dfae-45e3-b1c7-9d1ccd200bd2) **To Reproduce** ![image](https://github.com/confident-ai/deepeval/assets/142291246/86f6cf14-6810-4e21-9298-e471e0943424) Steps to reproduce the behavior:...

Hi, I'm working with the LLMTestCase example: ```python test_case = LLMTestCase( input="What if these shoes don't fit?", # Substitute this with the actual output from your LLM tool actual_output="We offer...
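For reference, a minimal, self-contained version of that kind of example might look like the sketch below; the metric choice and threshold are assumptions for illustration, not part of the original issue.

```python
from deepeval import evaluate
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

# Build a single test case: `input` is the user query, `actual_output` is
# what your LLM application returned for that query.
test_case = LLMTestCase(
    input="What if these shoes don't fit?",
    # Substitute this with the actual output from your LLM tool
    actual_output="We offer a 30-day full refund at no extra cost.",
)

# Score the test case with a metric (threshold chosen arbitrarily here).
metric = AnswerRelevancyMetric(threshold=0.7)
evaluate([test_case], [metric])
```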

**❗BEFORE YOU BEGIN❗** Are you on discord? 🤗 We'd love for you to ask questions on Discord instead: https://discord.com/invite/a3K9c8GRGt **Is your feature request related to a problem? Please describe.** I...

**Is your feature request related to a problem? Please describe.** Hi, I was wondering what the considerations are behind choosing a "stateful" metrics UX. Using the hallucination metric...
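As context for the "stateful" UX being discussed, a deepeval metric instance holds its result after measuring. A rough sketch (the example input and context strings are made up):

```python
from deepeval.test_case import LLMTestCase
from deepeval.metrics import HallucinationMetric

test_case = LLMTestCase(
    input="Where is the Eiffel Tower?",
    actual_output="The Eiffel Tower is in Berlin.",
    # HallucinationMetric compares actual_output against the provided context.
    context=["The Eiffel Tower is located in Paris, France."],
)

metric = HallucinationMetric(threshold=0.5)
metric.measure(test_case)

# "Stateful": the score and reason live on the metric instance itself,
# so reusing the same instance across test cases overwrites them.
print(metric.score, metric.reason)
```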

Currently, the metrics for each test case are run concurrently, but the test cases within a test run are not.
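A hedged workaround sketch (not deepeval internals): fanning test cases out across threads with the standard library, constructing a fresh metric per case since metric instances keep their score as state.

```python
from concurrent.futures import ThreadPoolExecutor

from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase


def run_case(test_case: LLMTestCase) -> float:
    # Fresh metric per case, because deepeval metrics store their result as state.
    metric = AnswerRelevancyMetric(threshold=0.7)
    metric.measure(test_case)
    return metric.score


def run_cases_concurrently(test_cases: list[LLMTestCase], max_workers: int = 4) -> list[float]:
    # Run test cases in parallel threads instead of one after another.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_case, test_cases))
```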

When running a bulk dataset from the command line with `deepeval test run src.py -n 3`, the JSON file that gets generated contains partial entries, and some test cases are lost even when...
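For context, a typical pytest-style test file executed via `deepeval test run` might look like the sketch below; the file name, test data, and metric are assumptions for illustration.

```python
# src.py -- executed with: deepeval test run src.py -n 3
import pytest

from deepeval import assert_test
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

test_cases = [
    LLMTestCase(input="What if these shoes don't fit?", actual_output="We offer a 30-day refund."),
    LLMTestCase(input="How long does shipping take?", actual_output="Shipping takes 3-5 business days."),
]


@pytest.mark.parametrize("test_case", test_cases)
def test_bulk(test_case: LLMTestCase):
    # Each parametrized case is asserted independently; -n 3 shards the
    # cases across three parallel pytest workers.
    assert_test(test_case, [AnswerRelevancyMetric(threshold=0.7)])
```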

**Is your feature request related to a problem? Please describe.** There's some multithreading logic inside some of deepeval's metrics. Some accept a multithreading flag to make that optional. **Describe the...

Objective: I have many test cases (query, response, context trios) in a pandas dataframe. I create an LLM test case per trio; however, the outcome of bulk testing locally...
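A hedged sketch of that setup, assuming a dataframe with `query`, `response`, and `context` columns (the column names and sample rows are made up):

```python
import pandas as pd

from deepeval import evaluate
from deepeval.test_case import LLMTestCase
from deepeval.metrics import HallucinationMetric

df = pd.DataFrame({
    "query": ["What is the return policy?"],
    "response": ["You can return items within 30 days."],
    "context": [["All purchases can be returned within 30 days of delivery."]],
})

# One LLMTestCase per query/response/context trio.
test_cases = [
    LLMTestCase(
        input=row["query"],
        actual_output=row["response"],
        context=row["context"],
    )
    for _, row in df.iterrows()
]

evaluate(test_cases, [HallucinationMetric(threshold=0.5)])
```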

## Integration of LM-Eval Harness Solves issue: #332 This PR integrates all the features of [lm-eval-harness](https://github.com/EleutherAI/lm-evaluation-harness), which helps conduct evaluations on 200+ tasks (for example: MMLU, BigBench, Babi, Winogrande,...

When I create an evaluation dataset using EvaluationDataset and I use it with LLMTestCase, I get this: ``` File "/Users/abhijeet.pise/miniconda3/envs/evaluation/lib/python3.12/site-packages/deepeval/evaluate.py", line 174, in evaluate test_results = execute_test(test_cases, metrics, True) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^...
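For reference, a minimal EvaluationDataset round trip might look roughly like the sketch below; the test case contents and metric are illustrative assumptions, not taken from the issue's code.

```python
from deepeval import evaluate
from deepeval.dataset import EvaluationDataset
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

# Collect test cases into a dataset so they can be evaluated together.
dataset = EvaluationDataset(test_cases=[
    LLMTestCase(
        input="What if these shoes don't fit?",
        actual_output="We offer a 30-day full refund at no extra cost.",
    ),
])

# Evaluate every test case in the dataset against the chosen metrics.
evaluate(dataset.test_cases, [AnswerRelevancyMetric(threshold=0.7)])
```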