deepeval

The LLM Evaluation Framework

Results 49 deepeval issues

PII, or Personally Identifiable Information, is a very important factor in assessing the generation quality of LLMs. PII can include a person's name, credit card number, etc....
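To make the idea of a PII check concrete, here is a minimal sketch that flags credit card numbers in generated text. It is not part of deepeval; the names `CARD_PATTERN`, `luhn_valid`, and `find_card_numbers` are hypothetical, and the approach (a regex plus the Luhn checksum) is only one assumption about how such a check might start. Detecting names would need something beyond regular expressions, such as an NER model.

```python
import re
from typing import List

# Candidate card numbers: 13 to 19 digits, optionally separated by spaces or hyphens.
# (Hypothetical pattern for illustration; real card formats vary.)
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> List[str]:
    """Return substrings of text that look like valid card numbers."""
    hits = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(match.group())
    return hits
```

A metric built on this could, for example, fail a test case whenever `find_card_numbers(output)` returns a non-empty list.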

Currently, the Dbias and Detoxify packages are badly outdated, which causes dependency issues for many users during installation. The goal is to move the implementation...

help wanted

The progress spinner (https://github.com/confident-ai/deepeval/blob/main/deepeval/progress_context.py#L23) overwrites itself when it is used in parallel, specifically when running `deepeval test run .py -n 3`.

bug
help wanted

The library can be enhanced by adding the following metric for NER: Message Understanding Conference (MUC).
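The issue does not say which MUC variant is meant. As a rough sketch, assuming the count-based MUC-style scoring in which partial matches earn half credit, precision and recall could be computed as below. The `MucCounts` class and `muc_scores` function are hypothetical names, not deepeval APIs, and producing the counts (aligning predicted entities with gold entities) is left out.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class MucCounts:
    """Entity alignment counts (assumed to be produced elsewhere)."""
    correct: int    # prediction matches the gold entity exactly
    incorrect: int  # prediction aligned to a gold entity but wrong
    partial: int    # prediction partially matches the gold entity
    missing: int    # gold entity with no prediction
    spurious: int   # prediction with no gold entity


def muc_scores(c: MucCounts) -> Dict[str, float]:
    """Compute MUC-style precision, recall, and F1 with half credit for partials."""
    possible = c.correct + c.incorrect + c.partial + c.missing  # gold entities
    actual = c.correct + c.incorrect + c.partial + c.spurious   # predicted entities
    credit = c.correct + 0.5 * c.partial
    precision = credit / actual if actual else 0.0
    recall = credit / possible if possible else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For instance, `muc_scores(MucCounts(correct=6, incorrect=1, partial=2, missing=1, spurious=2))` gives 7 credited matches out of 11 predictions and 10 gold entities.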

Harness is one of the general evaluation frameworks, covering hundreds of tasks and benchmarks across different types of metrics. Check out LM Evaluation Harness [here](https://github.com/EleutherAI/lm-evaluation-harness). A general evaluation of LLMs...

Right now, it is not clear how the SummaC model behind the faithfulness score works, as there is no clear documentation on what each argument does. Need...

Currently, metrics are computed based on test cases that run during evaluation. However, there is no way to compare the performance of historical test runs except by comparing metric scores for each...
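One way to picture a run-level comparison is the sketch below. It assumes each test run has already been exported as a mapping from metric name to a list of per-test-case scores; that format and the `compare_runs` helper are hypothetical and do not reflect how deepeval stores results.

```python
from statistics import mean
from typing import Dict, List


def compare_runs(
    baseline: Dict[str, List[float]],
    candidate: Dict[str, List[float]],
) -> Dict[str, float]:
    """Return the change in mean score (candidate minus baseline) per shared metric."""
    # Only metrics present in both runs (with at least one score) can be compared.
    shared = [
        name
        for name in sorted(baseline.keys() & candidate.keys())
        if baseline[name] and candidate[name]
    ]
    return {name: mean(candidate[name]) - mean(baseline[name]) for name in shared}
```

A positive value means the metric's average score went up in the newer run; metrics that appear in only one run are skipped rather than guessed at.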

enhancement
help wanted

**Is your feature request related to a problem? Please describe.** Right now, the deepeval package has different types of checks. Examples: - FactualConsistency check - conceptual similarity - RAG check...

**Description** This PR introduces the QuestionGenerator class that leverages the llama_index library to automatically generate questions from a given document. This enhancement aims to streamline the question generation process by...