deepeval
The LLM Evaluation Framework
PII, or Personally Identifiable Information, is an important factor in assessing the generation quality of LLMs. PII can include a person's name, credit card number, etc....
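One piece of a PII check could be detecting credit card numbers in LLM output. The sketch below is a hedged illustration (not deepeval's actual implementation): it combines a loose regex with a Luhn checksum to cut down on false positives; the function names are hypothetical.

```python
import re

def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    checksum = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0

# Loose pattern: 13-16 digits, optionally separated by spaces or dashes.
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")

def find_card_numbers(text: str) -> list[str]:
    """Return candidate card numbers that also pass the Luhn check."""
    hits = []
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if 13 <= len(digits) <= 16 and luhn_valid(digits):
            hits.append(m.group().strip())
    return hits
```

A real PII metric would combine several such detectors (names via NER, emails, phone numbers) into one score, but the regex-plus-checksum pattern above is a common baseline for the card-number case.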
Currently, the Dbias and Detoxify packages are quite dated, which causes dependency issues for many users during installation. The goal is to move the implementation...
The progress spinner (https://github.com/confident-ai/deepeval/blob/main/deepeval/progress_context.py#L23) overwrites itself when used in parallel, specifically with `deepeval test run .py -n 3`.
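One way interleaved output like this is usually avoided is to serialize console writes behind a lock so parallel workers never clobber each other mid-line. This is a hedged sketch of that general pattern, not the fix the issue will necessarily take; the function and lock names are hypothetical.

```python
import sys
import threading

# Single shared lock guarding all console writes.
_console_lock = threading.Lock()

def report_progress(worker_id: int, message: str) -> None:
    """Write one worker's status line atomically.

    Holding the lock for the full write ensures parallel workers
    emit whole lines instead of overwriting each other's output.
    """
    with _console_lock:
        sys.stdout.write(f"[worker {worker_id}] {message}\n")
        sys.stdout.flush()
```

For a live spinner (as opposed to plain status lines), the same idea applies: either route every terminal update through one lock-guarded writer, or give each worker its own dedicated output line.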
The library can be enhanced by adding the following metric for NER: Message Understanding Conference (MUC) scoring.
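For context, MUC-style scoring credits partial matches: predicted entity spans are bucketed as correct, partial, incorrect, missing, or spurious, and partial matches count as half credit in precision and recall. Below is a minimal sketch under assumed conventions (spans as `(start, end, label)` tuples, first overlap wins); it is an illustration of the scheme, not deepeval's implementation.

```python
def muc_f1(gold: list[tuple], pred: list[tuple]) -> tuple[float, float, float]:
    """MUC-style precision/recall/F1 for NER spans.

    gold, pred: lists of (start, end, label) tuples (assumed format).
    Exact span + label  -> correct; overlapping span, same label -> partial;
    overlapping span, wrong label -> incorrect; unmatched pred -> spurious;
    unmatched gold -> missing. Partial matches earn half credit.
    """
    def overlaps(a, b):
        return a[0] < b[1] and b[0] < a[1]

    cor = par = inc = spu = 0
    used = set()  # indices of gold spans already matched
    for p in pred:
        match = None
        for i, g in enumerate(gold):
            if i in used:
                continue
            if (p[0], p[1], p[2]) == (g[0], g[1], g[2]):
                match = (i, "cor")
                break  # exact match always wins
            if overlaps(p, g) and match is None:
                match = (i, "par" if p[2] == g[2] else "inc")
        if match is None:
            spu += 1
        else:
            i, kind = match
            used.add(i)
            cor += kind == "cor"
            par += kind == "par"
            inc += kind == "inc"
    mis = len(gold) - len(used)
    actual = cor + par + inc + spu      # prediction-side total
    possible = cor + par + inc + mis    # gold-side total
    precision = (cor + 0.5 * par) / actual if actual else 0.0
    recall = (cor + 0.5 * par) / possible if possible else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

For example, with two gold entities, one exact prediction, one boundary-shifted prediction of the right type, and one spurious prediction, this yields precision 0.5 and recall 0.75.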
The LM Evaluation Harness is one of the general evaluation frameworks, covering hundreds of tasks and benchmarks across different types of metrics; see it [here](https://github.com/EleutherAI/lm-evaluation-harness). A general evaluation of LLMs...
Right now, it is not clear how the SummaC models work under the faithfulness score, as there is no clear documentation on what each argument does. Need...
Currently, metrics are computed from test cases run during evaluation. However, there is no way to compare historical test runs' performance except by comparing metric scores for each...
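A minimal version of the comparison the issue asks for could aggregate per-test-case scores into per-metric averages and report deltas between two runs. The sketch below assumes a simple `{metric_name: [scores]}` storage format; both the format and function name are hypothetical, not deepeval's actual data model.

```python
from statistics import mean

def compare_runs(previous: dict, current: dict) -> dict:
    """Compare average metric scores between two test runs.

    previous, current: {metric_name: [per-test-case scores]} (assumed format).
    Returns {metric_name: {"previous": avg, "current": avg, "delta": diff}},
    with None where a metric appears in only one run.
    """
    report = {}
    for metric in sorted(set(previous) | set(current)):
        prev_avg = mean(previous[metric]) if previous.get(metric) else None
        curr_avg = mean(current[metric]) if current.get(metric) else None
        delta = None
        if prev_avg is not None and curr_avg is not None:
            delta = curr_avg - prev_avg
        report[metric] = {"previous": prev_avg, "current": curr_avg, "delta": delta}
    return report
```

A real implementation would also need to align runs on test-case identity (so a regression can be traced to a specific case), but averages plus deltas are the natural first summary.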
**Is your feature request related to a problem? Please describe.** Right now, the deepeval package has different types of checks, for example: - FactualConsistency check - conceptual similarity - RAG check...
**Description** This PR introduces a QuestionGenerator class that leverages the llama_index library to automatically generate questions from a given document. This enhancement aims to streamline the question-generation process by...
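To illustrate the shape of such a class without guessing at llama_index's API, here is a hedged, library-independent sketch: it chunks a document and builds question-generation prompts, leaving the actual LLM call (which the real PR delegates to llama_index) out of scope. All names and parameters here are hypothetical.

```python
class QuestionGenerator:
    """Hypothetical sketch of a question-generation helper.

    Splits a document into word-based chunks and produces one
    question-generation prompt per chunk; an LLM backend (llama_index
    in the actual PR) would then answer each prompt.
    """

    def __init__(self, num_questions_per_chunk: int = 2, chunk_size: int = 512):
        self.num_questions_per_chunk = num_questions_per_chunk
        self.chunk_size = chunk_size  # chunk length in words (simplification)

    def _chunk(self, text: str) -> list[str]:
        words = text.split()
        step = self.chunk_size
        return [" ".join(words[i:i + step]) for i in range(0, len(words), step)]

    def build_prompts(self, document: str) -> list[str]:
        return [
            f"Generate {self.num_questions_per_chunk} questions answerable "
            f"from the following context:\n{chunk}"
            for chunk in self._chunk(document)
        ]
```

The point of the sketch is the separation of concerns: chunking and prompt construction are plain, testable logic, while the LLM dependency stays behind a single call site.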