deepeval
The LLM Evaluation Framework
PII, or Personally Identifiable Information, is an important factor in assessing the generation quality of LLMs. PII can include a person's name, credit card number, etc....
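One piece of a PII check could be detecting credit card numbers in LLM output. The sketch below is a hedged illustration (not deepeval's actual implementation): it combines a loose regex with a Luhn checksum to cut down on false positives; the function names are hypothetical.

```python
import re

def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    checksum = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0

# Loose pattern: 13-16 digits, optionally separated by spaces or dashes.
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")

def find_card_numbers(text: str) -> list[str]:
    """Return candidate card numbers that also pass the Luhn check."""
    hits = []
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if 13 <= len(digits) <= 16 and luhn_valid(digits):
            hits.append(m.group().strip())
    return hits
```

A real PII metric would combine several such detectors (names via NER, emails, phone numbers) into one score, but the regex-plus-checksum pattern above is a common baseline for the card-number case.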
Currently, the Dbias and Detoxify packages are quite dated, which causes dependency issues for many users during installation. The goal is to move the implementation...
The progress spinner (https://github.com/confident-ai/deepeval/blob/main/deepeval/progress_context.py#L23) overwrites itself when used in parallel, specifically with `deepeval test run .py -n 3`.
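One way interleaved output like this is usually avoided is to serialize console writes behind a lock so parallel workers never clobber each other mid-line. This is a hedged sketch of that general pattern, not the fix the issue will necessarily take; the function and lock names are hypothetical.

```python
import sys
import threading

# Single shared lock guarding all console writes.
_console_lock = threading.Lock()

def report_progress(worker_id: int, message: str) -> None:
    """Write one worker's status line atomically.

    Holding the lock for the full write ensures parallel workers
    emit whole lines instead of overwriting each other's output.
    """
    with _console_lock:
        sys.stdout.write(f"[worker {worker_id}] {message}\n")
        sys.stdout.flush()
```

For a live spinner (as opposed to plain status lines), the same idea applies: either route every terminal update through one lock-guarded writer, or give each worker its own dedicated output line.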
The library can be enhanced by adding the following metric for NER: Message Understanding Conference (MUC) scoring.
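For context, MUC-style scoring credits partial matches: predicted entity spans are bucketed as correct, partial, incorrect, missing, or spurious, and partial matches count as half credit in precision and recall. Below is a minimal sketch under assumed conventions (spans as `(start, end, label)` tuples, first overlap wins); it is an illustration of the scheme, not deepeval's implementation.

```python
def muc_f1(gold: list[tuple], pred: list[tuple]) -> tuple[float, float, float]:
    """MUC-style precision/recall/F1 for NER spans.

    gold, pred: lists of (start, end, label) tuples (assumed format).
    Exact span + label  -> correct; overlapping span, same label -> partial;
    overlapping span, wrong label -> incorrect; unmatched pred -> spurious;
    unmatched gold -> missing. Partial matches earn half credit.
    """
    def overlaps(a, b):
        return a[0] < b[1] and b[0] < a[1]

    cor = par = inc = spu = 0
    used = set()  # indices of gold spans already matched
    for p in pred:
        match = None
        for i, g in enumerate(gold):
            if i in used:
                continue
            if (p[0], p[1], p[2]) == (g[0], g[1], g[2]):
                match = (i, "cor")
                break  # exact match always wins
            if overlaps(p, g) and match is None:
                match = (i, "par" if p[2] == g[2] else "inc")
        if match is None:
            spu += 1
        else:
            i, kind = match
            used.add(i)
            cor += kind == "cor"
            par += kind == "par"
            inc += kind == "inc"
    mis = len(gold) - len(used)
    actual = cor + par + inc + spu      # prediction-side total
    possible = cor + par + inc + mis    # gold-side total
    precision = (cor + 0.5 * par) / actual if actual else 0.0
    recall = (cor + 0.5 * par) / possible if possible else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

For example, with two gold entities, one exact prediction, one boundary-shifted prediction of the right type, and one spurious prediction, this yields precision 0.5 and recall 0.75.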
The LM Evaluation Harness is one of the general evaluation frameworks, covering hundreds of tasks and benchmarks across different types of metrics; see it [here](https://github.com/EleutherAI/lm-evaluation-harness). A general evaluation of LLMs...
Right now, it is not clear how the SummaC models work under the faithfulness score, as there is no clear documentation on what each argument does. Need...
Currently, metrics are computed from test cases run during evaluation. However, there is no way to compare historical test runs' performance except by comparing metric scores for each...
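A minimal version of the comparison the issue asks for could aggregate per-test-case scores into per-metric averages and report deltas between two runs. The sketch below assumes a simple `{metric_name: [scores]}` storage format; both the format and function name are hypothetical, not deepeval's actual data model.

```python
from statistics import mean

def compare_runs(previous: dict, current: dict) -> dict:
    """Compare average metric scores between two test runs.

    previous, current: {metric_name: [per-test-case scores]} (assumed format).
    Returns {metric_name: {"previous": avg, "current": avg, "delta": diff}},
    with None where a metric appears in only one run.
    """
    report = {}
    for metric in sorted(set(previous) | set(current)):
        prev_avg = mean(previous[metric]) if previous.get(metric) else None
        curr_avg = mean(current[metric]) if current.get(metric) else None
        delta = None
        if prev_avg is not None and curr_avg is not None:
            delta = curr_avg - prev_avg
        report[metric] = {"previous": prev_avg, "current": curr_avg, "delta": delta}
    return report
```

A real implementation would also need to align runs on test-case identity (so a regression can be traced to a specific case), but averages plus deltas are the natural first summary.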
**Is your feature request related to a problem? Please describe.** Right now, the deepeval package has different types of checks, for example: - FactualConsistency check - conceptual similarity - RAG check...
**Description** This PR introduces a QuestionGenerator class that leverages the llama_index library to automatically generate questions from a given document. This enhancement aims to streamline the question-generation process by...
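To illustrate the shape of such a class without guessing at llama_index's API, here is a hedged, library-independent sketch: it chunks a document and builds question-generation prompts, leaving the actual LLM call (which the real PR delegates to llama_index) out of scope. All names and parameters here are hypothetical.

```python
class QuestionGenerator:
    """Hypothetical sketch of a question-generation helper.

    Splits a document into word-based chunks and produces one
    question-generation prompt per chunk; an LLM backend (llama_index
    in the actual PR) would then answer each prompt.
    """

    def __init__(self, num_questions_per_chunk: int = 2, chunk_size: int = 512):
        self.num_questions_per_chunk = num_questions_per_chunk
        self.chunk_size = chunk_size  # chunk length in words (simplification)

    def _chunk(self, text: str) -> list[str]:
        words = text.split()
        step = self.chunk_size
        return [" ".join(words[i:i + step]) for i in range(0, len(words), step)]

    def build_prompts(self, document: str) -> list[str]:
        return [
            f"Generate {self.num_questions_per_chunk} questions answerable "
            f"from the following context:\n{chunk}"
            for chunk in self._chunk(document)
        ]
```

The point of the sketch is the separation of concerns: chunking and prompt construction are plain, testable logic, while the LLM dependency stays behind a single call site.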