
Evaluation tools for Retrieval-augmented Generation (RAG) methods.

27 rageval issues

adjust the compute function in tests/units

@FBzzh @yuanpcr you can list all potential metrics for the `validate` task in this issue. For more details about the `validate` task, you can refer to issue #13.

enhancement

- Add a jieba tokenizer
- Metrics: F1, Claim Recall, ROUGE-L, and BLEU
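The overlap-based metrics above share the same shape once text is tokenized. A minimal sketch of token-level F1 with a pluggable tokenizer (jieba's `lcut` would be passed in for Chinese text); the function name is illustrative, not rageval's actual API:

```python
from collections import Counter
from typing import Callable, List


def token_f1(prediction: str, reference: str,
             tokenize: Callable[[str], List[str]] = str.split) -> float:
    """Token-level F1 between a prediction and a reference.

    For Chinese text, pass ``jieba.lcut`` as ``tokenize`` instead of the
    default whitespace split.
    """
    pred_tokens = tokenize(prediction)
    ref_tokens = tokenize(reference)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, `token_f1("the cat sat", "the cat ran")` returns 2/3: two of three tokens overlap, so precision and recall are both 2/3.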

List the most used datasets in RAG research, and we will add them to the benchmarks.
- [ ] THUDM/webglm-qa from Hugging Face: https://huggingface.co/datasets/THUDM/webglm-qa
- [ ] NaturalQuestions from Hugging Face: https://huggingface.co/datasets/natural_questions...

Add the [DPR benchmark](https://github.com/facebookresearch/DPR) for ranking, where the model could be implemented with a BERT-based encoder. The embeddings could be [DPR](https://github.com/facebookresearch/DPR) embeddings or [BGE embeddings](https://huggingface.co/BAAI/bge-large-en).
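A hedged sketch of the scoring step such a ranking benchmark needs, assuming precomputed query and passage embeddings (in practice they would come from a DPR or BGE encoder; the function name is illustrative, not rageval's API):

```python
import numpy as np


def rank_passages(query_emb: np.ndarray,
                  passage_embs: np.ndarray) -> np.ndarray:
    """Return passage indices sorted best-first by dot-product score.

    DPR-style retrieval scores a query against passages with an inner
    product. ``query_emb`` has shape (d,); ``passage_embs`` has shape
    (n, d), one row per candidate passage.
    """
    scores = passage_embs @ query_emb   # (n,) relevance scores
    return np.argsort(-scores)          # descending-score ordering
```

For cosine-similarity encoders such as BGE, the embeddings would be L2-normalized first, after which the same dot product applies.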

enhancement

@QianHaosheng @bugtig6351 @yuanpcr you can list all potential metrics for the `generate` task in this issue. For more details about the `generate` task, you can refer to issue #12.

enhancement
good first issue

In this issue, we discuss potential metrics used to evaluate the quality of an input dataset. Dataset quality is very important since there are many automatically generated...

@RZFan525 you can list all potential metrics for the `rank` task in this issue.

@youngbeauty250 you can list all available metrics for the `rewrite` task in this issue.