rageval
Evaluation tools for Retrieval-augmented Generation (RAG) methods.
Adjust the `compute` function in tests/units.
@FBzzh @yuanpcr you can list all potential metrics for the `validate` task in this issue. For more details about the `validate` task, you can refer to issue #13.
- add jieba tokenizer
- metrics: F1, Claim Recall, ROUGE-L, and BLEU (see the sketch below)
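As a rough illustration of how a jieba tokenizer could feed a token-level F1 metric, here is a minimal sketch using the standard `jieba` package; the helper name `token_f1` is hypothetical and not part of the rageval API.

```python
from collections import Counter

import jieba


def token_f1(prediction: str, reference: str) -> float:
    """Compute token-level F1 between a prediction and a reference string."""
    pred_tokens = jieba.lcut(prediction)
    ref_tokens = jieba.lcut(reference)
    # Count overlapping tokens (multiset intersection).
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(token_f1("检索增强生成的评估工具", "用于检索增强生成的评估工具"))
```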
List the most commonly used datasets in RAG research, and we will add them to the benchmarks (see the loading sketch after this list).
- [ ] THUDM/webglm-qa from huggingface: https://huggingface.co/datasets/THUDM/webglm-qa
- [ ] NaturalQuestions from huggingface: https://huggingface.co/datasets/natural_questions...
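A minimal sketch of pulling one of the candidate benchmark datasets with the Hugging Face `datasets` library; the split name follows the dataset card and may need adjusting.

```python
from datasets import load_dataset

# Load the WebGLM-QA test split from the Hugging Face Hub.
webglm_qa = load_dataset("THUDM/webglm-qa", split="test")
print(webglm_qa[0])
```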
Add the [DPR benchmark](https://github.com/facebookresearch/DPR) for ranking, where the model could be implemented with a BERT-based encoder. The embeddings could be [DPR](https://github.com/facebookresearch/DPR) embeddings or [BGE embeddings](https://huggingface.co/BAAI/bge-large-en).
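A minimal sketch of dense-retrieval scoring with BGE embeddings via `sentence-transformers`; the model name and cosine-similarity ranking are assumptions for illustration, not a fixed part of the benchmark.

```python
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("BAAI/bge-large-en")

query = "who wrote the declaration of independence"
passages = [
    "The Declaration of Independence was primarily drafted by Thomas Jefferson.",
    "The Eiffel Tower is located in Paris, France.",
]

# Encode query and passages into normalized dense vectors.
query_emb = encoder.encode(query, normalize_embeddings=True)
passage_embs = encoder.encode(passages, normalize_embeddings=True)

# Rank passages by cosine similarity (dot product on normalized vectors).
scores = util.cos_sim(query_emb, passage_embs)[0]
ranked = sorted(zip(passages, scores.tolist()), key=lambda x: x[1], reverse=True)
for passage, score in ranked:
    print(f"{score:.3f}  {passage}")
```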
@QianHaosheng @bugtig6351 @yuanpcr you can list all potential metrics for the `generate` task in this issue. For more details about the `generate` task, you can refer to issue #12.
In this issue, we discuss the potential metrics used to evaluate the quality of an input dataset. Dataset quality is very important since there are many automatically generated...
@RZFan525 you can list all potential metrics for the `rank` task in this issue.
@youngbeauty250 you can list all available metrics for the `rewrite` task in this issue.