rageval
Evaluation tools for Retrieval-augmented Generation (RAG) methods.
The `rewrite` task converts each input question into a list of queries that are friendlier to search systems.
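For illustration, an input/output pair for this task might look like the following; the column names `questions` and `queries` are assumptions for this sketch, not the confirmed rageval schema.

```python
# Illustrative only: the column names ("questions", "queries") are assumptions,
# not necessarily the exact schema used by rageval.
from datasets import Dataset

# One input question is rewritten into several search-friendly queries.
data = Dataset.from_dict({
    "questions": ["Who directed the movie that won Best Picture in 1998?"],
    "queries": [[
        "Best Picture winner 1998",
        "director of Titanic",
    ]],
})
print(data[0]["queries"])
```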
Enhance the base metric for robust evaluation:
1. add attribute `self.task` to distinguish the input format.
2. add function `def _validate_data(self, input: Dataset) -> bool` to check the validity of the...

A sketch of both changes follows the subtask list below.
subtasks:
- [x] #24
- [ ] #25
- [ ] #26
- [x] #27
- [ ] #28
- [ ] #29
- [ ] #30
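A minimal sketch of the proposed enhancement, assuming a base class named `Metric` and a per-task required-column mapping (both assumptions; only `self.task` and `_validate_data` come from the issue text):

```python
# Sketch of the proposed base-metric enhancement. `self.task` and
# `_validate_data` come from the issue; the class name and the
# required-column mapping are illustrative assumptions.
from datasets import Dataset

class Metric:
    # Columns each task expects in the input dataset (assumed mapping).
    REQUIRED_COLUMNS = {
        "rewrite": {"questions", "queries"},
        "rank": {"questions", "contexts"},
        "generate": {"questions", "answers", "contexts"},
    }

    def __init__(self, task: str):
        # `task` distinguishes the expected input format.
        self.task = task

    def _validate_data(self, input: Dataset) -> bool:
        # Check that the dataset carries the columns this task needs.
        required = self.REQUIRED_COLUMNS.get(self.task, set())
        return required.issubset(input.column_names)
```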
@FBzzh you can list all potential metrics for the `compress` task in this issue.
subtasks in benchmark initialization:
- [ ] #14
- [ ] #9
- [ ] #10
- [ ] #11
- [ ] #12
- [ ] #13
Here, the `rank` task covers both the retrieval stage and the re-rank stage of the search process.
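As an illustration of what a `rank` metric measures, here is a plain recall@k computation over retrieved document ids; this is a generic sketch, not rageval's implementation.

```python
def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of relevant documents found in the top-k retrieved list.

    Generic sketch of a rank-stage metric; not rageval's implementation.
    """
    if not relevant_ids:
        return 0.0
    top_k = set(retrieved_ids[:k])
    return len(top_k & relevant_ids) / len(relevant_ids)

# Example: two of the three relevant documents appear in the top 5 results.
print(recall_at_k(["d3", "d7", "d1", "d9", "d2"], {"d1", "d2", "d4"}, k=5))  # ~0.67
```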
This task evaluates the groundedness of the generated `answers`.
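To make the notion of groundedness concrete, below is a crude token-overlap check of whether each answer sentence is supported by some retrieved context. Real groundedness metrics typically rely on NLI models or citation matching, so treat this purely as an illustrative sketch.

```python
def grounded_fraction(answer: str, contexts: list[str], threshold: float = 0.6) -> float:
    """Fraction of answer sentences whose tokens mostly appear in some context.

    A crude, illustrative proxy for groundedness; not rageval's metric.
    """
    sentences = [s.strip() for s in answer.split(".") if s.strip()]
    if not sentences:
        return 0.0
    context_tokens = [set(c.lower().split()) for c in contexts]
    supported = 0
    for sentence in sentences:
        tokens = set(sentence.lower().split())
        if any(len(tokens & ctx) / len(tokens) >= threshold for ctx in context_tokens):
            supported += 1
    return supported / len(sentences)

print(grounded_fraction(
    "Titanic was directed by James Cameron.",
    ["Titanic was directed by James Cameron and won Best Picture."],
))  # -> 1.0
```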