# Benchmark existing techniques using evaluation harness
### Context on benchmark work
- Goal 1: give users practical guidance on which techniques to try on their own dataset and use case.
- Goal 2: show that there is no "silver bullet" solution; which technique works best depends on the dataset and the use case, but Haystack can support them all.
- Goal 3: showcase Haystack's advanced evaluation/experimentation API (the most advanced compared to competitors); a rough sketch of such a flow follows this list.
- This is not a research paper, so it should not be too "academic": it is not restricted in which metrics or datasets it uses, and it is not meant to be peer-reviewed or submitted to an academic conference.
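
As a rough illustration of goal 3, here is a minimal sketch of the kind of evaluation flow the study would exercise, using Haystack 2.x's built-in evaluators. The data is placeholder, and the choice of `SASEvaluator` plus `EvaluationRunResult` is an assumption about which building blocks the study would use; the evaluation harness referenced in the title would presumably wrap a flow like this.

```python
# Sketch only: placeholder data, and the evaluator/reporting choices here
# are assumptions about what the benchmark would use.
from haystack.components.evaluators import SASEvaluator
from haystack.evaluation import EvaluationRunResult

questions = ["What is the capital of France?"]
ground_truth = ["Paris is the capital of France."]
predicted = ["The capital of France is Paris."]  # e.g. output of a RAG pipeline

# Semantic Answer Similarity: embedding-based comparison of predicted
# answers against ground-truth answers.
sas = SASEvaluator(model="sentence-transformers/paraphrase-multilingual-mpnet-base-v2")
sas.warm_up()
sas_scores = sas.run(ground_truth_answers=ground_truth, predicted_answers=predicted)

# Collect per-question inputs and metric outputs into a single report.
report = EvaluationRunResult(
    run_name="baseline_rag",
    inputs={
        "question": questions,
        "ground_truth_answer": ground_truth,
        "predicted_answer": predicted,
    },
    results={"sas": sas_scores},
)
print(report.score_report())  # aggregate score per metric
print(report.to_pandas())     # per-question breakdown
```

Running this per technique and comparing the resulting reports is one plausible shape for the benchmark loop; the harness would repeat it across datasets and pipeline variants.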
### Tasks
- [x] Create a new repository that will have all the code for the benchmark study
- [ ] https://github.com/deepset-ai/haystack/issues/7628
- [ ] https://github.com/deepset-ai/haystack/issues/7629