llama.cpp
Study how LM Evaluation Harness works and try to implement it
It would be great to start doing this kind of quantitative analysis of ggml-based inference:
https://bellard.org/ts_server/
It looks like Fabrice evaluates the models using something called LM Evaluation Harness:
https://github.com/EleutherAI/lm-evaluation-harness
I have no idea what this is yet, but it would be nice to study it and try to integrate it here and in other ggml-based projects.
This will be a very important step toward estimating the quality of the generated output and seeing if we are on the right track.
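For context, a rough sketch of the kind of evaluation the harness performs: its core abstraction is a "loglikelihood" request, where the model scores a (context, continuation) pair, and multiple-choice tasks are graded by picking the option with the highest log-likelihood. The snippet below is only an illustrative toy, not the harness's actual code; `toy_loglikelihood` and its hard-coded score table are made up for demonstration, and a real ggml-based backend would instead run the model and sum the log-probs of the continuation tokens.

```python
def toy_loglikelihood(context: str, continuation: str) -> float:
    """Stand-in for a real model scorer (hypothetical).

    A real backend would tokenize context + continuation, run the
    model, and sum the log-probabilities of the continuation tokens.
    Here we just look up a fake score table for demonstration.
    """
    table = {"Paris": -0.5, "London": -2.0, "Berlin": -2.5}
    return table.get(continuation, -10.0)  # unknown options score low


def score_multiple_choice(context: str, options: list[str]) -> str:
    """Pick the answer option with the highest log-likelihood,
    mirroring how multiple-choice benchmarks are typically scored."""
    scores = [toy_loglikelihood(context, opt) for opt in options]
    return options[scores.index(max(scores))]


answer = score_multiple_choice(
    "The capital of France is", ["Paris", "London", "Berlin"])
print(answer)  # -> Paris
```

Integrating with the harness would then mostly mean implementing this scoring interface on top of ggml inference, so that existing benchmark tasks can run against it unchanged.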