cortex.cpp
epic: Benchmarking existing good models
Problem
- As a day-to-day model user, I find it hard to explain to my friends, and share with them, which models are good to use, especially when running them with Nitro
Success Criteria
- A public markdown comparison table on the Nitro page. It can take inspiration from the Open LLM Leaderboard (https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) but can be much simpler
- Performance metrics should be generated with https://github.com/ray-project/llmperf as the de facto measurement tool and reported in the table below (see the sketch after this list)
- Perplexity metrics should be measured with llama.cpp's perplexity tool (https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#perplexity-measuring-model-quality) and reported in the same table
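A minimal sketch of how both tools could be driven from one script. The llmperf entry point (`token_benchmark_ray.py`), its flag names, the `llama-perplexity` binary name, and all model/dataset paths below are assumptions taken from each project's README and should be verified before use; Nitro's OpenAI-compatible endpoint is assumed as the target for llmperf.

```python
# Sketch only: run llmperf (throughput/latency) and llama.cpp perplexity
# (model quality) for one model. Script name, flags, and paths are
# assumptions; check the llmperf and llama.cpp READMEs before running.
import subprocess

MODEL = "mistralai/Mistral-7B-Instruct-v0.2"            # placeholder model id
GGUF_PATH = "models/mistral-7b-instruct.Q4_K_M.gguf"    # placeholder local file
PPL_DATASET = "wikitext-2-raw/wiki.test.raw"            # placeholder eval text

def run_llmperf_benchmark() -> None:
    """Performance metrics via llmperf's assumed CLI entry point."""
    subprocess.run(
        [
            "python", "token_benchmark_ray.py",
            "--model", MODEL,
            "--mean-input-tokens", "550",
            "--mean-output-tokens", "150",
            "--num-concurrent-requests", "1",
            "--results-dir", "results",
            "--llm-api", "openai",   # assumes Nitro's OpenAI-compatible server
        ],
        check=True,
    )

def run_perplexity() -> None:
    """Perplexity via llama.cpp (binary was ./perplexity in older builds)."""
    subprocess.run(
        ["./llama-perplexity", "-m", GGUF_PATH, "-f", PPL_DATASET],
        check=True,
    )

if __name__ == "__main__":
    run_llmperf_benchmark()
    run_perplexity()
```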
Sub Issues
- To be updated
Additional context
- Each result should include the OS, CPU architecture, RAM, model name, and GPU (Metal / NVIDIA / etc.) of the machine it was measured on (see the sketch below)
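A small sketch of how that environment metadata could be collected alongside each benchmark row. The GPU probe assumes `nvidia-smi` is on PATH for NVIDIA machines and reports "Metal" on Apple Silicon; other backends would need their own checks, and `psutil` would be a more portable way to read RAM.

```python
# Sketch only: collect OS, CPU architecture, RAM, model name, and GPU for a
# benchmark result row. GPU detection is best-effort and assumes nvidia-smi
# (NVIDIA) or an Apple Silicon platform check (Metal).
import os
import platform
import subprocess

def detect_gpu() -> str:
    """Best-effort GPU description."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip() or "NVIDIA GPU"
    except (FileNotFoundError, subprocess.CalledProcessError):
        if platform.system() == "Darwin" and platform.machine() == "arm64":
            return "Metal (Apple Silicon)"
        return "CPU only / unknown"

def environment_row(model_name: str) -> dict:
    """Metadata to attach to one benchmark result."""
    try:
        # Works on Linux/macOS; psutil.virtual_memory().total is more portable.
        ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    except (ValueError, OSError):
        ram_gb = None
    return {
        "os": f"{platform.system()} {platform.release()}",
        "cpu_arch": platform.machine(),
        "ram_gb": round(ram_gb, 1) if ram_gb else "unknown",
        "model": model_name,
        "gpu": detect_gpu(),
    }

if __name__ == "__main__":
    print(environment_row("mistral-7b-instruct-q4_k_m"))  # placeholder model name
```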