cortex.cpp
epic: Benchmarking existing good models
Problem
- As a day-to-day model user, I find it hard to explain to my friends, and share with them, which models are good to use, especially when running them with Nitro
Success Criteria
- A public markdown comparison table on the Nitro page. It can take inspiration from the Open LLM Leaderboard (https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) but can be much simpler
- Performance metrics should be generated with https://github.com/ray-project/llmperf as the de facto measurement tool and reported in the table below (see the sketch after this list)
- Perplexity metrics should be measured with llama.cpp's perplexity tool (https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#perplexity-measuring-model-quality) and reported in the same table
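A minimal sketch of how both tools could be driven from one script. The llmperf entry point (`token_benchmark_ray.py`), its flag names, the `llama-perplexity` binary name, and all model/dataset paths below are assumptions taken from each project's README and should be verified before use; Nitro's OpenAI-compatible endpoint is assumed as the target for llmperf.

```python
# Sketch only: run llmperf (throughput/latency) and llama.cpp perplexity
# (model quality) for one model. Script name, flags, and paths are
# assumptions; check the llmperf and llama.cpp READMEs before running.
import subprocess

MODEL = "mistralai/Mistral-7B-Instruct-v0.2"            # placeholder model id
GGUF_PATH = "models/mistral-7b-instruct.Q4_K_M.gguf"    # placeholder local file
PPL_DATASET = "wikitext-2-raw/wiki.test.raw"            # placeholder eval text

def run_llmperf_benchmark() -> None:
    """Performance metrics via llmperf's assumed CLI entry point."""
    subprocess.run(
        [
            "python", "token_benchmark_ray.py",
            "--model", MODEL,
            "--mean-input-tokens", "550",
            "--mean-output-tokens", "150",
            "--num-concurrent-requests", "1",
            "--results-dir", "results",
            "--llm-api", "openai",   # assumes Nitro's OpenAI-compatible server
        ],
        check=True,
    )

def run_perplexity() -> None:
    """Perplexity via llama.cpp (binary was ./perplexity in older builds)."""
    subprocess.run(
        ["./llama-perplexity", "-m", GGUF_PATH, "-f", PPL_DATASET],
        check=True,
    )

if __name__ == "__main__":
    run_llmperf_benchmark()
    run_perplexity()
```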
Sub Issues
- To be updated
Additional context
- Each result should include the OS, CPU architecture, RAM, model name, and GPU (Metal / NVIDIA / etc.) of the machine it was measured on (see the sketch below)
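A small sketch of how that environment metadata could be collected alongside each benchmark row. The GPU probe assumes `nvidia-smi` is on PATH for NVIDIA machines and reports "Metal" on Apple Silicon; other backends would need their own checks, and `psutil` would be a more portable way to read RAM.

```python
# Sketch only: collect OS, CPU architecture, RAM, model name, and GPU for a
# benchmark result row. GPU detection is best-effort and assumes nvidia-smi
# (NVIDIA) or an Apple Silicon platform check (Metal).
import os
import platform
import subprocess

def detect_gpu() -> str:
    """Best-effort GPU description."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip() or "NVIDIA GPU"
    except (FileNotFoundError, subprocess.CalledProcessError):
        if platform.system() == "Darwin" and platform.machine() == "arm64":
            return "Metal (Apple Silicon)"
        return "CPU only / unknown"

def environment_row(model_name: str) -> dict:
    """Metadata to attach to one benchmark result."""
    try:
        # Works on Linux/macOS; psutil.virtual_memory().total is more portable.
        ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    except (ValueError, OSError):
        ram_gb = None
    return {
        "os": f"{platform.system()} {platform.release()}",
        "cpu_arch": platform.machine(),
        "ram_gb": round(ram_gb, 1) if ram_gb else "unknown",
        "model": model_name,
        "gpu": detect_gpu(),
    }

if __name__ == "__main__":
    print(environment_row("mistral-7b-instruct-q4_k_m"))  # placeholder model name
```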