cortex.cpp
epic: cortex.cpp benchmark + Backend Infra
Currently, the example servers for cortex.llamacpp and cortex.tensorrt-llm achieve the following results with an average context length of 400:
- cortex.llamacpp: 850 tokens/s
- cortex.tensorrt-llm: 1450 tokens/s
We need to benchmark the cortex.cpp server and make sure its performance matches that of the corresponding example server for each engine.
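A minimal sketch of the pass/fail check such a benchmark could apply. It assumes the figures above are aggregate throughput (total generated tokens divided by the wall-clock duration of the run, with requests possibly sent concurrently). The 5% tolerance and the helper names `throughput_tokens_per_s` and `matches_baseline` are hypothetical and are not part of cortex.cpp.

```python
# Baselines reported for the example servers at an average context length of 400.
BASELINES = {"cortex.llamacpp": 850.0, "cortex.tensorrt-llm": 1450.0}


def throughput_tokens_per_s(completion_tokens, wall_clock_s):
    """Aggregate throughput: total generated tokens across all requests
    divided by the wall-clock duration of the whole benchmark run."""
    if wall_clock_s <= 0:
        raise ValueError("wall_clock_s must be positive")
    return sum(completion_tokens) / wall_clock_s


def matches_baseline(engine, measured, tolerance=0.05):
    """Return True if the measured throughput is within `tolerance`
    (an assumed 5% by default) below the example-server baseline, or above it."""
    return measured >= BASELINES[engine] * (1.0 - tolerance)
```

For example, 100 requests that each generate 170 tokens over a 20 s run give 17000 / 20 = 850 tokens/s, which meets the cortex.llamacpp baseline but falls well short of the cortex.tensorrt-llm one.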