Li Ming
Hi, I ran into the same problem. Have you solved it?
> > Hi, I ran into the same problem. Have you solved it?
>
> Try using llama-cpp-python instead of llama.cpp to start the server.

Thanks for your reply, I'...
> > What model is being evaluated here? I will try to look into this soon.
>
> Qwen1.8B

I discovered that the original script gguf.py, when matched with llama-cpp-python...
I want to evaluate the inference latency, throughput, and parameter count of a custom LLM.
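In case it helps, here is a minimal sketch of how those three numbers could be measured. This is not tied to any particular server setup: `generate` is a hypothetical stand-in for whatever call produces tokens (e.g. a llama-cpp-python `Llama` instance), and `tensor_shapes` is assumed to be a mapping of weight names to shapes that you can obtain from your model's checkpoint.

```python
import time

def count_parameters(tensor_shapes):
    """Total parameter count from a mapping of tensor name -> shape tuple."""
    total = 0
    for shape in tensor_shapes.values():
        n = 1
        for dim in shape:
            n *= dim
        total += n
    return total

def benchmark(generate, prompt, n_runs=3):
    """Average per-request latency (seconds) and overall throughput
    (tokens/second) for `generate`, which must return the generated tokens."""
    latencies = []
    total_tokens = 0
    for _ in range(n_runs):
        start = time.perf_counter()
        tokens = generate(prompt)
        latencies.append(time.perf_counter() - start)
        total_tokens += len(tokens)
    avg_latency = sum(latencies) / n_runs
    throughput = total_tokens / sum(latencies)
    return avg_latency, throughput
```

For a real run you would replace the `generate` callable with your model's generation call and feed `count_parameters` the shapes read from the GGUF file; the timing logic itself stays the same.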