Xiaodong Ji
Sorry for the late response. The latency evaluation script is outdated. The final latency results presented in the paper are based on Mistral-7B-Instruct-v0.2. We have updated the latency evaluation script and...
Hello, have you been able to solve this issue? We do not expect these warnings. Could you please share your script and environment information?
Closing this issue due to inactivity.
By default, our script uses 2 GPUs for evaluation, but you can modify this line to use a single GPU: https://github.com/HugoZHL/PQCache/blob/778c904e16eb577fb37b94b5a714b7f39f7db91d/run_llama.sh#L8 The memory usage issue can be attributed to...
1. We have updated and tested the environment setup guide; you can try out the latest commit.
2. Considering that Llama 2 is outdated and does not support long...