
Why is inference so slow with get_model_answer.py?

Open realgump opened this issue 2 years ago • 0 comments

I tried to run inference on my data using get_model_answer.py on an A100-80G, but each query took over 30 seconds. However, when I deployed the model with the OpenAI-compatible API on the same machine and replaced the get_model_answers function in get_model_answer.py with an API request, the inference time dropped to about 6 seconds. I am really puzzled by the difference between get_model_answer.py and the OpenAI API server. How could this happen?
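For reference, the API-based replacement looks roughly like this (a minimal sketch; the endpoint URL, port, and model name are placeholders for whatever you actually deployed, not the exact code I used):

```python
# Minimal sketch: query a FastChat OpenAI-compatible server instead of
# calling get_model_answers directly. Assumes the server is already
# running at http://localhost:8000/v1 (URL and model name are assumptions).
import requests

API_BASE = "http://localhost:8000/v1"
MODEL = "vicuna-7b-v1.3"  # placeholder; use the model you deployed

def get_model_answer_via_api(question: str) -> str:
    """Send one question to the chat completions endpoint and return the reply text."""
    resp = requests.post(
        f"{API_BASE}/chat/completions",
        json={
            "model": MODEL,
            "messages": [{"role": "user", "content": question}],
            "temperature": 0.7,
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(get_model_answer_via_api("What is the capital of France?"))
```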

realgump · Jun 15 '23 14:06