FastChat
Why is inference so slow with get_model_answer.py?
I tried to run inference on my data using get_model_answer.py on an A100-80G, but each query took over 30 seconds. However, when I deployed the model behind the OpenAI-compatible API server on the same machine and replaced the get_model_answers function in get_model_answer.py with an API request, the per-query time dropped to about 6 seconds. I am really puzzled about the difference between get_model_answer.py and the openai-api path. How could this happen?
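For reference, the replacement roughly looked like the sketch below. It just sends each question to the local OpenAI-compatible server instead of calling the model directly; the port, endpoint, and model name here are placeholders for my setup, not exact values from the script.

```python
# Minimal sketch of the API-based replacement (port/model name are assumptions for my setup).
import requests

API_URL = "http://localhost:8000/v1/chat/completions"  # FastChat openai_api_server endpoint

def get_answer_via_api(question: str, model: str = "vicuna-7b-v1.5") -> str:
    """Send one query to the local OpenAI-compatible server and return the reply text."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "temperature": 0.7,
        "max_tokens": 1024,
    }
    resp = requests.post(API_URL, json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```

With this in place of the original generation call, each query returned in roughly 6 seconds instead of 30+.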