FastChat
Why is inference so slow with get_model_answer.py?
I tried to run inference on my data using get_model_answer.py on an A100-80G, but each query took over 30 seconds. However, when I deployed the model behind the OpenAI-compatible API server on the same machine and replaced the get_model_answers function in get_model_answer.py with an API request, the per-query time dropped to about 6 seconds. I am really puzzled about the difference between get_model_answer.py and the openai-api path. How could this happen?
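For reference, the replacement roughly looked like the sketch below. It just sends each question to the local OpenAI-compatible server instead of calling the model directly; the port, endpoint, and model name here are placeholders for my setup, not exact values from the script.

```python
# Minimal sketch of the API-based replacement (port/model name are assumptions for my setup).
import requests

API_URL = "http://localhost:8000/v1/chat/completions"  # FastChat openai_api_server endpoint

def get_answer_via_api(question: str, model: str = "vicuna-7b-v1.5") -> str:
    """Send one query to the local OpenAI-compatible server and return the reply text."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "temperature": 0.7,
        "max_tokens": 1024,
    }
    resp = requests.post(API_URL, json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```

With this in place of the original generation call, each query returned in roughly 6 seconds instead of 30+.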