GenAIExamples
Optimize the LLM backend service while the LLM is downloading
When I test the LLM backend service:
curl http://${host_ip}:9009/generate \
-X POST \
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
-H 'Content-Type: application/json'
On first startup, the service downloads the LLM, which takes extra time. Until the download finishes, the service is not ready and returns an error:
curl: (7) Failed to connect to 10.239.182.158 port 9009 after 0 ms: Connection refused
I hope it returns a friendly error telling the user to wait for it.
Other services that depend on it need to be updated too.
When the above service is not ready, a dependent service like:
curl http://${host_ip}:9000/v1/chat/completions \
-X POST \
-d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
-H 'Content-Type: application/json'
returns an error. It should return friendly info when the services it depends on are not ready.
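(For reference, one client-side workaround is to poll the backend's health endpoint before sending the real request. This is a minimal sketch, assuming the TGI container exposes its standard /health endpoint on the same port 9009; the polling loop itself is illustrative, not part of the project.)

```bash
#!/bin/bash
# Wait until the TGI backend reports ready, then send the test request.
# Assumes host_ip is set and the TGI /health endpoint is reachable on port 9009.
until curl -sf "http://${host_ip}:9009/health" > /dev/null; do
  echo "LLM backend not ready yet (model may still be downloading), retrying in 10s..."
  sleep 10
done

curl http://${host_ip}:9009/generate \
-X POST \
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
-H 'Content-Type: application/json'
```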
@NeoZhangJianyu,
We have applied the dependency in the docker compose file.
I think it's not possible to implement your first requirement. The model is loaded by TGI, and we can't control that process.
Is it possible to clone the TGI Docker image and add an enhancement to return an error code for the LLM downloading status?
@NeoZhangJianyu, you can raise this requirement with the TGI team.
@NeoZhangJianyu Is it ok to close this issue?
@NeoZhangJianyu,
Actually, TGI provides /health and /info endpoints to check the service status. I think that's enough for users.
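For example, a quick status check could look like this minimal sketch (assuming the TGI container is published on port 9009 as in the commands above; the exact port depends on your deployment):

```bash
# Returns HTTP 200 once the model is loaded and the server is ready to serve requests.
curl -i http://${host_ip}:9009/health

# Returns JSON metadata about the served model (model id, token limits, version, ...).
curl http://${host_ip}:9009/info
```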