
Optimize the LLM backend service during LLM download

Open NeoZhangJianyu opened this issue 1 year ago • 2 comments

When I test the LLM backend service:

curl http://${host_ip}:9009/generate \
  -X POST \
  -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
  -H 'Content-Type: application/json'

On first startup, the service downloads the LLM, which takes extra time. Until the download finishes, the service is not ready and returns an error:

curl: (7) Failed to connect to 10.239.182.158 port 9009 after 0 ms: Connection refused

I hope it returns a friendly error telling the user to wait for it.
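
In the meantime, a minimal client-side workaround (a sketch, assuming TGI's /health endpoint is exposed on the same port) is to poll until the server reports ready:

# Poll until TGI reports healthy; on first startup this covers the
# model download and load time. /health returns HTTP 200 once ready.
until curl -sf "http://${host_ip}:9009/health" > /dev/null; do
  echo "LLM backend not ready yet (model may still be downloading); retrying in 10s..."
  sleep 10
done
echo "LLM backend is ready."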

Other services that depend on it need to be updated too.

When the above service is not ready, dependent services such as:

curl http://${host_ip}:9000/v1/chat/completions \
  -X POST \
  -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
  -H 'Content-Type: application/json'

return an error. They should return a friendly message when the services they depend on are not ready.
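
Until the services do this themselves, a client-side guard (a sketch; the /health path assumes TGI's standard health endpoint) can fail fast with a readable message:

# Check the upstream TGI backend before calling the dependent service,
# and print a friendly message if it is not yet serving.
if ! curl -sf "http://${host_ip}:9009/health" > /dev/null; then
  echo "The LLM backend on port 9009 is not ready yet (it may still be downloading the model); please retry later."
  exit 1
fi
curl http://${host_ip}:9000/v1/chat/completions \
  -X POST \
  -d '{"query":"What is Deep Learning?","max_new_tokens":17}' \
  -H 'Content-Type: application/json'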

NeoZhangJianyu avatar Aug 07 '24 06:08 NeoZhangJianyu

@NeoZhangJianyu,

We have applied the dependency in the docker compose file.

[Screenshot: depends_on configuration in the docker compose file]
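
A minimal sketch of that pattern (service names, image tag, and internal port are illustrative, not necessarily the exact values in the repo):

# docker compose sketch: gate the dependent service on TGI's health.
services:
  tgi-service:
    image: ghcr.io/huggingface/text-generation-inference:latest
    healthcheck:
      # Assumes curl is available inside the image; TGI serves on port 80 by default.
      test: ["CMD-SHELL", "curl -sf http://localhost:80/health || exit 1"]
      interval: 10s
      retries: 60
  llm-service:
    depends_on:
      tgi-service:
        condition: service_healthy

With condition: service_healthy, docker compose delays starting llm-service until the healthcheck passes, which includes the initial model download.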

I don't think we can implement your first requirement. The model is loaded by TGI, and we can't control that process.

lvliang-intel avatar Aug 29 '24 09:08 lvliang-intel

Is it possible to fork the TGI Docker image and add an enhancement that returns an error code reflecting the LLM download status?

NeoZhangJianyu avatar Sep 11 '24 07:09 NeoZhangJianyu

@NeoZhangJianyu, you can raise this requirement with the TGI team.

lvliang-intel avatar Nov 03 '24 10:11 lvliang-intel

@NeoZhangJianyu Is it ok to close this issue?

joshuayao avatar Nov 08 '24 03:11 joshuayao

@NeoZhangJianyu,

Actually, TGI provides /health and /info endpoints to check the service status. I think that's enough for users.

[Screenshot: TGI /health and /info endpoint documentation]
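
For example (assuming the same port mapping as the commands above):

# Readiness probe: returns HTTP 200 once the model is loaded and serving.
curl -i http://${host_ip}:9009/health

# Model metadata (model_id, version, etc.) as JSON.
curl http://${host_ip}:9009/info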

lvliang-intel avatar Nov 20 '24 06:11 lvliang-intel