
OpenAI API completions endpoint - Not working as expected

Open anandnandagiri opened this issue 1 year ago • 8 comments

I have downloaded the Llama 3.2 1B model from Hugging Face with optimum-cli:

optimum-cli export openvino --model meta-llama/Llama-3.2-1B-Instruct llama3.2-1b/1

Below are the files downloaded (screenshot of the exported files attached).

Note: I manually removed openvino_detokenizer.bin, openvino_detokenizer.xml, openvino_tokenizer.xml and openvino_tokenizer.bin to ensure there is only one .bin and one .xml file in the version 1 folder.

I ran Model Server with the command below, making sure the Windows WSL path is correct and passing the Docker parameters for the Intel Iris GPU:

docker run --rm -it -v %cd%/ovmodels/llama3.2-1b:/models/llama3.2-1b --device=/dev/dxg --volume /usr/lib/wsl:/usr/lib/wsl -p 8000:8000 openvino/model_server:latest-gpu --model_path /models/llama3.2-1b --model_name llama3.2-1b --rest_port 8000

I ran the command below, which worked perfectly:

curl --request GET http://172.17.0.3:8000/v1/config

Below is the output:

{ "llama3.2-1b" : { "model_version_status": [ { "version": "1", "state": "AVAILABLE", "status": { "error_code": "OK", "error_message": "OK" } } ] }

But the curl command below for the OpenAI API completions endpoint did not work as expected:

curl http://172.17.0.3:8000/v3/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2-1b", "prompt": "This is a test", "stream": false }'

It returns the error: {"error": "Model with requested name is not found"}

anandnandagiri avatar Oct 02 '24 00:10 anandnandagiri

Hello @anandnandagiri

You are trying to serve the model directly, with no continuous batching pipeline. In that scenario the model is exposed for single inference via the standard TFS/KServe APIs, with no text generation loop. To use the text generation flow via the OpenAI completions API, please refer to the Continuous Batching Demo. Just follow the steps and swap the model to Llama 3.2 1B.
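
For context, a rough sketch of what the continuous batching deployment looks like, assuming the config-based layout used later in this thread (the config file name and paths below are illustrative, not the demo verbatim):

# Serve via a config file that registers the MediaPipe graph (graph.pbtxt) for the model,
# instead of pointing --model_path at a bare IR. The tokenizer/detokenizer models exported
# by optimum-cli are used by the text generation pipeline, so keep them in place.
docker run --rm -it -v %cd%/ovmodels:/ovmodels --device=/dev/dxg --volume /usr/lib/wsl:/usr/lib/wsl -p 8000:8000 openvino/model_server:latest-gpu --config_path /ovmodels/config.json --rest_port 8000

# With the graph loaded, the completions call from the question should be routed to it:
curl http://localhost:8000/v3/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2-1b", "prompt": "This is a test", "stream": false}'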

dkalinowski avatar Oct 03 '24 13:10 dkalinowski

Thank You @dkalinowski

anandnandagiri avatar Oct 03 '24 13:10 anandnandagiri

@dkalinowski

I followed the steps in the Continuous Batching Demo. Below are the errors I am getting. I don't see any GPU resource issue (image attached for reference). I am testing on an Intel i7 11th-generation processor.

When I run the code directly on the GPU it works fine, but with Model Server it does not:

docker run --rm -it -v %cd%\ovmodels:/ovmodels --device=/dev/dxg --volume /usr/lib/wsl:/usr/lib/wsl -p 8000:8000 openvino/model_server:latest-gpu --config_path ovmodels/model_config_list.json --rest_port 8000

[2024-10-03 14:13:45.386][1][serving][info][server.cpp:75] OpenVINO Model Server 2024.4.28219825c
[2024-10-03 14:13:45.386][1][serving][info][server.cpp:76] OpenVINO backend c3152d32c9c7
[2024-10-03 14:13:45.386][1][serving][info][pythoninterpretermodule.cpp:35] PythonInterpreterModule starting
Python version 3.10.12 (main, Sep 11 2024, 15:47:36) [GCC 11.4.0]
[2024-10-03 14:13:45.559][1][serving][info][pythoninterpretermodule.cpp:46] PythonInterpreterModule started
[2024-10-03 14:13:45.766][1][modelmanager][info][modelmanager.cpp:125] Available devices for Open VINO: CPU, GPU
[2024-10-03 14:13:45.768][1][serving][info][grpcservermodule.cpp:122] GRPCServerModule starting
[2024-10-03 14:13:45.770][1][serving][info][grpcservermodule.cpp:191] GRPCServerModule started
[2024-10-03 14:13:45.770][1][serving][info][grpcservermodule.cpp:192] Started gRPC server on port 9178
[2024-10-03 14:13:45.770][1][serving][info][httpservermodule.cpp:33] HTTPServerModule starting
[2024-10-03 14:13:45.770][1][serving][info][httpservermodule.cpp:37] Will start 32 REST workers
[2024-10-03 14:13:45.776][1][serving][info][http_server.cpp:269] REST server listening on port 8000 with 32 threads
[evhttp_server.cc : 253] NET_LOG: Entering the event loop ...
[2024-10-03 14:13:45.776][1][serving][info][httpservermodule.cpp:47] HTTPServerModule started
[2024-10-03 14:13:45.777][1][serving][info][httpservermodule.cpp:48] Started REST server at 0.0.0.0:8000
[2024-10-03 14:13:45.777][1][serving][info][servablemanagermodule.cpp:51] ServableManagerModule starting
[2024-10-03 14:13:45.791][1][modelmanager][info][modelmanager.cpp:536] Configuration file doesn't have custom node libraries property.
[2024-10-03 14:13:45.791][1][modelmanager][info][modelmanager.cpp:579] Configuration file doesn't have pipelines property.
[2024-10-03 14:13:45.796][1][serving][info][mediapipegraphdefinition.cpp:419] MediapipeGraphDefinition initializing graph nodes
Inference requests aggregated statistic:
Paged attention % of inference execution: -nan
MatMul % of inference execution: -nan
Total inference execution secs: 0

[2024-10-03 14:15:05.783][1][serving][error][llmnoderesources.cpp:169] Error during llm node initialization for models_path: /ovmodels/llama3.2-1b/1 exception: Exception from src/inference/src/cpp/remote_context.cpp:68: Exception from src/plugins/intel_gpu/src/runtime/ocl/ocl_engine.cpp:201: [GPU] out of GPU resources

[2024-10-03 14:15:05.783][1][serving][error][mediapipegraphdefinition.cpp:468] Failed to process LLM node graph llama3.2-1b
[2024-10-03 14:15:05.783][1][modelmanager][info][pipelinedefinitionstatus.hpp:59] Mediapipe: llama3.2-1b state changed to: LOADING_PRECONDITION_FAILED after handling: ValidationFailedEvent:
[2024-10-03 14:15:05.784][1][serving][info][servablemanagermodule.cpp:55] ServableManagerModule started
[2024-10-03 14:15:05.785][115][modelmanager][info][modelmanager.cpp:1087] Started cleaner thread
[2024-10-03 14:15:05.784][114][modelmanager][info][modelmanager.cpp:1068] Started model manager thread

(screenshot of GPU utilization, attached for reference)

anandnandagiri avatar Oct 03 '24 14:10 anandnandagiri

@anandnandagiri How much memory do you have assigned to WSL? From Linux there might be less memory available to the GPU. Try reducing the cache size in graph.pbtxt, which in the demo is set to 8 GB. Try a lower value like 4 or even less.
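
For reference, a minimal sketch of the relevant LLM node options in graph.pbtxt (field names match the graph posted later in this thread; the value is only an example and depends on how much memory the GPU actually gets under WSL):

node_options: {
    [type.googleapis.com / mediapipe.LLMCalculatorOptions]: {
        models_path: "/ovmodels/llama3.2-1b/1",
        cache_size: 4,   # KV cache size in GB; lower this if the GPU runs out of resources
        device: "GPU"
    }
}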

dtrawins avatar Oct 04 '24 23:10 dtrawins

@dtrawins it worked well when I changed the cache size to 2 in graph.pbtxt.

Info: I am using WSL2 on Windows 10 Pro with no .wslconfig present, so it runs with the default configuration. I am using Docker Desktop to run Model Server (no separate Linux distro) through the command prompt.

Could you help with the following?

  1. Is there a link to documentation on graph.pbtxt? (To my surprise, if I remove --volume /usr/lib/wsl:/usr/lib/wsl, the GPU Model Server container does not run at all.)
  2. I am not able to run, or at least convert, text embedding models such as https://huggingface.co/nomic-ai/nomic-embed-text-v1.5 to OpenVINO format for a vector store using "optimum-cli export openvino". Can optimum-cli convert it, and if so, does Model Server support it?

anandnandagiri avatar Oct 05 '24 20:10 anandnandagiri

@anandnandagiri The graph documentation can be found here: https://github.com/openvinotoolkit/model_server/blob/main/docs/llm/reference.md

The mount parameters you used are required to make the GPU accessible in the container on WSL. That is documented here: https://github.com/openvinotoolkit/model_server/blob/main/docs/accelerators.md#starting-a-docker-container-with-intel-integrated-gpu-intel-data-center-gpu-flex-series-and-intel-arc-gpu

Regarding embeddings, we just added support for the OpenAI API embeddings endpoint. You can check the demo: https://github.com/openvinotoolkit/model_server/blob/main/demos/embeddings/README.md It documents the export from HF to deploy the model in OVMS. The nomic-embed-text model should work fine.
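
As a hedged sketch of the export step for the model mentioned above (these are standard optimum-cli flags, but check the embeddings demo for the exact command it uses):

# nomic-embed-text-v1.5 ships custom modeling code, hence --trust-remote-code (assumption based on the HF model card)
optimum-cli export openvino --model nomic-ai/nomic-embed-text-v1.5 --task feature-extraction --trust-remote-code ovmodels/nomic-embed-text-v1.5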

dtrawins avatar Oct 10 '24 21:10 dtrawins

@dtrawins I have followed the embeddings demo. I see a few issues with configuring graph.pbtxt, config.json and subconfig.json with the Docker image. Did I miss anything?

config.json and folder structure (screenshot)

graph.pbtxt

input_stream: "HTTP_REQUEST_PAYLOAD:input"
output_stream: "HTTP_RESPONSE_PAYLOAD:output"

node: {
  name: "LLMExecutor"
  calculator: "HttpLLMCalculator"
  input_stream: "LOOPBACK:loopback"
  input_stream: "HTTP_REQUEST_PAYLOAD:input"
  input_side_packet: "LLM_NODE_RESOURCES:llm"
  output_stream: "LOOPBACK:loopback"
  output_stream: "HTTP_RESPONSE_PAYLOAD:output"
  input_stream_info: {
    tag_index: 'LOOPBACK:0',
    back_edge: true
  }
  node_options: {
      [type.googleapis.com / mediapipe.LLMCalculatorOptions]: {
          models_path: "/ovmodels/llama3.2-1b/1",
          plugin_config: '{}',
          enable_prefix_caching: false
          cache_size: 4,
          block_size: 16,
          dynamic_split_fuse: false,
          max_num_seqs: 25,
          max_num_batched_tokens:2048,          
          device: "GPU"
      }
  }
  input_stream_handler {
    input_stream_handler: "SyncSetInputStreamHandler",
    options {
      [mediapipe.SyncSetInputStreamHandlerOptions.ext] {
        sync_set {
          tag_index: "LOOPBACK:0"
        }
      }
    }
  }
}

subconfig.json

{
    "model_config_list": [],
    "mediapipe_config_list": [
        {
            "name": "Alibaba-NLP/gte-large-en-v1.5-embeddings",
            "base_path": "/ovmodels/gte-large-en-v1.5/models/gte-large-en-v1.5-embeddings"
        },
        {
            "name": "Alibaba-NLP/gte-large-en-v1.5-tokenizer",
            "base_path": "/ovmodels/gte-large-en-v1.5/models/gte-large-en-v1.5-tokenizer"
        }
    ]
}

I am using the GPU Docker image; below is the command:

docker run --rm -it -v  ./ovmodels:/ovmodels --device=/dev/dxg --volume /usr/lib/wsl:/usr/lib/wsl -p 8000:8000  openvino/model_server:latest-gpu --config_path ovmodels/config.json --rest_port 8000

I am getting the following error:

(screenshot of the error)

anandnandagiri avatar Oct 12 '24 10:10 anandnandagiri

@anandnandagiri I think the graph.pbtxt in your gte-large-en-v1.5 folder should contain the graph specific to embeddings; it looks like you copied the graph from the LLM pipeline. The graph file defines which calculators are applied and how they are connected. Try copying the one from the embeddings demo.
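
Roughly, the layout this implies, reconstructed from the subconfig.json posted above (names follow this thread, not the demo verbatim):

ovmodels/
├── config.json                      # top-level config passed via --config_path
└── gte-large-en-v1.5/
    ├── graph.pbtxt                  # embeddings graph from the demo (EmbeddingsCalculator), not the LLM graph
    ├── subconfig.json               # declares the *-embeddings and *-tokenizer servables
    └── models/
        ├── gte-large-en-v1.5-embeddings/1/
        └── gte-large-en-v1.5-tokenizer/1/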

dtrawins avatar Oct 17 '24 22:10 dtrawins

I am still getting errors, even though I used the graph.pbtxt from the embeddings demo:

(screenshot of the error)

anandnandagiri avatar Oct 21 '24 11:10 anandnandagiri

@anandnandagiri There was a recent simplification of the docker command in the demo that dropped the --cpu_extension parameter (https://github.com/openvinotoolkit/model_server/commit/d720af74fa644ac8a57fa351e84e37ba824e9b62), which I assume you followed. However, it requires building the image from the latest main branch. You could either rebuild the image or add the parameter --cpu_extension /ovms/lib/libopenvino_tokenizers.so to the docker run command.

dtrawins avatar Oct 21 '24 20:10 dtrawins

@dtrawins I followed all the steps mentioned above, but I renamed the models folder to gte-large-en-v1.5 to standardize it and to run multiple models. Below is a screenshot of the folder structure:

(screenshot of the folder structure)

Docker Command

docker run --rm -it -v  ./ovmodels:/ovmodels --device=/dev/dxg --volume /usr/lib/wsl:/usr/lib/wsl -p 8000:8000  openvino/model_server:latest-gpu --config_path ovmodels/configembed.json --rest_port 8000 --cpu_extension /ovms/lib/libopenvino_tokenizers.so

I see the error below (screenshot):

anandnandagiri avatar Oct 24 '24 03:10 anandnandagiri

@anandnandagiri The message "Unable to find Calculator EmbeddingsCalculator" suggests that you are using the latest release, which does not support embeddings yet. This work has not been released in the public Docker image yet; you need to build the Docker image from main.
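
A rough sketch of building the image from main (the exact make target and flags may differ between versions; treat this as an assumption and check the build-from-source instructions in the repository):

git clone https://github.com/openvinotoolkit/model_server.git
cd model_server
make release_image GPU=1   # assumed invocation; the repo's build docs give the authoritative command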

atobiszei avatar Oct 25 '24 08:10 atobiszei

Embeddings are supported starting from the 2024.5 release.

dtrawins avatar Mar 03 '25 12:03 dtrawins