                        OpenAI API completions endpoint - Not working as expected
I have downloaded the Llama 3.2 1B model from Hugging Face with optimum-cli:
optimum-cli export openvino --model meta-llama/Llama-3.2-1B-Instruct llama3.2-1b/1
Below are the files downloaded.
Note: I manually removed openvino_detokenizer.bin, openvino_detokenizer.xml, openvino_tokenizer.xml, and openvino_tokenizer.bin to ensure there is only one .bin and one .xml file in the version 1 folder; the expected remaining contents are sketched below.
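For reference, this is roughly what I expect the version folder to contain after that cleanup. The file names come from a typical optimum-cli OpenVINO export, so the exact set on disk may differ:

# Assumed contents of llama3.2-1b/1 after removing the tokenizer/detokenizer IRs
# (typical optimum-cli export output; verify against your own directory):
ls llama3.2-1b/1
# config.json  generation_config.json  openvino_model.bin  openvino_model.xml
# special_tokens_map.json  tokenizer.json  tokenizer_config.json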
I ran Model Server with the command below, making sure the Windows WSL path is mounted correctly and passing the Docker parameters needed for the Intel Iris GPU:
docker run --rm -it -v %cd%/ovmodels/llama3.2-1b:/models/llama3.2-1b --device=/dev/dxg --volume /usr/lib/wsl:/usr/lib/wsl -p 8000:8000 openvino/model_server:latest-gpu --model_path /models/llama3.2-1b --model_name llama3.2-1b --rest_port 8000
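In case it matters, this is how I looked up the container IP used in the curl calls below; with -p 8000:8000 the same endpoints should also respond on http://localhost:8000 from the host (the container ID here is whatever docker ps reports):

# Find the container's IP on the default bridge network:
docker ps   # note the container ID
docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' <container_id>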
The following command worked perfectly:
curl --request GET http://172.17.0.3:8000/v1/config
Below is the output:
{ "llama3.2-1b" : { "model_version_status": [ { "version": "1", "state": "AVAILABLE", "status": { "error_code": "OK", "error_message": "OK" } } ] }
But the curl command below, for the OpenAI API completions endpoint, did not work as expected:
curl http://172.17.0.3:8000/v3/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2-1b", "prompt": "This is a test", "stream": false}'
It returns the error:
{"error": "Model with requested name is not found"}
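If it helps with diagnosis: my understanding from the OVMS docs is that models served with --model_path/--model_name are exposed on the classic TensorFlow-Serving-style REST API rather than the OpenAI one, so these endpoints can be used to confirm the model itself is loaded:

# Status/metadata checks on the classic (TFS-style) REST API:
curl http://172.17.0.3:8000/v1/models/llama3.2-1b           # model version status
curl http://172.17.0.3:8000/v1/models/llama3.2-1b/metadata  # input/output metadata

What am I missing to get the OpenAI /v3/completions endpoint to see the model?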