
Serving the OpenVINO Model In OpenShift

Open ChamanSahil opened this issue 2 years ago • 4 comments

I was going through the CLIP Zero-Shot Image Classification notebook and replicated it in my OpenShift Data Science Hub. As per the notebook instructions, I converted the PyTorch model into the OpenVINO IR format.

But I was unable to find any guide on using the resulting XML and BIN files to serve the model via the Data Science Hub's Model Server. Can anyone please list the steps and guide me on how I can serve the model and get results back?
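For reference, OpenVINO Model Server (which backs the Data Science model serving) expects a versioned model repository rather than bare XML/BIN files. A minimal sketch of that layout; the model name `clip` and the `docker run` invocation are illustrative, and the `touch` lines stand in for copying your real IR files:

```shell
# OVMS expects: <model_root>/<model_name>/<version>/model.{xml,bin}
mkdir -p models/clip/1

# Placeholders for the converted IR files; in practice use e.g.
#   cp your_converted_model.xml models/clip/1/model.xml
touch models/clip/1/model.xml
touch models/clip/1/model.bin

# Serving locally with the OVMS container would then look like:
#   docker run -d -v "$PWD/models:/models" -p 9000:9000 \
#     openvino/model_server:latest \
#     --model_name clip --model_path /models/clip --port 9000
ls -R models
```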

I also checked various other models, and their BIN files are well under 20 MB, whereas the CLIP model's 16-bit variant is 360 MB 😯 and the 8-bit variant is 149 MB. What is the reason for such a large size compared to the others?

Please help me out and get this cleared up ASAP.

ChamanSahil avatar Nov 09 '23 07:11 ChamanSahil

Do you mean something like this?

  • https://docs.openvino.ai/2021.4/ovms_extras_openvino-operator-openshift-readme.html
  • https://developers.redhat.com/learning/learn:openshift-data-science:get-started-intel-openvino/resource/resources:start-your-jupyter-notebook-server-intel-openvino
  • https://developers.redhat.com/learn/openshift-data-science/get-started-intel-openvino
  • https://www.intel.com/content/www/us/en/developer/articles/technical/red-hat-openshift-data-science-with-intel-ai-tools.html

brmarkus avatar Nov 09 '23 09:11 brmarkus

You could write a whole master's thesis about these models and their conversion and quantization. Some operations can be optimized when converting or quantizing, others cannot. But generally speaking, the size reduction from FP32 (32-bit floating point) to FP16 to INT8 to INT4 roughly halves the model size at each step, so the size is dominated by parameter count times bits per weight.
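A back-of-the-envelope check along those lines; the ~150M parameter count for the CLIP ViT-B/16 variant is an assumption, and real files carry some overhead and mixed-precision layers:

```python
def approx_size_mb(n_params: int, bits: int) -> float:
    """Approximate weight-file size: parameters * bits / 8, in decimal MB."""
    return n_params * bits / 8 / 1e6

n_params = 150_000_000  # rough CLIP ViT-B/16 parameter count (assumption)
for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit: ~{approx_size_mb(n_params, bits):.0f} MB")
```

At 16 bits this gives ~300 MB and at 8 bits ~150 MB, the same ballpark as the 360 MB and 149 MB files observed above, so the CLIP sizes are expected rather than anomalous.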

brmarkus avatar Nov 09 '23 09:11 brmarkus

[screenshots: Model Server deployment and the resulting warning]

After deploying the model to the Model Server, this is the warning I am seeing. I guess this is why the inference endpoint isn't working. Is that the case? Can anyone help me with this?

Also, can anyone guide me on configuring a valid data connection from the Model Server to an AWS S3 bucket? Currently I am using the following values:

[screenshot: current data connection values]

These are my S3 model files 👇: S3 Bucket
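One common cause of such a warning is the S3 key layout: the model server can only load a model whose keys follow `<model_path>/<version>/model.xml|bin`. A small hedged sketch (the `clip` prefix and key names are illustrative) that checks a key listing, such as one returned by `boto3`'s `list_objects_v2`, against that layout:

```python
import re

# Expected layout under the data-connection model path:
#   <model_path>/<numeric version>/model.xml
#   <model_path>/<numeric version>/model.bin
_KEY_RE = re.compile(r"^(?P<version>\d+)/model\.(xml|bin)$")

def find_servable_versions(keys, model_path):
    """Return versions under model_path that have both model.xml and model.bin."""
    prefix = model_path.rstrip("/") + "/"
    found = {}  # version number -> set of file extensions seen
    for key in keys:
        if not key.startswith(prefix):
            continue
        m = _KEY_RE.match(key[len(prefix):])
        if m:
            ext = key.rsplit(".", 1)[1]
            found.setdefault(int(m.group("version")), set()).add(ext)
    return sorted(v for v, exts in found.items() if exts == {"xml", "bin"})

# Illustrative key listing:
keys = [
    "clip/1/model.xml",
    "clip/1/model.bin",
    "clip/model.xml",  # wrong: missing version directory, will not be loaded
]
print(find_servable_versions(keys, "clip"))  # → [1]
```

If this returns an empty list for your bucket listing, moving the XML/BIN files into a numbered version directory under the model path is the first thing to try.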

ChamanSahil avatar Nov 09 '23 10:11 ChamanSahil

Just saw that @raymondlo84 has posted "OpenVINO and Red Hat OpenShift! This time we showcased Llama2 (INT8 and INT4!!!!!) on GPU+CPU, LCM Stable Diffusion, and OpenVINO notebooks on Red Hat Open Shift :) Thanks everyone making it real." on LinkedIn.

@raymondlo84 maybe you can forward the question?

brmarkus avatar Nov 10 '23 08:11 brmarkus

Closing this, as Ria has followed up.

raymondlo84 avatar Aug 27 '24 19:08 raymondlo84