skb888
Hi, can anyone help update or share the latest doc for deploying an LLM model across multiple nodes? Thanks a lot.
Thanks for sharing. I have tested it and hit two issues. 1. The Hugging Face client (latest huggingface_hub) no longer accepts use_auth_token; you may need to replace use_auth_token= with token=. 2. Also,...
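If the download script has to work across huggingface_hub versions, the rename can be handled by inspecting the function signature instead of hard-coding either name. A minimal sketch; the snapshot_download below is a dummy stand-in so the snippet is self-contained, not the real huggingface_hub function:

```python
import inspect

def auth_kwargs(fn, hf_token):
    """Pass the token under whichever parameter name this
    huggingface_hub version accepts: newer releases use "token",
    older ones used the now-removed "use_auth_token"."""
    params = inspect.signature(fn).parameters
    key = "token" if "token" in params else "use_auth_token"
    return {key: hf_token}

# Dummy stand-in for huggingface_hub.snapshot_download, only to
# keep the sketch runnable without the library installed:
def snapshot_download(repo_id, token=None):
    return repo_id, token

kwargs = auth_kwargs(snapshot_download, "hf_xxx")
print(kwargs)  # picks "token" because the stand-in accepts it
snapshot_download("some-org/some-model", **kwargs)
```

With a real huggingface_hub import, the same `auth_kwargs(snapshot_download, ...)` call would select the right keyword on both old and new versions.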
Another issue came up when I moved to the next step. In step 3, Create a ServingRuntime, I changed pipelineParallelSize: 1 to pipelineParallelSize: 2 in https://github.com/kserve/kserve/blob/master/config/runtimes/kserve-huggingfaceserver-multinode.yaml. Otherwise, I will...
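For reference, the change lands in the workerSpec section of that runtime. A sketch of the relevant fragment, assuming the v1alpha1 ClusterServingRuntime layout from the linked file (containers and other fields omitted, and field placement may differ slightly across KServe versions):

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: ClusterServingRuntime
metadata:
  name: kserve-huggingfaceserver-multinode
spec:
  workerSpec:
    pipelineParallelSize: 2   # changed from 1: one head pod plus one worker pod
    tensorParallelSize: 1
```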
Thanks for the response. It happens in step 2, Download the Model to the PVC. I changed the memory from "1Gi" to "10Gi" so that I could run python...
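The memory bump goes into the resources section of the download pod's container. An illustrative fragment; the container name and image here are placeholders, not copied from the doc:

```yaml
# Fragment of the model-download pod spec (illustrative names):
spec:
  containers:
    - name: model-download        # placeholder name
      image: python:3.11          # placeholder image
      resources:
        requests:
          memory: "10Gi"          # raised from "1Gi" for the download script
        limits:
          memory: "10Gi"
```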
Here are the detailed errors I see when I run:

kubectl apply -f kserve-huggingfaceserver-multinode.yaml

Error from server (Forbidden): error when applying patch: {"metadata":{"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"serving.kserve.io/v1alpha1\",\"kind\":\"ClusterServingRuntime\",\"metadata\":{\"annotations\":{},\"name\":\"kserve-huggingfaceserver-multinode\"},\"spec\":{\"annotations\":{\"prometheus.kserve.io/path\":\"/metrics\",\"prometheus.kserve.io/port\":\"8080\"},\"containers\":[{\"args\":[\"--model_name={{.Name}}\"],\"command\":[\"bash\",\"-c\",\"export MODEL_DIR_ARG=\\\"\\\"\\nif [[ ! -z ${MODEL_ID} ]]\\nthen\\n...
After I switched pipelineParallelSize to 2, I ran step 4, Deploy the model, on KServe v0.15.0. Here is the example yaml I referred to: https://kserve.github.io/archive/0.15/modelserving/v1beta1/llm/huggingface/multi-node/#3-create-a-servingruntime I have added...
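For step 4, the InferenceService needs a workerSpec that matches the runtime's parallelism. A rough sketch; the metadata name and storageUri are placeholders, so check the linked doc for the exact spec:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: huggingface-llm           # placeholder name
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface
      storageUri: pvc://model-pvc/model   # placeholder PVC path from step 2
    workerSpec:
      pipelineParallelSize: 2     # must match the ServingRuntime setting
      tensorParallelSize: 1
```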
Thanks for the suggestion. I tried creating model_list.yaml and then ran the command below, but it still does not work. Meanwhile, I have tried the qwen3:4b model and hit the same issues. holmes ask...
Hi, I think deepseek-r1:8b does not support function calls. I have tested qwen3:4b and llama3.2:3b, which do support function calls. I have tried both the OpenAI-compatible gateway (--model="openai/") and the original approach (--model="ollama/")....
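One way to check whether a given Ollama model accepts tool definitions is to POST a request containing a tools field to the OpenAI-compatible endpoint and look for tool_calls in the reply. A sketch of the request body; the list_pods tool is hypothetical, and the actual HTTP call is left as a comment so the snippet stays self-contained:

```python
import json

# Request body for Ollama's OpenAI-compatible chat endpoint
# ($OPENAI_API_BASE/chat/completions, e.g. http://127.0.0.1:11434/v1).
# Models without tool support (deepseek-r1:8b, per this thread)
# ignore or reject the "tools" field.
payload = {
    "model": "llama3.2:3b",
    "messages": [{"role": "user", "content": "Which pods are failing?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "list_pods",  # hypothetical tool for illustration
            "description": "List pods in a namespace",
            "parameters": {
                "type": "object",
                "properties": {"namespace": {"type": "string"}},
                "required": ["namespace"],
            },
        },
    }],
}
print(json.dumps(payload, indent=2))
# POST this body to $OPENAI_API_BASE/chat/completions (curl or
# urllib.request) and check whether the response message contains
# a "tool_calls" entry.
```

If the response message comes back with tool_calls populated, the model handled the function definition; a plain text answer or an error suggests it did not.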
Thanks for the quick response. I have double-checked that OPENAI_API_BASE is configured correctly:

$ echo $OPENAI_API_BASE
http://127.0.0.1:11434/v1

$ ollama list
NAME         ID            SIZE    MODIFIED
llama3.2:3b  a80c4f17acd5  2.0 GB  29 hours...
Thanks, it works for me now.