Kirill Statsenko
Hi! @adinhodovic I am not sure whether I should create a separate issue, but I would like to discuss your reply. As you mentioned, there is a workaround to reduce cardinality...
I also tried to specify `--backend` arg:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  finalizers:
    - inferenceservice.finalizers
  name: gpt-oss-20b
  namespace: gpt-oss
spec:
  predictor:
    annotations:
      serving.knative.dev/progress-deadline: 1740s
    model:
      args:
        - --backend=huggingface
      modelFormat:...
```
To resolve the issues, I tried upgrading the `transformers` and `vllm` Python packages inside the HuggingFace runtime image. Here is my Docker image 😄

```Dockerfile
FROM kserve/huggingfaceserver:latest-gpu
RUN pip install --upgrade...
```
So I decided to upload this image and use it inside k8s cluster:

ClusterServingRuntime:

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: ClusterServingRuntime
metadata:
  name: invu-huggingfaceserver
spec:
  annotations:
    prometheus.kserve.io/path: /metrics
    prometheus.kserve.io/port: "8080"
  containers:
    -...
```
So I went deeper and tried to change the runtime source code to meet the new `vllm` package requirements, since `--enable-reasoning` has been deprecated and can no longer be parsed from args...
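To illustrate the kind of problem described here, below is a minimal sketch of how a wrapper could drop a flag that a newer downstream argument parser no longer accepts. This is not the KServe or vLLM code: `strip_deprecated_flags`, `build_parser`, and the stand-in parser are hypothetical names introduced only for illustration, and the sketch assumes the deprecated flag is a bare boolean switch with no value.

```python
import argparse

# Assumption: flags that the newer downstream parser rejects (bare boolean switches).
DEPRECATED_FLAGS = {"--enable-reasoning"}


def strip_deprecated_flags(argv):
    """Split argv into (kept, dropped), removing bare deprecated flags."""
    kept, dropped = [], []
    for arg in argv:
        if arg in DEPRECATED_FLAGS:
            dropped.append(arg)
        else:
            kept.append(arg)
    return kept, dropped


def build_parser():
    # Hypothetical stand-in for the downstream parser.
    # It intentionally does NOT define --enable-reasoning.
    parser = argparse.ArgumentParser(prog="runtime-sketch")
    parser.add_argument("--backend", default="auto")
    return parser


if __name__ == "__main__":
    argv = ["--backend=huggingface", "--enable-reasoning"]
    kept, dropped = strip_deprecated_flags(argv)
    args = build_parser().parse_args(kept)
    print("backend:", args.backend)
    print("dropped:", dropped)
```

Without the filtering step, `parse_args` would exit with an "unrecognized arguments" error for the removed flag, which is the failure mode such a wrapper is meant to avoid.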
@marcelovilla it seems that your pod started on a node with no drivers available:

```txt
kserve-container INFO 09-02 06:44:03 [__init__.py:245] No platform detected, vLLM is running on UnspecifiedPlatform
kserve-container WARNING 09-02...
```
@WinsonSou the main idea here is that the `huggingface` runtime implementation is fragile in the case of a vLLM upgrade because of tight coupling. You can see it here, for...
@spolti I will try, but I need to know what the desired approach is: do we really need a separate runtime for CatBoost? Or will it be enough to just...
@chethanuk thank you for the detailed explanation! In that case I will close https://github.com/kserve/kserve/pull/4603. If it's required, I could help you with the documentation. I have already created https://github.com/kserve/website/pull/526 for...