andreapairon

Results: 6 issues by andreapairon

Hi all, I noticed in the scaling doc page (https://github.com/kserve/modelmesh-serving/blob/main/docs/production-use/scaling.md) that it is now possible to set up `ServingRuntime` autoscaling with `HPA`, but only using metrics based on CPU utilization. Is it...

question

Hi all, what is the correct way to use HPA autoscaling on a `ServingRuntime`? Should I remove the `replicas` property under `spec`? Should I update all the YAML files...
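For context, the kind of setup this question is about could look like a standard `HorizontalPodAutoscaler` targeting the runtime's Deployment. This is only a hedged sketch: the Deployment name `modelmesh-serving-triton-2.x` and the utilization target are illustrative assumptions, not values from this thread or the modelmesh-serving docs.

```yaml
# Illustrative only: target name and thresholds are assumptions.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: modelmesh-runtime-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: modelmesh-serving-triton-2.x  # hypothetical runtime deployment name
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 75
```

As with any HPA-managed object, a fixed replica count on the scaled resource is usually left unset so that the autoscaler owns the replica count, which is likely what the `replicas` question above is getting at.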

bug

It would be nice to have a new parameter in the `InferenceService` CRD that allows users to specify the model size (in bytes), avoiding the `MODEL_MULTIPLIER` factor used to estimate the...

enhancement

Is it possible to leverage Triton client (https://github.com/triton-inference-server/client) features such as on-wire compression of requests/responses over HTTP using the current `/infer` endpoint? (https://github.com/triton-inference-server/server/blob/main/docs/inference_protocols.md#compression) If not, will it be implemented in a future...
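At the protocol level, the compression the linked Triton doc describes is plain HTTP `Content-Encoding` (gzip/deflate) on the v2 `/infer` endpoint. A minimal sketch of preparing a gzip-compressed v2 inference request body; the tensor name `input__0`, its shape, and the header set are illustrative assumptions, not taken from this issue:

```python
import gzip
import json

# Hypothetical KServe v2 inference payload; tensor name and shape are made up.
payload = {
    "inputs": [
        {"name": "input__0", "shape": [1, 4], "datatype": "FP32",
         "data": [0.1, 0.2, 0.3, 0.4]}
    ]
}

# Compress the JSON body before sending it over the wire.
body = gzip.compress(json.dumps(payload).encode("utf-8"))

# Headers that would accompany `body` when talking to a server that
# supports on-wire compression (per the linked Triton protocol doc):
headers = {
    "Content-Type": "application/json",
    "Content-Encoding": "gzip",   # the request body is compressed
    "Accept-Encoding": "gzip",    # ask for a compressed response
}

# Sanity check: the body round-trips through gzip.
assert json.loads(gzip.decompress(body)) == payload
```

Whether the modelmesh `/infer` path honors these headers end-to-end is exactly what the question above is asking; the sketch only shows the client side.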

Hello everyone, I get this error while importing the `Worker` class:

```
from conductor.client.worker.worker import Worker
../../.local/lib/python3.8/site-packages/conductor/client/worker/worker.py:14: in <module>
    from conductor.client.http.api_client import ApiClient
../../.local/lib/python3.8/site-packages/conductor/client/http/api_client.py:15: in <module>
    import conductor.client.http.models as http_models
../../.local/lib/python3.8/site-packages/conductor/client/http/models/__init__.py:34: in...
```

Hi everyone, I have Conductor workers running on Kubernetes pods. I would like to handle graceful shutdown in this way: 1. stop worker polling (I achieved this by implementing...
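Step 1 above (stop polling when the pod is terminated) can be sketched with a plain SIGTERM handler. This is a generic stand-in, not the real conductor-python API: `stop_polling` and `poll_loop` are hypothetical names, and the actual Conductor worker shutdown call is not shown.

```python
import signal
import threading

# Stand-in for the worker's "stop polling" flag; NOT a Conductor client API.
stop_polling = threading.Event()

def handle_sigterm(signum, frame):
    # Kubernetes sends SIGTERM on pod shutdown: stop accepting new tasks,
    # but let whatever task is currently in flight finish.
    stop_polling.set()

signal.signal(signal.SIGTERM, handle_sigterm)

def poll_loop():
    """Hypothetical polling loop: exits once shutdown has been requested."""
    while not stop_polling.is_set():
        # poll for a task, execute it, report the result...
        stop_polling.wait(timeout=1.0)  # placeholder for the poll interval

# In the pod, poll_loop() would run until SIGTERM arrives.
```

Pairing this with a `terminationGracePeriodSeconds` long enough for the in-flight task to complete is the usual way to finish the shutdown sequence on Kubernetes.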