Wangyu Han

4 comments by Wangyu Han

Similar issue: https://github.com/knative/serving/issues/12593

@nader-ziada > if the requests are done quickly enough, the same pod can handle them, then the activator might not need to send any requests to the last pod...

@yuzisun Thank you for answering. How do I set targetConcurrency? Is it the number of simultaneous connections to queue-proxy? It seems to be very difficult to set the value due...

Or are there any other metrics that I can use? With Triton, we can use `nv_inference_queue_duration_us` for autoscaling ([reference](https://developer.nvidia.com/blog/deploying-nvidia-triton-at-scale-with-mig-and-kubernetes/)). I think it would be very helpful if there were a similar metric in...