Wangyu Han
Similar issue: https://github.com/knative/serving/issues/12593
@nader-ziada > if the requests are done quickly enough, the same pod can handle them, then the activator might not need to send any requests to the last pod...
@yuzisun Thank you for answering. How do I set targetConcurrency? Is it the number of simultaneous connections to queue-proxy? It seems to be very difficult to set the value due...
Or are there any other metrics that I can use? With Triton, we can use `nv_inference_queue_duration_us` for autoscaling ([reference](https://developer.nvidia.com/blog/deploying-nvidia-triton-at-scale-with-mig-and-kubernetes/)). I think it would be very helpful if there were a similar metric in...