Wangyu Han comments

Results: 4 comments of Wangyu Han

Activator LB doesn't work well

Similar issue : https://github.com/knative/serving/issues/12593

Activator LB doesn't work well

@nader-ziada > if the requests are done quickly enough, the same pod can handle them, then the activator might not need to send any requests to the last pod...

Q) GPU Serving, Batcher with containerConcurrency setting and about autoscaling

@yuzisun Thank you for answering. How do I set targetConcurrency? Is it the number of simultaneous connections to queue-proxy? It seems to be very difficult to set the value due...

Q) GPU Serving, Batcher with containerConcurrency setting and about autoscaling

Or are there any other metrics that I can use? With Triton, we can use `nv_inference_queue_duration_us` for autoscaling ([refer](https://developer.nvidia.com/blog/deploying-nvidia-triton-at-scale-with-mig-and-kubernetes/)). I think it would be very helpful if there were a similar metric in...