Jiaxin Shan
@zhangjyr @nwangfw please help take a look
@gegenhua913 thanks for your interest. Actually, `GPU hardware failure detection` is a separate project right now, and we have not fully integrated it with upper-layer applications. Technically, GPU failure should fail...
@Colstuwjx thanks for the recommendation. We missed this issue. Do you want to cut a PR? If you do not have the bandwidth, we can help make it.
@libin817927 We have not enabled such a case yet. Could you give more details on how it is deployed at the moment? Probably a naive approach.
@googs1025 I will address the comments tomorrow. I was a little too busy last week to work on this issue.
/gemini review
@varungup90 @googs1025 what's the status of this PR? Is it ready to go?
@ying2025 the orchestration part is for easily launching instances at scale, and it won't offer inference performance gains. Performance gains mainly come from other features like routing policies. I am...
@ying2025 Got it, that totally makes sense. We can come up with a full example using a combination of features and show the performance gains. One thing I want to mention is some...
@ying2025 Yes, you can change the service type to load balancer; in that case, it won't bring any benefits. We are fixing the prefix-cache and load-aware routing strategies...