Jiaxin Shan comments

Results 742 comments of Jiaxin Shan

Can SLO and QoS feature be implemented based on PD separation?

@zhangjyr @nwangfw please help take a look

One question about "GPU Hardware Failure Detection"

@gegenhua913 thanks for your interest. Actually, `GPU hardware failure detection` is a separate project right now; we have not fully integrated it with upper-layer applications. Technically, GPU failure should fail...

Document for mocked cpu app within quickstart page

@Colstuwjx thanks for the recommendation. We missed this issue. Do you want to cut a PR? If you do not have the bandwidth, we can help make it.

Does it support speculative decoding with a draft model that is not an ngram?

@libin817927 We have not enabled such a case yet. Could you give more details on how it is deployed at the moment? Probably a naive approach.

[Feat] Support StormService pause rollout in upgrade

@googs1025 I will address the comments tomorrow. I was a little too busy last week to work on this issue.

[Feat] Support StormService pause rollout in upgrade

/gemini review

feature: add simple session affinity plugins in gateway plugin

@varungup90 @googs1025 What's the status of this PR? Is it ready to go?

Ask for testing suggestions

@ying2025 The orchestration part is for launching instances at scale easily, and it won't offer inference performance gains. Performance gains mainly come from other features like routing policies. I am...

Ask for testing suggestions

@ying2025 Got you, that totally makes sense. We can come up with a full example using a combination of features and show the performance gains. One thing I want to mention is some...

Ask for testing suggestions

@ying2025 Yes, you can change the service type to load balancer, but in that case it won't bring any benefits. We are fixing the prefix cache & load aware routing strategies...