Jiaxin Shan

Results: 742 comments by Jiaxin Shan

@JessyTsu1 1. Does the Laws LLM retrieve the corresponding statutes directly from the user input, or from the Keyword LLM's output? So is hallucination reduced by switching to keyword search against a vector DB? 2. The Keyword LLM should just be a BERT model used as an embedding model, right? 3. In that case the whole request chain feels slow: one call to the Laws LLM, the BERT side is similar to a normal embedding pass, one call for self-suggestion, and one for ChatLaw — that's three LLM calls in total.
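The latency concern in point 3 can be sketched as follows. This is a hypothetical outline of the request chain as I understand it from the paper's description — the function names (`keyword_llm`, `laws_llm`, `self_suggestion`, `chatlaw_llm`) are illustrative, not ChatLaw's actual API — just to show that the steps are sequential, so end-to-end latency is the sum of three LLM calls plus one vector-DB lookup:

```python
# Hypothetical sketch of the ChatLaw request chain (names are illustrative).
# Each LLM step must finish before the next starts, so latencies add up.

def keyword_llm(user_input: str) -> list[str]:
    # BERT-style model extracting keywords / embeddings from the user input
    return user_input.lower().split()

def search_vector_db(keywords: list[str]) -> list[str]:
    # Keyword/embedding search over a vector DB of statutes; not an LLM call
    return [f"statute matching '{kw}'" for kw in keywords[:2]]

def laws_llm(statutes: list[str]) -> str:
    # LLM call: draft an answer grounded in the retrieved statutes
    return "draft citing " + "; ".join(statutes)

def self_suggestion(draft: str) -> str:
    # LLM call: self-refinement pass over the draft
    return draft + " (refined)"

def chatlaw_llm(refined: str) -> str:
    # LLM call: final ChatLaw response
    return "final answer based on: " + refined

def handle_request(user_input: str) -> tuple[str, int]:
    llm_calls = 0
    keywords = keyword_llm(user_input)          # embedding pass (cheaper)
    statutes = search_vector_db(keywords)
    draft = laws_llm(statutes);        llm_calls += 1
    refined = self_suggestion(draft);  llm_calls += 1
    answer = chatlaw_llm(refined);     llm_calls += 1
    return answer, llm_calls

answer, calls = handle_request("contract dispute")
print(calls)  # 3 sequential LLM calls per request
```

Even if the embedding pass is fast, the three generation calls run back to back, which is why the chain feels slow.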

I highly suggest you use KubeRay: launch a Ray cluster and submit vLLM workers to it. That's the easiest way I've found, and KubeRay will reduce your chance of coming...
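For reference, a minimal RayCluster manifest for the KubeRay operator might look like the sketch below. All values (name, image tag, replica counts, GPU limits) are illustrative assumptions — adjust them to your KubeRay version and hardware before use:

```yaml
# Minimal RayCluster sketch for the KubeRay operator (values illustrative).
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: vllm-cluster        # hypothetical name
spec:
  headGroupSpec:
    rayStartParams:
      dashboard-host: "0.0.0.0"
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.9.0   # pick the tag matching your Ray version
  workerGroupSpecs:
    - groupName: vllm-workers
      replicas: 2
      minReplicas: 1
      maxReplicas: 4
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:2.9.0
              resources:
                limits:
                  nvidia.com/gpu: 1       # one GPU per vLLM worker (assumption)
```

Once the cluster is up, vLLM workloads can be submitted to it as Ray jobs, and KubeRay handles pod lifecycle and restarts for you.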

/hold Let's hold this change. The upstream is slightly different from our downstream. We need more testing on this PR.

@pacoxu Sure. We will add the before-and-after logs and metrics we have to the issue.

We rebased onto master and did one more round of testing yesterday; the performance meets expectations, so we can unhold this. This PR addresses https://github.com/kubernetes/kubernetes/issues/112264. /hold cancel

@vinaykul @mrunalp Did you get a chance to look at the improvement?

@vinaykul It addresses the issues in https://github.com/kubernetes/kubernetes/issues/112264. I didn't create a new issue.

@SergeyKanzhelev @mrunalp Please help add the 1.29 milestone label. Thank you!

@vinaykul @pacoxu @MaryamTavakkoli This is fairly critical on the performance side — can we include this one in v1.29?

We will address all the comments here and move this to a later release. @MaryamTavakkoli