Jiaxin Shan

Results: 742 comments by Jiaxin Shan

@JessyTsu1 1. Does the Laws LLM retrieve the corresponding statutes directly from the user input, or from the Keyword LLM's output? So is hallucination reduced by switching to keyword search against a vector DB? 2. The Keyword LLM should just be a BERT model used as an embedding model, right? 3. In that case the whole request chain feels slow: one call to the Laws LLM, the BERT side is similar to a normal embedding pass, one call for self-suggestion, and one for ChatLaw — that's three LLM calls in total.
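The latency concern in point 3 can be sketched as follows. This is a hypothetical outline of the request chain as I understand it from the paper's description — the function names (`keyword_llm`, `laws_llm`, `self_suggestion`, `chatlaw_llm`) are illustrative, not ChatLaw's actual API — just to show that the steps are sequential, so end-to-end latency is the sum of three LLM calls plus one vector-DB lookup:

```python
# Hypothetical sketch of the ChatLaw request chain (names are illustrative).
# Each LLM step must finish before the next starts, so latencies add up.

def keyword_llm(user_input: str) -> list[str]:
    # BERT-style model extracting keywords / embeddings from the user input
    return user_input.lower().split()

def search_vector_db(keywords: list[str]) -> list[str]:
    # Keyword/embedding search over a vector DB of statutes; not an LLM call
    return [f"statute matching '{kw}'" for kw in keywords[:2]]

def laws_llm(statutes: list[str]) -> str:
    # LLM call: draft an answer grounded in the retrieved statutes
    return "draft citing " + "; ".join(statutes)

def self_suggestion(draft: str) -> str:
    # LLM call: self-refinement pass over the draft
    return draft + " (refined)"

def chatlaw_llm(refined: str) -> str:
    # LLM call: final ChatLaw response
    return "final answer based on: " + refined

def handle_request(user_input: str) -> tuple[str, int]:
    llm_calls = 0
    keywords = keyword_llm(user_input)          # embedding pass (cheaper)
    statutes = search_vector_db(keywords)
    draft = laws_llm(statutes);        llm_calls += 1
    refined = self_suggestion(draft);  llm_calls += 1
    answer = chatlaw_llm(refined);     llm_calls += 1
    return answer, llm_calls

answer, calls = handle_request("contract dispute")
print(calls)  # 3 sequential LLM calls per request
```

Even if the embedding pass is fast, the three generation calls run back to back, which is why the chain feels slow.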

I highly suggest you use KubeRay: launch a Ray cluster and submit vLLM workers to it. That's the easiest way I've found, and KubeRay will reduce your chance of coming...
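For reference, a minimal RayCluster manifest for the KubeRay operator might look like the sketch below. All values (name, image tag, replica counts, GPU limits) are illustrative assumptions — adjust them to your KubeRay version and hardware before use:

```yaml
# Minimal RayCluster sketch for the KubeRay operator (values illustrative).
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: vllm-cluster        # hypothetical name
spec:
  headGroupSpec:
    rayStartParams:
      dashboard-host: "0.0.0.0"
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.9.0   # pick the tag matching your Ray version
  workerGroupSpecs:
    - groupName: vllm-workers
      replicas: 2
      minReplicas: 1
      maxReplicas: 4
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:2.9.0
              resources:
                limits:
                  nvidia.com/gpu: 1       # one GPU per vLLM worker (assumption)
```

Once the cluster is up, vLLM workloads can be submitted to it as Ray jobs, and KubeRay handles pod lifecycle and restarts for you.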

/hold Let's hold this change. The upstream is slightly different from our downstream. We need more testing on this PR.

@pacoxu Sure. We will add the before-and-after logs and metrics we have to the issue.

We rebased onto master and did one more round of testing yesterday; the performance meets expectations, so we can unhold this. This PR addresses https://github.com/kubernetes/kubernetes/issues/112264. /hold cancel

@vinaykul @mrunalp Did you get a chance to look at the improvement?

@vinaykul It addresses the issues in https://github.com/kubernetes/kubernetes/issues/112264. I didn't create a new issue.

@SergeyKanzhelev @mrunalp Please help add the 1.29 milestone label. Thank you!

@vinaykul @pacoxu @MaryamTavakkoli This is fairly critical on the performance side — can we include this one in v1.29?

We will address all the comments here and move this to a later release. @MaryamTavakkoli