Nick Hill comments

Results: 373 comments by Nick Hill

proposal: Move token decoding and stopping evaluation to router

@OlivierDehaene I was going to rebase this but then realized your benchmark stuff talks directly to the internal proto interface and so would need to be adjusted too.

Async and Sync results in different generation

@jshin49 yes `top_k=1` is not equivalent to greedy. Top k will sample from tokens with scores >= the kth highest score. This means that it could be choosing from more...
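To make the distinction concrete, here is a minimal sketch of the behaviour described above, assuming the top-k filter keeps every token whose score is >= the kth highest score. The function names `top_k_candidates` and `greedy` are hypothetical and used only for illustration; this is not the library's actual implementation.

```python
def top_k_candidates(scores, k):
    """Return indices of all tokens whose score is >= the k-th highest score.

    Assumption: ties at the threshold are all kept, so more than k
    tokens can survive the filter.
    """
    kth_highest = sorted(scores, reverse=True)[k - 1]
    return [i for i, s in enumerate(scores) if s >= kth_highest]


def greedy(scores):
    """Return the index of the first token with the maximum score."""
    return max(range(len(scores)), key=scores.__getitem__)


# Two tokens tie for the highest score.
scores = [2.0, 5.0, 5.0, 1.0]

# With k=1 both tied tokens remain candidates, so sampling may pick either.
candidates = top_k_candidates(scores, k=1)

# Greedy decoding always returns one deterministic index.
choice = greedy(scores)
```

When no scores tie, the candidate set for `k=1` has a single entry and the two approaches agree; they can diverge only when several tokens share the top score.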

[FEATURE] Implement Dynamic SplitFuse

Looks like someone has started working on this: https://github.com/vllm-project/vllm/pull/3106

Refresh aspired servables/versions following config update

I have opened this against 1.15 since that's the version we are using, but can rebase on a different branch if needed. Also apologies in advance for the code, I...

Refresh aspired servables/versions following config update

Thanks @christisg, I've pushed a commit to address your logging comment. I will aim to add unit test coverage when I get a chance... it will take me a bit...

Refresh aspired servables/versions following config update

@netfs apologies for letting this lag. I am not sure when I will realistically have a chance to do this since I'm especially busy right now and not very familiar...

Exclude zookeeper dependency from the project

For now we can update zookeeper to avoid the CVE: https://github.com/kserve/modelmesh/pull/128

Local deployment (Docker, not Kubernetes/MiniKube)

Hi @Phelan164, the short answer is in theory yes (Model-mesh actually started out independently of Kubernetes), but it would require a fair amount of custom setup/configuration, especially if you want...

[Core] Add MultiprocessingGPUExecutor

> My suspection is improper clean up. You can try to have one test for mp, and another for ray. Then they will not have interference.

@youkaichao I've updated this...

[Core] Add MultiprocessingGPUExecutor

Thanks again @youkaichao @rkooo567 @zhuohan123! And thanks for your patience @vrdn-23!