nano-vllm
nano vLLM v1 engine
Hi, nano vLLM with engine v1 is here!
Since the v0.6.0 release of vLLM, there has been a brand-new engine backend based on multiprocessing, as described in the official blog. I have implemented this in nano-vLLM; you can check it out in my repo. For now, switching between the two engines via an environment variable is not supported.
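To show the shape of this for readers: the v1 design keeps a lightweight frontend process for request I/O and runs the engine's busy loop in a separate core process. Below is a minimal sketch of that split, using multiprocessing.Queue as a stand-in for the real IPC; all names here are illustrative, not the actual vLLM or nano-vLLM API:

```python
import multiprocessing as mp

def engine_core_loop(req_q, out_q):
    """Busy loop of the engine-core process: block on incoming requests,
    run a (stubbed) scheduling + model step, ship outputs back."""
    while True:
        req = req_q.get()              # blocks until the frontend sends work
        if req is None:                # shutdown sentinel
            return
        # Placeholder for schedule() + one model forward pass.
        out_q.put({"request_id": req["request_id"], "text": "<generated>"})

if __name__ == "__main__":
    req_q, out_q = mp.Queue(), mp.Queue()
    core = mp.Process(target=engine_core_loop, args=(req_q, out_q))
    core.start()
    # The frontend process only tokenizes/detokenizes and passes messages.
    req_q.put({"request_id": 0, "prompt": "Hello"})
    print(out_q.get())                 # {'request_id': 0, 'text': '<generated>'}
    req_q.put(None)                    # tell the core process to exit
    core.join()
```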
- v1 moves the controller thread to the CPU and updates metadata only when something changes, instead of passing it in every time (a toy version is sketched right after this list). I don't think you did that?
- another big change in v1 is the scheduler logic: instead of distinguishing prefill and decode, you just output the number of tokens to compute per request (i.e., len = len(tokens) for prefill and 1 for decode). The goal was to make integrating chunked prefill (len = chunk_size) and speculative decoding (len = n_spec_tokens) super smooth (a toy scheduler in this style is sketched at the end of the thread). I don't think you did that either?
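A minimal sketch of the "send only what changed" idea from the first bullet: the worker keeps a cache of per-request state, and each step's update carries only new block IDs and finished request IDs. The dict layout and field names are invented for illustration and are not vLLM's actual wire format:

```python
class WorkerStateCache:
    """Worker-side cache of per-request metadata. The scheduler sends
    only deltas, so nothing is re-transmitted if nothing changed."""

    def __init__(self):
        self.block_tables = {}   # request_id -> list of KV-cache block ids

    def apply_update(self, update):
        for req_id in update.get("finished", []):
            self.block_tables.pop(req_id, None)          # recycle state
        for req_id, new_blocks in update.get("new_blocks", {}).items():
            # Append only the newly allocated blocks instead of
            # re-sending the full block table every step.
            self.block_tables.setdefault(req_id, []).extend(new_blocks)

cache = WorkerStateCache()
cache.apply_update({"new_blocks": {"req-0": [0, 1]}})    # prefill allocates
cache.apply_update({"new_blocks": {"req-0": [2]}})       # decode grows by one
print(cache.block_tables)                                # {'req-0': [0, 1, 2]}
```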
Yes! You are mostly right! I barely changed the scheduler in this repo, which is exactly where the changes you mention would live. The repo is meant to help people learn how the new multiprocessing-style frontend of the v1 engine works, so I simplified the process-handling code quite a lot. If you really know the vLLM codebase, you will see there are more differences than that, such as resource recycling, process startup, and so on. In case you or others want to know how those are done, the code is in the schedule and update_from_output methods of the v1 Scheduler class. Thanks for pointing those out!
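For other readers, here is a toy sketch of the unified scheduling rule discussed above, where every request is simply assigned a token count regardless of phase. The names, the budget logic, and the chunk handling are simplified illustrations, not vLLM's actual schedule / update_from_output implementation:

```python
from dataclasses import dataclass

@dataclass
class Request:
    request_id: str
    num_prompt_tokens: int
    num_computed_tokens: int = 0     # advances as chunks finish

def schedule(requests, token_budget, chunk_size=512):
    """Unified rule: each request gets a number of tokens to compute.
    Prefill takes up to one chunk; decode takes 1 (speculative decoding
    would take n_spec_tokens + 1 in the same slot)."""
    num_scheduled = {}
    for req in requests:
        remaining = req.num_prompt_tokens - req.num_computed_tokens
        if remaining > 0:                        # still prefilling
            n = min(remaining, chunk_size, token_budget)
        else:                                    # decoding
            n = min(1, token_budget)
        if n == 0:
            break                                # token budget exhausted
        num_scheduled[req.request_id] = n
        token_budget -= n
    return num_scheduled

reqs = [Request("prefilling", num_prompt_tokens=1000),
        Request("decoding", num_prompt_tokens=8, num_computed_tokens=8)]
print(schedule(reqs, token_budget=600))  # {'prefilling': 512, 'decoding': 1}
```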