
nano vLLM v1 engine

Open • difey opened this issue 4 months ago • 2 comments

Hi, nano vLLM with engine v1 is here!

Since the v0.6.0 release of vLLM, there has been a brand-new multiprocessing engine backend, as described in the official blog. I have implemented it in nano-vLLM; you can check it out in my repo. For now, there is no easy way to switch between the two engines with an environment variable.
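
The core idea of that multiprocessing design is that the frontend only enqueues requests and collects results, while a separate engine-core loop keeps admitting requests and stepping the model. Below is a minimal sketch of that split under stated assumptions. It is not the code in vLLM or in the linked repo. In vLLM V1 the engine core runs in its own process, but here a thread and queue.Queue stand in for the process and its IPC channel so the example stays self-contained. fake_model_step, Request and generate are hypothetical names.

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class Request:
    req_id: str
    prompt: list[int]                # prompt token ids
    max_new_tokens: int
    output: list[int] = field(default_factory=list)


def fake_model_step(reqs):
    """Hypothetical stand-in for a forward pass: next token = last token + 1."""
    return {r.req_id: (r.prompt + r.output)[-1] + 1 for r in reqs}


def engine_core_loop(in_q, out_q):
    """Engine-core busy loop: admit new requests, step the model, emit finished ones."""
    running = {}
    while True:
        # Block only when idle; otherwise drain whatever has arrived and keep stepping.
        block = not running
        try:
            while True:
                msg = in_q.get(block=block)
                block = False
                if msg is None:          # shutdown signal from the frontend
                    return
                running[msg.req_id] = msg
        except queue.Empty:
            pass
        for req_id, tok in fake_model_step(list(running.values())).items():
            req = running[req_id]
            req.output.append(tok)
            if len(req.output) >= req.max_new_tokens:
                out_q.put((req_id, req.output))
                del running[req_id]


def generate(prompts, max_new_tokens=3):
    """Frontend: enqueue requests, then collect finished outputs."""
    in_q, out_q = queue.Queue(), queue.Queue()
    # Assumption: a thread stands in for the separate engine-core process of vLLM V1.
    core = threading.Thread(target=engine_core_loop, args=(in_q, out_q), daemon=True)
    core.start()
    for i, prompt in enumerate(prompts):
        in_q.put(Request(f"req-{i}", list(prompt), max_new_tokens))
    results = {}
    while len(results) < len(prompts):
        req_id, tokens = out_q.get()
        results[req_id] = tokens
    in_q.put(None)                       # all results received: stop the engine core
    core.join()
    return results
```

For example, generate([[1, 2], [10]]) returns {"req-0": [3, 4, 5], "req-1": [11, 12, 13]}. To get a process-based layout, you would swap the thread for multiprocessing.Process and the queues for multiprocessing.Queue. The dataclasses are defined at module level, so they can be pickled across the process boundary.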

difey, Aug 06 '25

  • v1 moves the controller thread to the CPU and updates metadata only when something changes, instead of passing it in every time. I don't think you did that?
  • Another big change in v1 is the scheduler logic: instead of distinguishing prefill from decode, it just outputs the number of tokens to compute for each request (i.e., len(tokens) for prefill and 1 for decode). The goal was to make integrating chunked prefill (len = chunk_size) and speculative decoding (len = n_spec_tokens) super smooth. I don't think you did that either?
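
The scheduling rule in the second point boils down to one quantity per request: how many of its tokens are not yet computed, capped by a per-step token budget. Below is a simplified sketch under assumptions. The names Req, schedule, step, token_budget and max_chunk are hypothetical. The sketch ignores KV-block allocation, preemption and the waiting queue. It does not model speculative decoding, which would fit the same rule by counting draft tokens as uncomputed tokens.

```python
from dataclasses import dataclass


@dataclass
class Req:
    req_id: str
    num_tokens: int               # prompt tokens + tokens sampled so far
    num_computed_tokens: int = 0  # tokens whose KV cache is already filled


def schedule(running, token_budget, max_chunk=None):
    """Return {req_id: num_new_tokens}; prefill and decode use the same rule."""
    scheduled = {}
    for req in running:
        if token_budget == 0:
            break
        # Uncomputed tokens: the whole prompt at first, then 1 per decode step.
        n = req.num_tokens - req.num_computed_tokens
        if max_chunk is not None:
            n = min(n, max_chunk)  # chunked prefill only caps n
        n = min(n, token_budget)
        if n > 0:
            scheduled[req.req_id] = n
            token_budget -= n
    return scheduled


def step(running, token_budget, max_chunk=None):
    """Schedule, pretend to run the model, then advance per-request counters."""
    scheduled = schedule(running, token_budget, max_chunk)
    for req in running:
        n = scheduled.get(req.req_id, 0)
        req.num_computed_tokens += n
        # A token is sampled only once every known token has been computed,
        # so a partially prefilled chunk produces no new token.
        if n and req.num_computed_tokens == req.num_tokens:
            req.num_tokens += 1
    return scheduled
```

With reqs = [Req("A", 10), Req("B", 5, 4)], calling step(reqs, token_budget=6, max_chunk=4) four times schedules A as 4, 4, 2 and then 1 token, while the already-decoding request B gets exactly 1 token every step. The scheduler never has to ask whether a request is "in prefill" or "in decode".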

MasterGodzilla, Sep 15 '25

Yes! You are mostly right! I barely changed the scheduler in the repo, which is where the changes you mentioned live. The repo is meant to help people learn how the new multiprocessing-style frontend of the v1 engine works, so I simplified the code and its running processes quite a lot. If you know the vLLM code well, you will see there are more differences than that, such as resource recycling, process startup, and so on. If you or others want to see how those are done, the code is in the schedule and update_from_output methods of the v1 scheduler class. Thanks for pointing those out.
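
The first point raised in this thread, updating metadata only when something changes, shows up in what the scheduler hands to the workers each step. Newly scheduled requests are sent with their full data, while requests the worker has already seen are sent only as deltas, and the worker keeps persistent per-request state. Below is a minimal sketch under assumptions. NewReq, CachedReqDelta, SchedOutput and WorkerState are hypothetical names, loosely modeled on the new/cached request split in vLLM V1's scheduler output rather than copied from it.

```python
from dataclasses import dataclass, field


@dataclass
class NewReq:
    """Sent once, when a request is first scheduled: full metadata."""
    req_id: str
    prompt_token_ids: list[int]
    block_ids: list[int]


@dataclass
class CachedReqDelta:
    """Sent on later steps: only what changed since the previous step."""
    req_id: str
    new_token_ids: list[int]
    new_block_ids: list[int]


@dataclass
class SchedOutput:
    new_reqs: list[NewReq] = field(default_factory=list)
    cached_reqs: list[CachedReqDelta] = field(default_factory=list)
    finished_req_ids: set[str] = field(default_factory=set)


class WorkerState:
    """Persistent per-request state kept on the worker across steps."""

    def __init__(self):
        self.token_ids: dict[str, list[int]] = {}
        self.block_ids: dict[str, list[int]] = {}

    def apply(self, out: SchedOutput) -> None:
        # Drop state for finished requests first.
        for rid in out.finished_req_ids:
            self.token_ids.pop(rid, None)
            self.block_ids.pop(rid, None)
        # New requests carry everything, exactly once.
        for r in out.new_reqs:
            self.token_ids[r.req_id] = list(r.prompt_token_ids)
            self.block_ids[r.req_id] = list(r.block_ids)
        # Cached requests only append their delta; nothing else is re-sent.
        for d in out.cached_reqs:
            self.token_ids[d.req_id].extend(d.new_token_ids)
            self.block_ids[d.req_id].extend(d.new_block_ids)
```

A decode step usually adds one token and only occasionally a new block. Each step's message therefore stays small no matter how long the prompt was, which is the benefit of caching state on the worker instead of rebuilding it every step.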

difey, Sep 20 '25