vllm
vllm copied to clipboard
[V1][Spec Decode] Non greedy sample with EAGLE / Reduce memory allocation for Rejection Sampler
Task 3 of #15901 Enable non greedy sampling in Eagle
- [x] avoid intermediate tensor storage for RS buffers which was happening with torch.stack
- [x] Cache draft_probs inside the model runner and correctly feed it to the rejection sampler in the next step
This PR makes changes to proposer() interface where instead of outputting the draft_token_ids or their prob, it saves them in the internal variable is accessed using getters like get_draft_token_ids() and get_draft_probs()
👋 Hi! Thank you for contributing to the vLLM project.
💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.
Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.
Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.
🚀
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @ekagra-ranjan.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @ekagra-ranjan.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
Also, can we have a test for this PR? Appreciate it!
Update: I updated the draft prob buffer layout to be packed. I tested the AL with my metric hack https://github.com/ekagra-ranjan/vllm/pull/1 on MTBench (same setup)
With K=2
- temp=0, AL= 1.90 (prev bench on above setup was 1.89 so it matches)
- temp=0.3, AL=1.89
- temp=0.75, AL= 1.86
- temp=1, AL=1.82
so it seems to work. Next, I will add some unittest.
I added non greedy sampling test to the test_eagle_correctness. However they are failing for T!=0. The number of exact match drops from 100 to 20 when T changes from 0 to 1.0. For T=0.3, 0.75, the matches are around 40. Even ngram test fails when I add non greedy sampling paramter to test. I am not sure if this is a robust way to test.
I also logged the output for the sample prompt and compared SD (eagle) with vanilla output on various Temp below. It shows that while T=0 has exact match, T!=0 does not have exact match bw vanilla and SD however the answer is coherent.
Output for various Temp (0, 0.3, 0.75, 1.0)
0:
SD:
[
{
"input": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nThe future of AI is<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
"output": "What a fascinating topic! The future of AI is likely to be shaped by various factors, including technological advancements, societal needs, and ethical considerations. Here are some potential developments that could shape the future of AI:\n\n1. **Increased Adoption**: AI will become more ubiquitous, with applications in various industries, such as healthcare, finance, education, and transportation.\n2. **Advancements in Machine Learning**: Machine learning algorithms will continue to improve, enabling AI systems to learn from data, adapt to new situations, and make more accurate predictions.\n3. **Natural Language Processing (NLP)**: NLP will become more sophisticated, allowing humans to interact with AI systems using natural language, and enabling AI-powered chatbots and virtual assistants to understand and respond to complex queries.\n4. **Computer Vision**: Computer vision will continue to improve, enabling AI systems to interpret and understand visual data, such as images and videos, and apply this knowledge to various applications, such as self-driving cars and medical diagnosis.\n5. **Edge AI**: Edge AI will become more prevalent, enabling AI processing to occur closer to the source of the data, reducing latency and improving real-time decision-making.\n6. **Explainability and Transparency**: As AI becomes more pervasive, there will be a growing need for"
},
{
"input": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nThe president of the United States is<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
"output": "As of my knowledge cutoff, the President of the United States is Joe Biden."
}
]
vanilla:
[
{
"input": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nThe future of AI is<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
"output": "What a fascinating topic! The future of AI is likely to be shaped by various factors, including technological advancements, societal needs, and ethical considerations. Here are some potential developments that could shape the future of AI:\n\n1. **Increased Adoption**: AI will become more ubiquitous, with applications in various industries, such as healthcare, finance, education, and transportation.\n2. **Advancements in Machine Learning**: Machine learning algorithms will continue to improve, enabling AI systems to learn from data, adapt to new situations, and make more accurate predictions.\n3. **Natural Language Processing (NLP)**: NLP will become more sophisticated, allowing humans to interact with AI systems using natural language, and enabling AI-powered chatbots and virtual assistants to understand and respond to complex queries.\n4. **Computer Vision**: Computer vision will continue to improve, enabling AI systems to interpret and understand visual data, such as images and videos, and apply this knowledge to various applications, such as self-driving cars and medical diagnosis.\n5. **Edge AI**: Edge AI will become more prevalent, enabling AI processing to occur closer to the source of the data, reducing latency and improving real-time decision-making.\n6. **Explainability and Transparency**: As AI becomes more pervasive, there will be a growing need for"
},
{
"input": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nThe president of the United States is<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
"output": "As of my knowledge cutoff, the President of the United States is Joe Biden."
}
]
0.3
SD:
[
{
"input": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nThe future of AI is<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
"output": "What a fascinating topic! The future of AI is likely to be shaped by various factors, including technological advancements, societal needs, and ethical considerations. Here are some potential developments that could shape the future of AI:\n\n1. **Increased Adoption**: AI is expected to become even more ubiquitous, with applications in various industries, such as healthcare, finance, transportation, and education.\n2. **Advancements in Machine Learning**: Machine learning algorithms will continue to improve, enabling AI systems to learn from data, adapt to new situations, and make more accurate predictions.\n3. **Natural Language Processing (NLP)**: NLP will become more sophisticated, allowing humans to interact with AI systems more naturally, using voice commands, text, or gestures.\n4. **Computer Vision**: Computer vision will continue to improve, enabling AI systems to interpret and understand visual data, such as images and videos, with greater accuracy.\n5. **Edge AI**: Edge AI, which involves processing data closer to the source, will become more prevalent, reducing latency and improving real-time decision-making.\n6. **Explainability and Transparency**: As AI systems become more complex, there will be a growing need for explainability and transparency, to ensure accountability and trust.\n7. **Human-AI Collaboration**: AI will"
},
{
"input": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nThe president of the United States is<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
"output": "As of my knowledge cutoff in 2021, the President of the United States is Joe Biden."
}
]
vanilla:
[
{
"input": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nThe future of AI is<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
"output": "A topic that sparks much excitement and curiosity! The future of AI is likely to be shaped by various factors, including technological advancements, societal needs, and ethical considerations. Here are some potential developments that could shape the future of AI:\n\n1. **Increased Adoption**: AI will become more ubiquitous, with applications in various industries, such as healthcare, finance, education, and transportation.\n2. **Advancements in Machine Learning**: Machine learning algorithms will continue to improve, enabling AI systems to learn from data, adapt to new situations, and make more accurate predictions.\n3. **Natural Language Processing (NLP)**: NLP will become more sophisticated, allowing humans to interact with AI systems more naturally, using voice commands, text, or gestures.\n4. **Computer Vision**: Computer vision will improve, enabling AI systems to interpret and understand visual data, such as images and videos, with greater accuracy.\n5. **Edge AI**: AI will be deployed at the edge of the network, closer to the source of the data, to reduce latency and improve real-time decision-making.\n6. **Explainable AI**: As AI becomes more pervasive, there will be a growing need to understand how AI systems make decisions, leading to the development of explainable AI (XAI) techniques.\n"
},
{
"input": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nThe president of the United States is<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
"output": "As of my knowledge cutoff, the President of the United States is Joe Biden."
}
]
0.75
SD:
[
{
"input": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nThe future of AI is<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
"output": "What a fascinating topic! The future of Artificial Intelligence (AI) is expected to be shaped by numerous factors, including advancements in technology, human innovation, and societal needs. Here are some potential developments that could shape the future of AI:\n\n1. **Increased Adoption**: AI is expected to become even more ubiquitous, with applications in various industries, such as healthcare, finance, transportation, and education.\n2. **Advances in Machine Learning**: Machine learning, a subset of AI, will continue to improve, enabling machines to learn from data, make decisions, and adapt to new situations.\n3. **Natural Language Processing (NLP)**: NLP will become more sophisticated, allowing humans to interact with AI systems more naturally, using voice commands, text, or gestures.\n4. **Computer Vision**: Computer vision will continue to improve, enabling AI systems to interpret and understand visual data, such as images and videos.\n5. **Robotics and Robotics Intelligence (RI)**: RI will become more prevalent, enabling robots to perform tasks that require intelligence, dexterity, and flexibility.\n6. **Autonomous Systems**: Autonomous vehicles, drones, and robots will become more common, transforming industries like logistics, transportation, and manufacturing.\n7. **Explainability and Transparency**: As AI"
},
{
"input": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nThe president of the United States is<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
"output": "As of my knowledge cutoff in August 2022, the President of the United States is Joe Biden."
}
]
vanilla:
[
{
"input": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nThe future of AI is<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
"output": "What a fascinating topic!\n\nThe future of AI is inherently uncertain, but based on current trends, research, and expert opinions, here are some potential developments that could shape the future of AI:\n\n1. **Artificial General Intelligence (AGI)**: AGI is AI that possesses human-like intelligence, able to reason, learn, and apply knowledge across various tasks. If achieved, AGI could revolutionize many industries and aspects of our lives.\n2. **Explainable AI (XAI)**: As AI becomes more prevalent, there will be a growing need for transparency and explainability. XAI will help users understand AI decision-making processes, increasing trust and accountability.\n3. **Edge AI**: The proliferation of IoT devices and the need for real-time processing will drive the development of Edge AI, which processes data closer to where it's generated, reducing latency and improving efficiency.\n4. **Multimodal AI**: AI will become more adept at processing and understanding various forms of data, such as images, videos, audio, and natural language, enabling more effective human-machine interaction.\n5. **Autonomous Systems**: Autonomous vehicles, drones, and robots will become increasingly common, transforming industries like logistics, transportation, and healthcare.\n6. **AI-Powered Medicine**: AI"
},
{
"input": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nThe president of the United States is<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
"output": "Joe Biden"
}
]
1.0
SD:
[
{
"input": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nThe future of AI is<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
"output": "What a great topic!\n\nThe future of AI is full of uncertainty andpromise. Here are some potential developments that could shape the future of AI:\n\n1. **Advancements in Machine Learning**: Machine learning is the foundation of many AI systems. Future advancements in this area could lead to even more sophisticated AI models that can learn faster, better, and more efficiently.\n2. **Edge AI**: With the rise of IoT devices and edge computing, we can expect to see more AI processing happening at the edge, closer to the source of the data, rather than in the cloud.\n3. **Explainable AI (XAI)**: As AI becomes more pervasive, there is a growing need for transparency and explainability. XAI could help us understand how AI models make decisions and reduce the risk of bias and misinformation.\n4. **Human-AI collaboration**: AI will increasingly be used to augment human capabilities, rather than replace them. This could lead to new forms of collaboration between humans and AI, enabling us to work more efficiently and effectively.\n5. **Natural Language Processing (NLP)**: NLP is a key area of AI research, enabling computers to understand and process human language. Expect to see significant advancements in this area, leading to more conversational AI interfaces.\n"
},
{
"input": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nThe president of the United States is<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
"output": "As of my knowledge cutoff in August 2022, the 46th and current President of the United States is Joe Biden."
}
]
vanilla:
[
{
"input": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nThe future of AI is<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
"output": "The future of AI is a topic of much debate and speculation! As AI continues to evolve, we can expect to see significant advancements in several areas. Here are some potential developments that could shape the future of AI:\n\n1. **Increased Adoption**: AI will become more ubiquitous, with applications in various industries, including healthcare, finance, education, and more. Expect to see more AI-powered chatbots, virtual assistants, and automated decision-making systems.\n2. **Explainability and Transparency**: As AI systems become more sophisticated, there will be a greater emphasis on explainability and transparency. Researchers will focus on developing methods to understand and interpret AI decision-making processes, improving trust and accountability.\n3. **Edge AI**: The proliferation of IoT devices and edge computing will lead to the development of AI algorithms that can process data in real-time, closer to the source, reducing latency and improving performance.\n4. **Meta-Learning**: AI systems will learn to learn, adapting to new tasks and environments more efficiently. This could revolutionize areas like robotics, gaming, and natural language processing.\n5. **Human-AI Collaboration**: As AI becomes more capable, humans and machines will work together more closely. Expect to see AI-assisted creative work, such as art, music, and writing"
},
{
"input": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nThe president of the United States is<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
"output": "I'm happy to help! However, I must clarify that the presidency is a dynamic position and is likely to change every four years or sooner due to elections or other circumstances.\n\nAs of my last update (January 2021), the 46th and current President of the United States is Joe Biden. If you're looking for the most current information, I recommend checking out reputable news sources or the official White House website for the latest updates.\n\nIf you have any specific questions or topics related to the presidency or current events, I'm here to help!"
}
]
@LiuXiaoxuanPKU could you share your thoughts on how was non greedy tested in V0 OR how should we test this PR? Thanks for the review!
I noticed is that the logic handling the draft probability is implemented inside the eagle. IIUC, this logic is general for all spec methods (e.g., medusa) so I think it should be implemented as a shared utility.
@WoosukKwon - Got it, will make the change. Could you confirm if you are referring to the already existing compute_probs_and_sample_next_token() and the newly added draft buffers OR something else too?
This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @ekagra-ranjan.
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
Just want to check in. What's the status of this PR? Since it is mentioned (https://github.com/vllm-project/vllm/issues/15901#issue-2964425133) , we still should properly handle the non-argmax sampling parameters.
@wwl2755 I did some test and am waiting for @LiuXiaoxuanPKU to share if the result look fine. Once confirmed, I will bring this PR upto date with the recent changes that have happened.
superseeded by: https://github.com/vllm-project/vllm/pull/20459