Results: 2 issues of xiaoqi
# What Does This PR Do?
Fixes #7454, #6818, #6614. I am using vLLM with the Qwen2-72B-Instruct model to performance-test the n-gram speculative decoding algorithm. When I set: speculative_model="[ngram]", num_speculative_tokens=5,...
# Support training EAGLE-3 with DeepSpeed for large models such as 72B/235B