xiaoqi

Results: 2 issues by xiaoqi

# What Does This PR Do?

Fixes #7454, #6818, #6614. I am using vLLM with the Qwen2-72B-Instruct model to performance-test the n-gram speculative decoding algorithm. When I set: speculative_model="[ngram]", num_speculative_tokens=5,...

# Support training EAGLE-3 with DeepSpeed for large models such as 72B/235B