GUO-QING JIANG

Results 8 issues of GUO-QING JIANG

Maybe using pandas for the CSV tutorial would be better?

**When we tried to run the Qs in the appendix of the LLaMA paper, we found that the model was just repeating... Does anything need to be adjusted? top_p? temperature?** **Q1: The sun...

Small LLMs trained using FP8 on 32 GPUs can achieve a 20~30% speedup compared with bf16. However, scaling up to 1000+ GPUs achieves less than a 5% speedup (TP2...

Does the Medusa1 model generate token-wise the same output as the base model without the Medusa heads? I found that changing the Medusa choices changes the output.

I checked TRT-LLM but found something confusing. Some features are not supported: 1. inference batch size == 1 (seems to have been solved recently); 2. no support for in-flight batching, which will...

This setup cannot pass the unit tests (UT). Could you please check it?

Hi EAGLE Team, Thanks for your great work, which accelerates speculative decoding by an unbelievable 4~5 times. But I can't reproduce the result below: Acc rate improves with more...