Chuanhong Li

Results: 6 comments by Chuanhong Li

> > In the Anyscale fork we saw a 50% speedup on bs=8 with a 68m-sized draft model on TP1/70B target model on TP8 and a 7B draft model on...

> Thanks for the reply and the information! Looking forward to complete speculative decoding support!

@yeoedward @Ying1123 @Kyriection Hi, is there an answer to the question above? Also, when batched inference is used for LLaMA, how should the hh_score be updated?

> @duyuxuan1486 Hi! Have you ever encountered this error when running `bash scripts/streaming/eval.sh full`?
>
> > `from streaming_llm.utils import load, download_url, load_jsonl` fails with `ModuleNotFoundError: No module named 'streaming_llm'`
>
> https://github.com/FMInference/H2O/issues/8
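(A note for anyone hitting the same `ModuleNotFoundError`: a common workaround, assuming the `streaming_llm` package sits at the root of the H2O checkout, is to put that directory on the Python path before the import. The path handling below is only a sketch; adjust it to wherever the package actually lives.)

```python
import os
import sys

# Assumption: `streaming_llm` lives at the repository root and this script runs
# from a subdirectory (e.g. scripts/streaming/). Adjust REPO_ROOT if it does not.
REPO_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", ".."))
if REPO_ROOT not in sys.path:
    sys.path.insert(0, REPO_ROOT)

from streaming_llm.utils import load, download_url, load_jsonl  # noqa: E402
```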

> Hi, the HH scores should be sequence-independent. In this implementation, we use one sequence in each batch for testing. Will update the implementation for multiple sequences shortly, by modifying...
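As an illustration of what sequence-independent HH scores could look like under batching, here is a minimal sketch that accumulates a separate score for each sequence by keeping the batch dimension intact. The function name and tensor shapes are assumptions for illustration, not the repository's actual implementation.

```python
import torch

def update_hh_score(hh_score, attn_weights):
    """Accumulate heavy-hitter scores independently for each sequence in a batch.

    hh_score:     [batch, num_heads, kv_len] running scores, or None on the first call.
    attn_weights: [batch, num_heads, q_len, kv_len] softmax attention for the new step.
    """
    # Attention mass each cached key received from the new queries.
    step_score = attn_weights.sum(dim=-2)  # -> [batch, num_heads, kv_len]

    if hh_score is None:
        return step_score

    # Keys appended since the last update have no history yet; pad with zeros.
    num_new = step_score.shape[-1] - hh_score.shape[-1]
    if num_new > 0:
        pad = torch.zeros(*hh_score.shape[:-1], num_new,
                          dtype=hh_score.dtype, device=hh_score.device)
        hh_score = torch.cat([hh_score, pad], dim=-1)

    # The batch dimension is never reduced, so scores cannot mix across sequences.
    return hh_score + step_score
```

The keep/evict decision would then read each sequence's own row of `hh_score` when selecting heavy hitters, so batching does not change which tokens any individual sequence retains.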
