Chuanhong Li

Results: 6 comments by Chuanhong Li

> > In the Anyscale fork we saw a 50% speedup on bs=8 with a 68m-sized draft model on TP1/70B target model on TP8 and a 7B draft model on...

> Thanks for the reply and the information! Looking forward to complete speculative decoding support!

@yeoedward @Ying1123 @Kyriection Hi, is there an answer to the question above? Also, when batched inference is used for LLaMA, how should the hh_score be updated?

> @duyuxuan1486 Hi! Have you ever encountered this error when running `bash scripts/streaming/eval.sh full`?
>
> > `from streaming_llm.utils import load, download_url, load_jsonl` fails with `ModuleNotFoundError: No module named 'streaming_llm'`
>
> https://github.com/FMInference/H2O/issues/8
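(A note for anyone hitting the same `ModuleNotFoundError`: a common workaround, assuming the `streaming_llm` package sits at the root of the H2O checkout, is to put that directory on the Python path before the import. The path handling below is only a sketch; adjust it to wherever the package actually lives.)

```python
import os
import sys

# Assumption: `streaming_llm` lives at the repository root and this script runs
# from a subdirectory (e.g. scripts/streaming/). Adjust REPO_ROOT if it does not.
REPO_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", ".."))
if REPO_ROOT not in sys.path:
    sys.path.insert(0, REPO_ROOT)

from streaming_llm.utils import load, download_url, load_jsonl  # noqa: E402
```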

> Hi, the HH scores should be sequence-independent. In this implementation, we use one sequence in each batch for testing. Will update the implementation for multiple sequences shortly, by modifying...
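As an illustration of what sequence-independent HH scores could look like under batching, here is a minimal sketch that accumulates a separate score for each sequence by keeping the batch dimension intact. The function name and tensor shapes are assumptions for illustration, not the repository's actual implementation.

```python
import torch

def update_hh_score(hh_score, attn_weights):
    """Accumulate heavy-hitter scores independently for each sequence in a batch.

    hh_score:     [batch, num_heads, kv_len] running scores, or None on the first call.
    attn_weights: [batch, num_heads, q_len, kv_len] softmax attention for the new step.
    """
    # Attention mass each cached key received from the new queries.
    step_score = attn_weights.sum(dim=-2)  # -> [batch, num_heads, kv_len]

    if hh_score is None:
        return step_score

    # Keys appended since the last update have no history yet; pad with zeros.
    num_new = step_score.shape[-1] - hh_score.shape[-1]
    if num_new > 0:
        pad = torch.zeros(*hh_score.shape[:-1], num_new,
                          dtype=hh_score.dtype, device=hh_score.device)
        hh_score = torch.cat([hh_score, pad], dim=-1)

    # The batch dimension is never reduced, so scores cannot mix across sequences.
    return hh_score + step_score
```

The keep/evict decision would then read each sequence's own row of `hh_score` when selecting heavy hitters, so batching does not change which tokens any individual sequence retains.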
