SherrySwift

10 comments by SherrySwift

Hi, thanks for sharing the code! Could you please share the pre-trained model as well? Thanks!

Hi, I'm running into the same problem. Have you solved it?

Hi, I used huggyllama/llama-7b, but I encountered the following errors when trying to run scripts/summarization/eval.sh: ``` Traceback (most recent call last): File "/data1/H2O-main/h2o_hf/run_summarization.py", line 138, in output_sequences = model.generate(...

Thanks for your reply. Here is the command: `bash scripts/summarization/eval.sh xsum 5 full 0` The contents of scripts/summarization/eval.sh are: ``` task=$1 shots=$2 method=$3 GPU=$4 HH_SIZE=$5 RECENT_SIZE=$6 if [[ ${method} ==...

By the way, the above error also occurs in the middle of evaluation when I use other models (such as llama-2-7b). Here is part of the log: ``` The attention...

Thanks for your patience, but specifying "tokenizer.pad_token_id = tokenizer.eos_token_id" still does not solve the problem. Since I couldn't come up with a better solution, I just skipped sample 797 in the end....

Sorry to bother you again. In the h2o_hf/data directory, there are several different jsonl files for the xsum dataset. To reproduce the result in Figure 4 of the paper (i.e. Rouge-2...

Hi, why is the model shown in the LaTeX file gpt-neox-20b? Aren't these supposed to be the llama-7b results?

Hi, is there any plan to integrate the 4-bit fused dequantize+attention operators proposed in Atom into FlashInfer? Looking forward to this new feature.

I observed NaN norms during training, too. I suspect they are caused by AMP (mixed-precision) training.
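To illustrate the failure mode I mean, here is a minimal sketch in plain Python (the `FP16_MAX` constant and `to_fp16` helper are illustrative emulations, not code from the training repo): a gradient-norm reduction accumulated in half precision can overflow to inf even when the gradients themselves are moderate, while the same reduction in fp32 stays finite. Downstream arithmetic on those infs (e.g. inf - inf) is what produces NaN.

```python
import math

FP16_MAX = 65504.0  # largest finite float16 value


def to_fp16(x):
    """Crude float16 emulation: anything beyond the fp16 range overflows to inf."""
    return math.copysign(math.inf, x) if abs(x) > FP16_MAX else x


grads = [300.0, 400.0]

# Squaring these moderate gradients already exceeds the fp16 range,
# so a norm accumulated in half precision blows up to inf.
half_norm = math.sqrt(sum(to_fp16(g * g) for g in grads))

# Accumulating the same reduction in fp32 (Python floats) stays finite.
full_norm = math.sqrt(sum(g * g for g in grads))

print(half_norm)  # inf
print(full_norm)  # 500.0
```

This is why AMP frameworks typically keep norm/reduction computations in fp32 and use dynamic loss scaling; if the norm is computed in fp16 somewhere, overflow to inf/NaN like the above is a plausible cause.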