SherrySwift

10 comments by SherrySwift

Hi, thanks for sharing the code! Could you please share the pre-trained model as well? Thanks!

Hi, I'm running into the same problem. Have you solved it?

Hi, I used huggyllama/llama-7b, but I encountered the following errors when trying to run scripts/summarization/eval.sh: ``` Traceback (most recent call last): File "/data1/H2O-main/h2o_hf/run_summarization.py", line 138, in output_sequences = model.generate(...

Thanks for your reply. Here is the command: `bash scripts/summarization/eval.sh xsum 5 full 0` The contents of scripts/summarization/eval.sh are: ``` task=$1 shots=$2 method=$3 GPU=$4 HH_SIZE=$5 RECENT_SIZE=$6 if [[ ${method} ==...

By the way, the above error also occurs in the middle of evaluation when I use other models (such as llama-2-7b). Here is part of the log: ``` The attention...

Thanks for your patience, but specifying "tokenizer.pad_token_id = tokenizer.eos_token_id" still does not solve the problem. Since I couldn't come up with a better solution, I just skipped sample 797 in the end....

Sorry to bother you again. In the h2o_hf/data directory, there are several different jsonl files for the xsum dataset. To reproduce the result in Figure 4 of the paper (i.e. Rouge-2...

Hi, why is the model shown in the LaTeX file gpt-neox-20b? Aren't these supposed to be the llama-7b results?

Hi, is there any plan to integrate the 4-bit fused dequantize+attention operators proposed in Atom into FlashInfer? Looking forward to this new feature.

I observed NaN norms during training, too. I suspect they are caused by AMP (mixed-precision) training.
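To illustrate the failure mode I mean, here is a minimal sketch in plain Python (the `FP16_MAX` constant and `to_fp16` helper are illustrative emulations, not code from the training repo): a gradient-norm reduction accumulated in half precision can overflow to inf even when the gradients themselves are moderate, while the same reduction in fp32 stays finite. Downstream arithmetic on those infs (e.g. inf - inf) is what produces NaN.

```python
import math

FP16_MAX = 65504.0  # largest finite float16 value


def to_fp16(x):
    """Crude float16 emulation: anything beyond the fp16 range overflows to inf."""
    return math.copysign(math.inf, x) if abs(x) > FP16_MAX else x


grads = [300.0, 400.0]

# Squaring these moderate gradients already exceeds the fp16 range,
# so a norm accumulated in half precision blows up to inf.
half_norm = math.sqrt(sum(to_fp16(g * g) for g in grads))

# Accumulating the same reduction in fp32 (Python floats) stays finite.
full_norm = math.sqrt(sum(g * g for g in grads))

print(half_norm)  # inf
print(full_norm)  # 500.0
```

This is why AMP frameworks typically keep norm/reduction computations in fp32 and use dynamic loss scaling; if the norm is computed in fp16 somewhere, overflow to inf/NaN like the above is a plausible cause.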