ChatGLM-6B
[BUG] predict results are empty during evaluate
Is there an existing issue for this?
- [X] I have searched the existing issues
Current Behavior
Contents of evaluate.sh:

PRE_SEQ_LEN=128
CHECKPOINT=viewgen0421-chatglm-6b-pt-128-2e-2
STEP=5000

CUDA_VISIBLE_DEVICES=1 python3 main.py \
    --do_predict \
    --validation_file /home/workspace/data/dev.json \
    --test_file /home/workspace/data/dev.json \
    --overwrite_cache \
    --prompt_column content \
    --response_column summary \
    --model_name_or_path /home/workspace/chatglm/chatglm-6B \
    --ptuning_checkpoint ./output/$CHECKPOINT/checkpoint-$STEP \
    --output_dir ./output/$CHECKPOINT \
    --overwrite_output_dir \
    --max_source_length 512 \
    --max_target_length 512 \
    --per_device_eval_batch_size 1 \
    --predict_with_generate \
    --pre_seq_len $PRE_SEQ_LEN \
    --quantization_bit 4
The log prints the warning: Input length of input_ids is 512, but max_length is set to 512. This can lead to unexpected behavior. You should consider increasing max_new_tokens.
This causes rouge.get_scores to raise ValueError: Hypothesis is empty. https://github.com/THUDM/ChatGLM-6B/blob/aeced3619b804d20d2396576f6d5bc8dc8226913/ptuning/main.py#L328
Changing max_length to 1025 fixes the problem. https://github.com/THUDM/ChatGLM-6B/blob/aeced3619b804d20d2396576f6d5bc8dc8226913/ptuning/main.py#L397
What is the cause of this?
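A hedged arithmetic sketch of why that warning produces empty hypotheses (the variable names below are mine, not the repo's): Hugging Face's generate() counts max_length as prompt tokens plus generated tokens, so a prompt that already fills max_length leaves no budget for new tokens.

```python
# max_length in generate() covers prompt + new tokens, so a prompt that
# already fills max_length leaves zero room for the answer.
input_length = 512   # padded source, per --max_source_length 512
max_length = 512     # the generation max_length reported in the warning

budget_for_new_tokens = max(0, max_length - input_length)
print(budget_for_new_tokens)  # 0 -> the decoded hypothesis is empty
```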
Expected Behavior
No response
Steps To Reproduce
Passing --max_source_length 512 --max_target_length 512 to evaluate.sh triggers the bug.
Environment
- OS: CentOS 8
- Python: 3.9
- Transformers: 4.26.1
- PyTorch: 1.12
- CUDA Support: True
Anything else?
No response
+1
Also noticed that heavy padding can make the output empty.
I met the same problem!
Same question!! I'd like to know: what is the relationship between PRE_SEQ_LEN, max_source_length, and max_target_length?
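Not an authoritative answer, but a sketch of the arithmetic that seems consistent with the max_length = 1025 fix reported above (all assumptions, not code from the repo): the PRE_SEQ_LEN virtual prefix tokens are handled by the prefix encoder, while max_length has to cover both the source and the generated target.

```python
pre_seq_len = 128          # virtual prefix tokens (assumed handled separately)
max_source_length = 512
max_target_length = 512

# Assumed relationship: max_length must cover source + target tokens.
max_length = max_source_length + max_target_length + 1
print(max_length)                      # 1025, the value that fixed the issue
print(max_length - max_source_length)  # 513 tokens left for the answer
```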
There are null values in the data, just clean up the data.
> There are null values in the data, just clean up the data.

I checked that there are no null values in my data.
In my case, the model's predicted hypothesis contains only a single newline character, which breaks the rouge calculation, so we need to check the model output and skip empty outputs.
To solve this problem, change ptuning/main.py#L327 to the following code:

hypothesis = ' '.join(hypothesis)
reference = ' '.join(reference)
# skip samples whose hypothesis or reference is empty, otherwise
# rouge.get_scores raises "ValueError: Hypothesis is empty."
if not hypothesis.strip() or not reference.strip():
    continue
scores = rouge.get_scores(hypothesis, reference)
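A minimal, self-contained illustration of the same guard, with hypothetical decoded strings instead of real model output:

```python
# Hypothetical (hypothesis, reference) pairs; the second hypothesis is only
# a newline, which is exactly what crashes rouge.get_scores.
pairs = [("a b c", "a b d"), ("\n", "x y"), ("e f g", "e f g")]

# Keep only pairs where both sides contain non-whitespace text.
kept = [(h, r) for h, r in pairs if h.strip() and r.strip()]
print(len(kept))  # 2 -> the newline-only hypothesis is skipped
```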
Ran into the same thing.
Clearly a bug: the eval and predict lengths here should be kept consistent with the parameters in train.sh, otherwise the tokenizer misbehaves and what gets decoded after inference is all