ChatGLM-6B
[BUG] predict results are empty during evaluate
Is there an existing issue for this?
- [X] I have searched the existing issues
Current Behavior
Contents of evaluate.sh:

PRE_SEQ_LEN=128
CHECKPOINT=viewgen0421-chatglm-6b-pt-128-2e-2
STEP=5000

CUDA_VISIBLE_DEVICES=1 python3 main.py \
    --do_predict \
    --validation_file /home/workspace/data/dev.json \
    --test_file /home/workspace/data/dev.json \
    --overwrite_cache \
    --prompt_column content \
    --response_column summary \
    --model_name_or_path /home/workspace/chatglm/chatglm-6B \
    --ptuning_checkpoint ./output/$CHECKPOINT/checkpoint-$STEP \
    --output_dir ./output/$CHECKPOINT \
    --overwrite_output_dir \
    --max_source_length 512 \
    --max_target_length 512 \
    --per_device_eval_batch_size 1 \
    --predict_with_generate \
    --pre_seq_len $PRE_SEQ_LEN \
    --quantization_bit 4
The log prints the warning: Input length of input_ids is 512, but max_length is set to 512. This can lead to unexpected behavior. You should consider increasing max_new_tokens.
This causes rouge.get_scores to raise ValueError: Hypothesis is empty. https://github.com/THUDM/ChatGLM-6B/blob/aeced3619b804d20d2396576f6d5bc8dc8226913/ptuning/main.py#L328
Changing max_length to 1025 fixes the problem. https://github.com/THUDM/ChatGLM-6B/blob/aeced3619b804d20d2396576f6d5bc8dc8226913/ptuning/main.py#L397
What is the cause of this?
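A hedged arithmetic sketch of why that warning produces empty hypotheses (the variable names below are mine, not the repo's): Hugging Face's generate() counts max_length as prompt tokens plus generated tokens, so a prompt that already fills max_length leaves no budget for new tokens.

```python
# max_length in generate() covers prompt + new tokens, so a prompt that
# already fills max_length leaves zero room for the answer.
input_length = 512   # padded source, per --max_source_length 512
max_length = 512     # the generation max_length reported in the warning

budget_for_new_tokens = max(0, max_length - input_length)
print(budget_for_new_tokens)  # 0 -> the decoded hypothesis is empty
```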
Expected Behavior
No response
Steps To Reproduce
Passing --max_source_length 512 --max_target_length 512 to evaluate.sh triggers the bug.
Environment
- OS: CentOS 8
- Python: 3.9
- Transformers: 4.26.1
- PyTorch: 1.12
- CUDA Support: True
Anything else?
No response
+1
Also noticed that heavy padding can make the output empty.
I met the same problem!
Same question!! I'd like to know: what is the relationship between PRE_SEQ_LEN, max_source_length, and max_target_length?
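Not an authoritative answer, but a sketch of the arithmetic that seems consistent with the max_length = 1025 fix reported above (all assumptions, not code from the repo): the PRE_SEQ_LEN virtual prefix tokens are handled by the prefix encoder, while max_length has to cover both the source and the generated target.

```python
pre_seq_len = 128          # virtual prefix tokens (assumed handled separately)
max_source_length = 512
max_target_length = 512

# Assumed relationship: max_length must cover source + target tokens.
max_length = max_source_length + max_target_length + 1
print(max_length)                      # 1025, the value that fixed the issue
print(max_length - max_source_length)  # 513 tokens left for the answer
```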
There are null values in the data, just clean up the data.
> There are null values in the data, just clean up the data.

I checked that there are no null values in my data.
In my case, the model's predicted hypothesis contains only a single newline character, which breaks the rouge calculation, so we need to check the model output and skip empty outputs.
To solve this problem, change ptuning/main.py#L327 to the following code:

hypothesis = ' '.join(hypothesis)
reference = ' '.join(reference)
# skip samples whose hypothesis or reference is empty, otherwise
# rouge.get_scores raises "ValueError: Hypothesis is empty."
if not hypothesis.strip() or not reference.strip():
    continue
scores = rouge.get_scores(hypothesis, reference)
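A minimal, self-contained illustration of the same guard, with hypothetical decoded strings instead of real model output:

```python
# Hypothetical (hypothesis, reference) pairs; the second hypothesis is only
# a newline, which is exactly what crashes rouge.get_scores.
pairs = [("a b c", "a b d"), ("\n", "x y"), ("e f g", "e f g")]

# Keep only pairs where both sides contain non-whitespace text.
kept = [(h, r) for h, r in pairs if h.strip() and r.strip()]
print(len(kept))  # 2 -> the newline-only hypothesis is skipped
```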
Ran into the same thing.
Clearly a bug: the eval and predict lengths here should be kept consistent with the parameters in train.sh, otherwise the tokenizer misbehaves and what gets decoded after inference is all