Inference with internlm-xcomposer2 leads to `ValueError: Input length of input_ids is 0, but max_length is set to -1066.`
Describe the bug
Running the finetuned internlm-xcomposer2-7b-chat leads to an error.
token len:history:113, query:1706
Traceback (most recent call last):
File "/home/ldl/pi_code/swift/pi_code/infer_internlm-xcomposer2.py", line 83, in <module>
response, _ = inference(model, template, value, history)
File "/home/ldl/pi_code/swift/swift/llm/utils/utils.py", line 692, in inference
generate_ids = model.generate(
File "/home/ldl/miniconda3/envs/swift/lib/python3.10/site-packages/peft/peft_model.py", line 1190, in generate
outputs = self.base_model.generate(*args, **kwargs)
File "/home/ldl/miniconda3/envs/swift/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/ldl/miniconda3/envs/swift/lib/python3.10/site-packages/transformers/generation/utils.py", line 1449, in generate
self._validate_generated_length(generation_config, input_ids_length, has_default_max_length)
File "/home/ldl/miniconda3/envs/swift/lib/python3.10/site-packages/transformers/generation/utils.py", line 1140, in _validate_generated_length
raise ValueError(
ValueError: Input length of input_ids is 0, but `max_length` is set to -1066. This can lead to unexpected behavior. You should consider increasing `max_length` or, better yet, setting `max_new_tokens`.
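For context, a hedged reading of the numbers in this traceback (an assumption on my part, not confirmed from swift's source): in transformers' `generate()`, when `max_new_tokens` is set, the effective `max_length` is derived as `max_new_tokens + input_ids_length`. internlm-xcomposer2 feeds the model `inputs_embeds` rather than `input_ids` (hence the reported input length of 0), so a negative `max_new_tokens`, e.g. one computed as some fixed budget minus the 1706-token query, would surface directly as the `-1066`:

```python
# Hedged reconstruction of the error, not confirmed against swift's source.
input_ids_length = 0    # image prompts go in as inputs_embeds, not input_ids
max_new_tokens = -1066  # plausibly a fixed budget minus the 1706-token query

# How transformers' generate() derives the effective max_length when
# max_new_tokens is set; this reproduces the value in the ValueError above.
max_length = max_new_tokens + input_ids_length
assert max_length == -1066
```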
Also, running the web UI on the training data produces a wrong reply:
CUDA_VISIBLE_DEVICES=0 swift app-ui --share True --ckpt_dir ckp_output/internlm-xcomposer2-7b-chat/v10-20240502-202001/checkpoint-60/ --load_dataset_config true
I'm sorry, but I am unable to analyze the images as I am a text-based AI assistant and do not have the ability to view or interpret images.
Your hardware and system info
Additional context
With CUDA_VISIBLE_DEVICES=0,1 swift infer --ckpt_dir ckp_output/internlm-xcomposer2-7b-chat/v10-20240502-202001/checkpoint-60/
it works, but VRAM usage grows from 18 GB to 48+ GB after only 4 questions (6 pics + text) and leads to OOM. Is this a bug?
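I have not confirmed whether this growth is expected. As a hypothetical diagnostic (not from the original run), logging CUDA memory between questions would show whether usage scales with the accumulated chat history / KV cache:

```python
import torch

# Hypothetical instrumentation: log CUDA memory after each question to see
# whether usage grows with the accumulated history.
def log_vram(tag: str) -> None:
    alloc = torch.cuda.memory_allocated() / 2**30
    reserved = torch.cuda.memory_reserved() / 2**30
    print(f"[{tag}] allocated={alloc:.2f} GiB, reserved={reserved:.2f} GiB")
```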
Changing model.generation_config.max_new_tokens solved the problem.
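For reference, a minimal sketch of that workaround; 512 is an arbitrary example value, and `model`, `template`, `value`, and `history` are the variables from the script in the traceback above:

```python
# Cap the number of newly generated tokens directly, so nothing is derived
# from (max_length - prompt_length). 512 is an arbitrary example value.
model.generation_config.max_new_tokens = 512
response, _ = inference(model, template, value, history)
```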
But I'm still confused about the difference between these parameters; could you help explain them?
From https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-xcomposer2-7b/files:
model.config.max_length
model.generation_config.max_length
model.generation_config.max_new_tokens
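My understanding (please correct me if wrong): `max_length` bounds the total sequence (prompt plus generated tokens), while `max_new_tokens` bounds only the newly generated tokens and takes precedence when both are set; `model.config.max_length` is the legacy location that `generate()` falls back to when `model.generation_config` does not override it. Illustrated with this issue's 1706-token query:

```python
from transformers import GenerationConfig

# max_length caps prompt + generated tokens: with a 1706-token query,
# max_length=2048 leaves at most 2048 - 1706 = 342 new tokens.
total_budget = GenerationConfig(max_length=2048)

# max_new_tokens caps only the generated tokens, regardless of prompt
# length, and takes precedence over max_length when both are set.
new_only = GenerationConfig(max_new_tokens=512)
```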