Video-LLaVA
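Inference bug
After loading the model, the first inference runs fine, but a second inference with the same parameters raises an error.
Also: is there an official technical discussion group where we could talk about this?
Is there example code that would help us reproduce this error? I am confused about what "second inference" means here.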
import torch
from llava.constants import X_TOKEN_INDEX, DEFAULT_X_TOKEN
from llava.conversation import conv_templates, SeparatorStyle
from llava.model.builder import load_pretrained_model
from llava.utils import disable_torch_init
from llava.mm_utils import tokenizer_X_token, get_model_name_from_path, KeywordsStoppingCriteria

def main():
    disable_torch_init()
    video = 'llava/serve/examples/sample_demo_1.mp4'
    inp = 'Why is this video funny?'
    model_path = 'LanguageBind/Video-LLaVA-7B'
    device = 'cuda'
    load_4bit, load_8bit = True, False
    model_name = get_model_name_from_path(model_path)
    tokenizer, model, processor, context_len = load_pretrained_model(model_path, None, model_name, load_8bit, load_4bit, device=device)
    video_processor = processor['video']
    conv_mode = "llava_v1"
    conv = conv_templates[conv_mode].copy()
    roles = conv.roles
    for i in range(2):
        video_tensor = video_processor(video, return_tensors='pt')['pixel_values']
        if type(video_tensor) is list:
            tensor = [video.to(model.device, dtype=torch.float16) for video in video_tensor]
        else:
            tensor = video_tensor.to(model.device, dtype=torch.float16)
        key = ['video']
        print(f"{roles[1]}: {inp}")
        inp = DEFAULT_X_TOKEN['VIDEO'] + '\n' + inp
        conv.append_message(conv.roles[0], inp)
        conv.append_message(conv.roles[1], None)
        prompt = conv.get_prompt()
        input_ids = tokenizer_X_token(prompt, tokenizer, X_TOKEN_INDEX['VIDEO'], return_tensors='pt').unsqueeze(0).cuda()
        stop_str = conv.sep if conv.sep_style != SeparatorStyle.TWO else conv.sep2
        keywords = [stop_str]
        stopping_criteria = KeywordsStoppingCriteria(keywords, tokenizer, input_ids)
        with torch.inference_mode():
            output_ids = model.generate(
                input_ids,
                images=[tensor, key],
                do_sample=True,
                temperature=0.1,
                max_new_tokens=1024,
                use_cache=True,
                stopping_criteria=[stopping_criteria])
        outputs = tokenizer.decode(output_ids[0, input_ids.shape[1]:]).strip()
        print(outputs)

if __name__ == '__main__':
    main()
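The code is above. It is your code; I only added a loop.
The code above runs the Inference_languagebind_video.py file. I also modified llava/serve/cli.py to check the case of loading the model once and running inference twice. My changes and the resulting error message are below.
The error message is as follows: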
Human: what is this
Assistant: ['video']
The video shows a young child sitting on a bed and reading a book. The child is wearing glasses and appears to be enjoying the book.
Human: what isd this
Assistant: ['video']
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/Agent_service/Video_LLaVA_main/llava/serve/cli.py", line 153, in
You are using the old code. We have updated the code; could you pull the latest code and try again?
The same problem still occurs with the latest code. Is there a solution? I also want to ask several questions about one video, and the same problem appears there too.
It is not clear why a for-loop is used for inference. The CLI inference code already supports multi-round conversation. https://github.com/PKU-YuanGroup/Video-LLaVA?tab=readme-ov-file#cli-inference
I think it would be beneficial if one could call the model without problems from a Python script in order to perform downstream tasks.
Edit: I just solved the problem. You need to re-initialize the conv object from conv_templates in every loop iteration. conv keeps the whole conversation history, so if you simply append the next question to it, the questions accumulate across iterations, and this somehow causes problems for the model.
I used a for-loop because, for now, I only want single-round conversations, haha.
Exactly, just use a new conv template in every iteration of the for-loop.
Yes.
Could you share how you re-initialize conv_templates?
Simply put these two lines of code inside the for-loop that you use to evaluate a different question in every iteration:
conv = conv_templates[conv_mode].copy()
roles = conv.roles
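
To see why this helps, here is a minimal sketch that runs without the model. The `Conversation` class below is a hypothetical stand-in written for illustration, not the real llava class, and the `"<video>"` string is only a placeholder for `DEFAULT_X_TOKEN['VIDEO']`. It compares the prompt built when one `conv` is shared across iterations with the prompt built when `conv` is copied from the template each time.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Hypothetical stand-in for a llava conversation template (not the real class)."""
    roles: tuple = ("USER", "ASSISTANT")
    messages: list = field(default_factory=list)

    def append_message(self, role, message):
        self.messages.append([role, message])

    def get_prompt(self):
        # Join every stored turn; an unanswered turn (None) contributes only its role tag.
        parts = []
        for role, message in self.messages:
            parts.append(f"{role}: {message}" if message else f"{role}:")
        return " ".join(parts)

    def copy(self):
        return copy.deepcopy(self)


conv_templates = {"llava_v1": Conversation()}
conv_mode = "llava_v1"
questions = ["Why is this video funny?", "What is in the video?"]


def build_prompts(reset_each_iteration):
    conv = conv_templates[conv_mode].copy()
    prompts = []
    for question in questions:
        if reset_each_iteration:
            # The two-line fix from this thread: start from a clean template.
            conv = conv_templates[conv_mode].copy()
        roles = conv.roles
        # "<video>" is a placeholder for the video token prefix.
        conv.append_message(roles[0], "<video>\n" + question)
        conv.append_message(roles[1], None)
        prompts.append(conv.get_prompt())
    return prompts


shared = build_prompts(reset_each_iteration=False)
fresh = build_prompts(reset_each_iteration=True)
print(shared[1].count("<video>"))  # video placeholders in the second prompt, shared conv
print(fresh[1].count("<video>"))   # video placeholders in the second prompt, fresh conv
```

With a shared `conv`, the second prompt still contains the first question and its video placeholder, while only one video tensor is passed to `generate`; that mismatch is a plausible cause of the failure, though the thread does not confirm it. Note also that the script posted earlier reassigns `inp` inside the loop, so the video token prefix is prepended again on each pass; building the prompt text in a separate variable avoids that.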