
Inference bug

Open 827648313 opened this issue 1 year ago • 12 comments

After loading the model, the first inference runs fine, but a second inference with exactly the same parameters throws an error.

Also: is there an official technical discussion group where we could talk this over?

827648313 avatar Feb 05 '24 07:02 827648313

Is there any example code that could help us reproduce the error? I'm a bit confused by what "second inference" means here.

LinB203 avatar Feb 05 '24 08:02 LinB203

```python
import torch
from llava.constants import X_TOKEN_INDEX, DEFAULT_X_TOKEN
from llava.conversation import conv_templates, SeparatorStyle
from llava.model.builder import load_pretrained_model
from llava.utils import disable_torch_init
from llava.mm_utils import tokenizer_X_token, get_model_name_from_path, KeywordsStoppingCriteria


def main():
    disable_torch_init()
    video = 'llava/serve/examples/sample_demo_1.mp4'
    inp = 'Why is this video funny?'
    model_path = 'LanguageBind/Video-LLaVA-7B'
    device = 'cuda'
    load_4bit, load_8bit = True, False
    model_name = get_model_name_from_path(model_path)
    # Model is loaded once, before the loop.
    tokenizer, model, processor, context_len = load_pretrained_model(
        model_path, None, model_name, load_8bit, load_4bit, device=device)
    video_processor = processor['video']
    conv_mode = "llava_v1"
    # Conversation state is also created once, before the loop.
    conv = conv_templates[conv_mode].copy()
    roles = conv.roles

    # Run the same inference twice.
    for i in range(2):
        video_tensor = video_processor(video, return_tensors='pt')['pixel_values']
        if type(video_tensor) is list:
            tensor = [video.to(model.device, dtype=torch.float16) for video in video_tensor]
        else:
            tensor = video_tensor.to(model.device, dtype=torch.float16)
        key = ['video']

        print(f"{roles[1]}: {inp}")
        inp = DEFAULT_X_TOKEN['VIDEO'] + '\n' + inp
        conv.append_message(conv.roles[0], inp)
        conv.append_message(conv.roles[1], None)
        prompt = conv.get_prompt()
        input_ids = tokenizer_X_token(prompt, tokenizer, X_TOKEN_INDEX['VIDEO'], return_tensors='pt').unsqueeze(0).cuda()
        stop_str = conv.sep if conv.sep_style != SeparatorStyle.TWO else conv.sep2
        keywords = [stop_str]
        stopping_criteria = KeywordsStoppingCriteria(keywords, tokenizer, input_ids)

        with torch.inference_mode():
            output_ids = model.generate(
                input_ids,
                images=[tensor, key],
                do_sample=True,
                temperature=0.1,
                max_new_tokens=1024,
                use_cache=True,
                stopping_criteria=[stopping_criteria])

        outputs = tokenizer.decode(output_ids[0, input_ids.shape[1]:]).strip()
        print(outputs)


if __name__ == '__main__':
    main()
```

That's the code above. It's your own code; I only added a loop around the inference.


827648313 avatar Feb 05 '24 09:02 827648313

The run above used Inference_languagebind_video.py. I also modified llava/serve/cli.py to check the same issue of loading the model once and running inference twice; my changes are in the attached screenshot, and the error message follows.

The error message is as follows:

```
Human: what is this
Assistant: ['video'] The video shows a young child sitting on a bed and reading a book. The child is wearing glasses and appears to be enjoying the book.

Human: what isd this
Assistant: ['video']
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Agent_service/Video_LLaVA_main/llava/serve/cli.py", line 153, in <module>
    main(args)
  File "/Agent_service/Video_LLaVA_main/llava/serve/cli.py", line 121, in main
    output_ids = model.generate(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1588, in generate
    return self.sample(
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2642, in sample
    outputs = self(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/Agent_service/Video_LLaVA_main/llava/model/language_model/llava_llama.py", line 74, in forward
    input_ids, attention_mask, past_key_values, inputs_embeds, labels = self.prepare_inputs_labels_for_multimodal(input_ids, attention_mask, past_key_values, labels, images)
  File "/Agent_service/Video_LLaVA_main/llava/model/llava_arch.py", line 374, in prepare_inputs_labels_for_multimodal
    cur_X_features = X_features[cur_X_idx]
IndexError: list index out of range
```

827648313 avatar Feb 05 '24 10:02 827648313

You are using the old code. We have updated it; could you pull the latest code and try again?

LinB203 avatar Feb 05 '24 10:02 LinB203

The same problem still occurs with the latest code. Is there a workaround? I also want to ask multiple questions about a single video, and I hit the same error there.

cfwin avatar Feb 13 '24 04:02 cfwin

Why use a for-loop for inference? Multi-round conversation already works in the CLI demo: https://github.com/PKU-YuanGroup/Video-LLaVA?tab=readme-ov-file#cli-inference

LinB203 avatar Feb 13 '24 08:02 LinB203

I think it would be beneficial if one could call the model without problems in a Python script in order to perform downstream tasks.

Edit: I just solved the problem. You need to re-initialize the conv_templates object in every loop, since conv keeps all conversation history; if you simply append the next question to it, the questions accumulate in every loop, and this somehow causes problems for the model.
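A minimal sketch of the accumulation this describes, using only the conversation helpers already imported in the script above (the question strings are made up for illustration):

```python
from llava.constants import DEFAULT_X_TOKEN
from llava.conversation import conv_templates

conv_mode = "llava_v1"

# Buggy pattern: one conversation object reused across iterations.
conv = conv_templates[conv_mode].copy()
for question in ["what is this", "why is this video funny?"]:
    conv.append_message(conv.roles[0], DEFAULT_X_TOKEN['VIDEO'] + '\n' + question)
    conv.append_message(conv.roles[1], None)
    prompt = conv.get_prompt()
    # Prints 1 on the first pass and 2 on the second: the prompt now asks
    # for two videos, but generate() is still handed a single video tensor,
    # so indexing the feature list for the second placeholder raises
    # IndexError in prepare_inputs_labels_for_multimodal.
    print(prompt.count(DEFAULT_X_TOKEN['VIDEO']))
```

Moving `conv = conv_templates[conv_mode].copy()` inside the loop keeps the count at one placeholder per generate() call.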

Coronal-Halo avatar Feb 14 '24 16:02 Coronal-Halo

> Why use a for-loop for inference? Multi-round conversation already works in the CLI demo: https://github.com/PKU-YuanGroup/Video-LLaVA?tab=readme-ov-file#cli-inference

Because for now I only want single-round conversation, haha.

827648313 avatar Feb 18 '24 08:02 827648313

> I think it would be beneficial if one could call the model without problems in a Python script in order to perform downstream tasks.
>
> Edit: I just solved the problem. You need to re-initialize the conv_templates object in every loop, since conv keeps all conversation history; if you simply append the next question to it, the questions accumulate in every loop, and this somehow causes problems for the model.

Exactly, just use a new conv template in every for-loop iteration.

LinB203 avatar Feb 18 '24 08:02 LinB203

> I think it would be beneficial if one could call the model without problems in a Python script in order to perform downstream tasks.
>
> Edit: I just solved the problem. You need to re-initialize the conv_templates object in every loop, since conv keeps all conversation history; if you simply append the next question to it, the questions accumulate in every loop, and this somehow causes problems for the model.

> Exactly, just use a new conv template in every for-loop iteration.

Yes.

827648313 avatar Feb 20 '24 03:02 827648313

> I think it would be beneficial if one could call the model without problems in a Python script in order to perform downstream tasks.
>
> Edit: I just solved the problem. You need to re-initialize the conv_templates object in every loop, since conv keeps all conversation history; if you simply append the next question to it, the questions accumulate in every loop, and this somehow causes problems for the model.

Could you share how you re-initialize conv_templates?

827648313 avatar Feb 20 '24 08:02 827648313

> I think it would be beneficial if one could call the model without problems in a Python script in order to perform downstream tasks. Edit: I just solved the problem. You need to re-initialize the conv_templates object in every loop, since conv keeps all conversation history; if you simply append the next question to it, the questions accumulate in every loop, and this somehow causes problems for the model.

> Could you share how you re-initialize conv_templates?

Simply put these two lines of code inside the for-loop you use to evaluate a different question every iteration:

conv = conv_templates[conv_mode].copy()
roles = conv.roles
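
For completeness, here is a sketch of the reproduction script from earlier in this thread with those two lines moved inside the loop (the second question string is made up for illustration; everything else follows the original script):

```python
import torch
from llava.constants import X_TOKEN_INDEX, DEFAULT_X_TOKEN
from llava.conversation import conv_templates, SeparatorStyle
from llava.model.builder import load_pretrained_model
from llava.utils import disable_torch_init
from llava.mm_utils import tokenizer_X_token, get_model_name_from_path, KeywordsStoppingCriteria

disable_torch_init()
video = 'llava/serve/examples/sample_demo_1.mp4'
model_path = 'LanguageBind/Video-LLaVA-7B'
model_name = get_model_name_from_path(model_path)
tokenizer, model, processor, context_len = load_pretrained_model(
    model_path, None, model_name, load_8bit=False, load_4bit=True, device='cuda')
video_processor = processor['video']
conv_mode = "llava_v1"

# Each iteration is an independent single-round conversation.
for question in ['Why is this video funny?', 'What is the child wearing?']:
    # The fix: rebuild the conversation from the template every iteration,
    # so the prompt carries exactly one video placeholder.
    conv = conv_templates[conv_mode].copy()
    roles = conv.roles

    tensor = video_processor(video, return_tensors='pt')['pixel_values']
    if isinstance(tensor, list):
        tensor = [t.to(model.device, dtype=torch.float16) for t in tensor]
    else:
        tensor = tensor.to(model.device, dtype=torch.float16)

    inp = DEFAULT_X_TOKEN['VIDEO'] + '\n' + question
    conv.append_message(roles[0], inp)
    conv.append_message(roles[1], None)
    prompt = conv.get_prompt()
    input_ids = tokenizer_X_token(prompt, tokenizer, X_TOKEN_INDEX['VIDEO'],
                                  return_tensors='pt').unsqueeze(0).cuda()
    stop_str = conv.sep if conv.sep_style != SeparatorStyle.TWO else conv.sep2
    stopping_criteria = KeywordsStoppingCriteria([stop_str], tokenizer, input_ids)

    with torch.inference_mode():
        output_ids = model.generate(
            input_ids,
            images=[tensor, ['video']],
            do_sample=True,
            temperature=0.1,
            max_new_tokens=1024,
            use_cache=True,
            stopping_criteria=[stopping_criteria])

    print(tokenizer.decode(output_ids[0, input_ids.shape[1]:]).strip())
```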

ziyaosg avatar Aug 28 '24 07:08 ziyaosg