Ask-Anything
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
Hello! Thanks for your excellent work on the VideoChat2 model and code sharing. I have a few questions regarding the "stage3" training phase of the model and hope you could...
I put all the videos in a list and iterate over it, generating a caption for each one in turn. But after a dozen or so videos, the program hangs at the point where the caption answer is produced. Why does this happen? Many thanks!
According to the code, it seems that causal masking is also applied to the visual queries in Stages 2 and 3. Is there a reason for this implementation?
Hello, I would like to ask whether the score of each token is returned in stage 3, as in the Video LLAVA code.
Thanks for your excellent work. I am curious if there are any instructions for fine-tuning video-llava on my own dataset?
Here is my stage3 config:

```python
from configs.instruction_data import *

# ========================= data ==========================
train_corpus = "videochat2_instruction"
train_file = "${available_corpus[${train_corpus}]}"  # for lazy evaluation
# import pdb;pdb.set_trace()
# train_file = available_corpus[train_corpus]
test_file = dict()...
```
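For context on the `"${available_corpus[${train_corpus}]}"` line in that config: the string is stored as-is and only substituted later, once the referenced names exist ("lazy evaluation"). The sketch below is a hypothetical, simplified resolver illustrating that pattern, not the actual Ask-Anything config loader; the `available_corpus` contents and the `resolve` helper are made up for illustration.

```python
import re

# Illustrative data, mimicking the names used in the config snippet.
available_corpus = {
    "videochat2_instruction": ["anno/video_instruction.json", "data/videos/"],
}
train_corpus = "videochat2_instruction"

def resolve(expr, scope):
    """Hypothetical lazy-eval helper: repeatedly replace the innermost
    ${...} with the evaluated value from `scope` until none remain."""
    pattern = re.compile(r"\$\{([^${}]+)\}")  # innermost interpolation only
    while True:
        m = pattern.search(expr)
        if m is None:
            return expr
        value = eval(m.group(1), {}, scope)  # inner expr has no ${...} left
        if m.span() == (0, len(expr)):
            return value  # whole string was one interpolation; keep its type
        # Splice the value back in (repr so the outer expression stays valid).
        expr = expr[:m.start()] + repr(value) + expr[m.end():]

# "${train_corpus}" resolves first, then the outer lookup is evaluated.
train_file = resolve(
    "${available_corpus[${train_corpus}]}",
    {"available_corpus": available_corpus, "train_corpus": train_corpus},
)
```

Because the inner `${train_corpus}` is substituted before the outer `available_corpus[...]` lookup is evaluated, `train_file` ends up bound to the corpus entry itself rather than the template string.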
Your tables [here](https://github.com/OpenGVLab/Ask-Anything/blob/main/video_chat2/README.md#parrot-videochat2) explain very well how to fine-tune the model step by step. You also provide some of the checkpoints along the way, but I cannot find the final...
Hello! First of all, thank you for your great work on the videochat2 model. I have a question about the training part in stage3, particularly in line 274 of the...
Hi, I have tested the VideoChat2 model on my server and found that the test results are different from the paper. My results are listed as follows: {"Action Sequence": 66.0,...