VideoGPT-plus issues

Zero-shot QA evaluation

How can we perform zero-shot QA evaluation on datasets such as MSVD-QA, MSRVTT-QA, TGIF-QA, and ActivityNet-QA? Could we just follow the Video-ChatGPT pipeline?

hulianyuyy

In what order should I reproduce the paper?

6

step1 pretrain_projector_image_encoder.sh
step2 pretrain_projector_video_encoder.sh
step3 finetune_dual_encoder.sh
step4 eval/vcgbench/inference/run_ddp_inference.sh
step5 eval/vcgbench/gpt_evaluation/vcgbench_evaluate.sh

```
#!/bin/sh

export DATASET_DIR=/mnt2/ninghuayang/data/videogpt_plus_dataset

BASE_LLM_PATH=microsoft/Phi-3-mini-4k-instruct
VISION_TOWER=OpenGVLab/InternVideo2-Stage2_1B-224p-f4
IMAGE_VISION_TOWER=openai/clip-vit-large-patch14-336
PROJECTOR_TYPE=mlp2x_gelu
#PRETRAIN_VIDEO_MLP_PATH=MBZUAI/VideoGPT-plus_Phi3-mini-4k_Pretrain/mlp2x_gelu_internvideo2/mm_projector.bin
#PRETRAIN_IMAGE_MLP_PATH=MBZUAI/VideoGPT-plus_Phi3-mini-4k_Pretrain/mlp2x_gelu_clip_l14_336px/mm_projector.bin
PRETRAIN_VIDEO_MLP_PATH=results/mlp2x_gelu_internvideo2/mm_projector.bin
PRETRAIN_IMAGE_MLP_PATH=results/mlp2x_gelu_clip_l14_336px/mm_projector.bin
OUTPUT_DIR_PATH=results/videogpt_plus_finetune

deepspeed videogpt_plus/train/train.py \
  --lora_enable True --lora_r 128...
```

rixejzvdl649
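The stage order proposed in the issue above can be written down as a small driver, shown here only as a sketch. The script names come from the issue text; `repo_root`, `build_commands`, and `run_pipeline` are hypothetical names introduced for illustration. The issue gives the first three scripts without a directory, so resolving them relative to `repo_root` is an assumption. Any arguments the scripts may require are not shown in the issue and are left out.

```python
import subprocess
from pathlib import Path

# Stage order exactly as listed in the issue; directory prefixes for the
# first three scripts are not given there, so none are assumed here.
STAGES = [
    "pretrain_projector_image_encoder.sh",
    "pretrain_projector_video_encoder.sh",
    "finetune_dual_encoder.sh",
    "eval/vcgbench/inference/run_ddp_inference.sh",
    "eval/vcgbench/gpt_evaluation/vcgbench_evaluate.sh",
]


def build_commands(repo_root, shell="sh"):
    """Return one argv list per stage, in the order the issue proposes."""
    root = Path(repo_root)
    return [[shell, str(root / stage)] for stage in STAGES]


def run_pipeline(repo_root, dry_run=True):
    """Print each stage command; execute it only when dry_run is False."""
    for cmd in build_commands(repo_root):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing stage,
            # so later stages never run on missing checkpoints.
            subprocess.run(cmd, check=True)
```

Calling `run_pipeline(".")` only prints the five commands; passing `dry_run=False` would actually launch them one after another.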

Inquiry about Costs Associated with Video LLM Benchmarks

Hello everyone, I have been working on replicating benchmarks related to video-class Large Language Models (LLMs), and I've noticed that most of these benchmarks rely on the GPT-assistant framework. Given...

hb-jw

Support for Multi-turn Conversations with Fixed Video Input?

Hello, I have a question regarding the conversation capabilities of this project: 1. Does the system support multi-turn conversations? 2. Is it possible to have a natural, ongoing dialogue while...

YoungjaeDev

Simple Demo

3

Hey! Thanks for your great work. Do you have any plans to provide a simple demo, i.e., one that takes a video and a question as input rather than running a benchmark?

Zeqing-Wang

enhancement

Where can I find the dense captions for the 112K videos?

Thank you so much for sharing this amazing work! I’m wondering where I can find the dense captions for the 112k videos mentioned in the paper.

ronghangzhu

“python setup.py install” for flash-attention reports errors

1

Hello there, Thank you for your remarkable work and I am really interested in looking into it. The whole installation process works smoothly until the very last command. The “python...

Haodi-Liu

VideoGPT+ Inference code, Simple Demo on Google Colab

Wrote up code for a simple demo for VideoGPT+ inference on a sample video

Yogesh914

Phi3Model ImportError

2

When I run the script, I get the following error: ImportError: cannot import name 'Phi3Model' from 'transformers'

zimenglan-sysu-512
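An import error like the one above often means the installed package does not export the requested name, for example because the installed release is older than the one the code expects. Whether that is the cause here is an assumption; the issue does not say. The sketch below reports the installed version of a package and whether a given symbol can be resolved from it. `check_symbol` is a hypothetical helper, not part of `transformers` or of this repository.

```python
import importlib
from importlib import metadata


def check_symbol(package, symbol):
    """Report the installed version of `package` and whether `symbol` resolves."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None  # not installed as a distribution (or a stdlib module)

    available = False
    try:
        module = importlib.import_module(package)
        # Lazy-loading packages may raise errors other than AttributeError
        # while resolving a name, so catch broadly and treat it as missing.
        getattr(module, symbol)
        available = True
    except Exception:
        available = False

    return {"package": package, "version": version,
            "symbol": symbol, "available": available}
```

Running `check_symbol("transformers", "Phi3Model")` in the training environment shows which version is installed and whether the class is exported, which is useful information to include in a report like this one.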

You are using a model of type phi3 to instantiate a model of type VideoGPT+. This is not supported for all configurations of models and can yield errors.

Hi, I am getting this error while training the model: You are using a model of type phi3 to instantiate a model of type VideoGPT+. This is not supported...

dpramanik2289
