Extend MiniGPT-4 for video level

Open • pixeli99 opened this issue 2 years ago • 1 comment

Hi! We have made a simple extension of MiniGPT-4 to the video level in our project, DriveScenify.

DriveScenify is a tailored version of MiniGPT-4 that focuses on understanding driving scene videos and generating responses about them. It aligns a frozen visual encoder from InternVideo with a frozen LLM, Vicuna, using the PerceiverResampler from OpenFlamingo, and it is trained specifically for driving scenarios (but it also has some ability to understand general videos 😎).
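For readers unfamiliar with the resampler idea, here is a rough, dependency-free sketch of what a Perceiver-style resampler does conceptually: a fixed set of latent queries cross-attends over however many video tokens the encoder produces, so the LLM always receives the same number of visual tokens. This is not the DriveScenify or OpenFlamingo code. The names `resample`, `fake_video_tokens`, `DIM`, and `NUM_LATENTS` are made up for illustration, the latents are random rather than learned, and the real module additionally uses learned projections, multiple heads, several layers, and feed-forward blocks.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def resample(latents, tokens):
    """Single-head cross-attention with no learned projections (a simplification).

    Each latent acts as a query over all video tokens, so the output has
    len(latents) rows regardless of how many tokens come in.
    """
    dim = len(latents[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in latents:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        # Weighted average of the token vectors (tokens serve as keys and values).
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(dim)])
    return out


rng = random.Random(0)
DIM, NUM_LATENTS = 8, 4  # hypothetical sizes, chosen only for the demo

# In a trained model these latents are learned parameters; here they are random.
latents = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_LATENTS)]


def fake_video_tokens(num_frames, patches_per_frame=3):
    """Stand-in for frozen video-encoder output: one vector per frame patch."""
    return [[rng.gauss(0, 1) for _ in range(DIM)]
            for _ in range(num_frames * patches_per_frame)]


short_clip = resample(latents, fake_video_tokens(num_frames=2))
long_clip = resample(latents, fake_video_tokens(num_frames=16))

# Both clips are compressed to NUM_LATENTS vectors of size DIM.
print(len(short_clip), len(long_clip))
```

Because only the resampler (and any projection into the LLM's embedding space) would need training in a setup like this, the visual encoder and the LLM can both stay frozen.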

At present this is only an initial version: it is constrained by available compute, among other things, and the training data is limited. Still, a working prototype already exists, and everyone is welcome to try it out!

May 05 '23 07:05 pixeli99

good job!

May 06 '23 03:05 feymanwang