MiniGPT-4
Extending MiniGPT-4 to video-level input
Hi! We have made a simple extension of MiniGPT-4 to video-level input in our project DriveScenify.
DriveScenify is a tailored version of MiniGPT-4 that focuses on understanding driving scene videos and generating responses about them. It aligns a frozen visual encoder from InternVideo with a frozen LLM, Vicuna, using the PerceiverResampler from OpenFlamingo, specifically for driving scenarios (but it also has some ability to understand general videos😎).
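
To illustrate the alignment idea, here is a minimal sketch of Perceiver-style resampling. This is not the DriveScenify or OpenFlamingo code: it is a single-head, single-layer simplification in NumPy, and the names `resample`, `latents`, `w_q`, `w_k`, `w_v` and all dimensions are assumptions chosen for illustration. It shows how a fixed set of latent queries can cross-attend to a variable number of video features and always return the same number of tokens for the LLM.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def resample(video_feats, latents, w_q, w_k, w_v):
    """Cross-attend latent queries to video features.

    video_feats: (T, D) features from a (frozen) video encoder, T may vary.
    latents:     (N, D) latent queries (learned in a real model), N is fixed.
    Returns:     (N, D) tokens, independent of T.
    """
    q = latents @ w_q          # queries come from the latents
    k = video_feats @ w_k      # keys come from the video features
    v = video_feats @ w_v      # values come from the video features
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (N, T) attention weights
    return latents + attn @ v  # residual update of the latents

# Hypothetical sizes: 16-dim features, 4 latent tokens.
rng = np.random.default_rng(0)
dim, num_latents = 16, 4
latents = rng.normal(size=(num_latents, dim))
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) * 0.1 for _ in range(3))

short_clip = rng.normal(size=(8, dim))   # 8 feature vectors
long_clip = rng.normal(size=(32, dim))   # 32 feature vectors
out_short = resample(short_clip, latents, w_q, w_k, w_v)
out_long = resample(long_clip, latents, w_q, w_k, w_v)
print(out_short.shape, out_long.shape)
```

Both clips produce the same output shape, so the language model sees a constant number of visual tokens regardless of clip length. In a setup like the one described above, only the resampler weights would be trained while the encoder and the LLM stay frozen.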

At present this is only an initial version: it is constrained by available compute, among other limitations, and the training data is limited. Still, a working prototype already exists, and everyone is welcome to try it out!
good job!