LLaVA
[Question] Predictions in video by stitching consecutive frames into a single image
Question
I am looking to use LLaVA for predictions on video by stitching a sequence of consecutive frames into a single image and then asking LLaVA for a prediction on that image. Has anyone used this approach before and had any success? If so, do you have any tips on how you approached it?
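To make the stitching idea concrete, here is a minimal sketch of how the frames could be laid out as a grid in one image. It only computes the canvas size and the top-left offset of each frame; it is not something LLaVA provides. The name `grid_layout` and its parameters are hypothetical, and it assumes all frames share the same width and height.

```python
import math


def grid_layout(num_frames, frame_w, frame_h, cols):
    """Hypothetical helper: place equally sized frames in row-major order.

    Returns (canvas_width, canvas_height, offsets), where offsets[i] is the
    (x, y) top-left corner of frame i on the stitched canvas.
    """
    if num_frames <= 0 or cols <= 0:
        raise ValueError("num_frames and cols must be positive")
    # Never use more columns than there are frames.
    cols = min(cols, num_frames)
    rows = math.ceil(num_frames / cols)
    offsets = [
        ((i % cols) * frame_w, (i // cols) * frame_h)
        for i in range(num_frames)
    ]
    return cols * frame_w, rows * frame_h, offsets


if __name__ == "__main__":
    width, height, offsets = grid_layout(num_frames=5, frame_w=100, frame_h=50, cols=3)
    print(width, height, offsets)
```

Each offset is where the corresponding frame would be pasted with whatever imaging library is in use (for example Pillow's `Image.paste`). Row-major order keeps time running left to right, then top to bottom, which can be stated in the prompt so the model knows how to read the grid.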
Hi, I also need to run predictions on video now. Have you found a better solution? My current approach is to extract frames from the video and run prediction on those.
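If extracting frames here means picking a handful of frames spread across the clip, one possible way to choose them is uniform sampling, sketched below. This is an assumption about the approach, not the poster's actual code; `sample_frame_indices` is a hypothetical name, and it takes the middle frame of each equal-length segment.

```python
def sample_frame_indices(total_frames, num_samples):
    """Hypothetical helper: pick evenly spaced frame indices from a video.

    Splits the video into num_samples equal segments and returns the index
    of the frame at the middle of each segment.
    """
    if total_frames <= 0 or num_samples <= 0:
        raise ValueError("total_frames and num_samples must be positive")
    # A short video cannot yield more distinct frames than it contains.
    num_samples = min(num_samples, total_frames)
    segment = total_frames / num_samples
    return [int(segment * (i + 0.5)) for i in range(num_samples)]


if __name__ == "__main__":
    print(sample_frame_indices(total_frames=100, num_samples=4))
```

The returned indices can then be used to read the matching frames with any video reader before passing them, individually or stitched together, to the model.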