
Long context video module only

Open MH-Python opened this issue 1 year ago • 0 comments

Great work and research.

My question is simply whether it is possible to use only the visual/video part (already pretrained on a video dataset such as Kinetics) and fine-tune it on a long-video dataset, e.g. to classify 1-minute or 2-minute video clips.

MH-Python • Oct 02 '24 14:10