VILA Long context video module only

Open MH-Python opened this issue 1 year ago • 0 comments

Great work and research.

My question is simply whether it is possible to use only the visual/video part (already pretrained on a video dataset such as Kinetics) for fine-tuning on a long-video dataset, e.g. to classify 1-minute or 2-minute video clips.

Oct 02 '24 14:10 MH-Python
