ViS4mer

Pre-trained model weights

Open • nahidalam opened this issue 2 years ago • 4 comments

Hi

Will you publish any pre-trained models, preferably on PyTorch Hub (torch.hub)? I was thinking of using ViS4mer to extract image embeddings.

nahidalam, Aug 02 '22 19:08

Hi @nahidalam, thanks for your comment. I will try to publish some pre-trained model weights. Do you want a model pre-trained on a particular dataset? The LVU dataset has 9 tasks; which of them are you interested in?

md-mohaiminul, Aug 03 '22 19:08

I was thinking of ImageNet or another image dataset, since my goal is to get image embeddings. But then I realized ViS4mer is designed for video understanding, so I am not sure my request makes sense.

Nonetheless, for the LVU dataset, the scene/place, relationship, and way-of-speaking tasks are the most relevant to me.

nahidalam, Aug 03 '22 22:08

Yes, ViS4mer is a video understanding model. However, technically you can use it for image modeling too. Anyway, I will try to release the pre-trained weights for the scene/place, relationship, and way-of-speaking tasks.

md-mohaiminul, Aug 03 '22 22:08

Thank you for the quick reply. Yes, I understand it is technically possible to get embeddings. I was thinking more from a "meaning" perspective for my particular use case. Maybe a model trained on the COIN dataset makes more sense. I think a recipe for training ViS4mer on a custom video dataset would be useful, so people like me can try it on their own instead of taking up your time. Happy to collaborate if you plan to do that sometime.

nahidalam, Aug 03 '22 23:08