
Asking for a simple script to get text and video features

Open yotammarton opened this issue 2 years ago • 8 comments

First of all - Amazing work on this one.

I'm getting a bit lost in the repo; may I request a simple few-line script that does something like the following:

```python
model = CLIPViP("pretrain_clipvip_base_32.pt")
text_features = model.encode_text("This is a very cute cat")
video_features = model.encode_video("vid_file.mp4")
cosine(text_features, video_features)
```

[Extra] Preferably I'd like to get the video features for a batch of mp4 files with different lengths. The closest thing I found is CLIP-ViP/src/modeling/VidCLIP.py, but I couldn't find where that script is used.

Thank you :)

yotammarton avatar Jun 19 '23 21:06 yotammarton
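Until an official snippet exists, the final similarity step can at least be sketched generically. Note that `CLIPViP`, `encode_text`, and `encode_video` above are the asker's hypothetical API, not names from the repo; this sketch only assumes the encoders return plain feature matrices:

```python
import numpy as np

def cosine_similarity(text_features: np.ndarray, video_features: np.ndarray) -> np.ndarray:
    """Cosine similarity between feature matrices.

    text_features:  (n_texts, dim)
    video_features: (n_videos, dim)
    Returns an (n_texts, n_videos) similarity matrix in [-1, 1].
    """
    # L2-normalize each row, then a matrix product gives all pairwise cosines.
    t = text_features / np.linalg.norm(text_features, axis=-1, keepdims=True)
    v = video_features / np.linalg.norm(video_features, axis=-1, keepdims=True)
    return t @ v.T
```

For the variable-length batch case, the usual approach is to sample a fixed number of frames per video before encoding, so all clips produce same-shaped inputs.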

Same question. I can download the videos without annotations. Where can I get the text (caption, annotation, transcription) data? Thanks a lot

jingli18 avatar Jun 21 '23 11:06 jingli18

> First of all - Amazing work on this one. I'm getting a bit lost in the repo; may I request a simple few-line script that does something like the following: [...]

Hi, we are integrating CLIP-ViP into Huggingface transformers, which should make it easier to call. Please keep an eye on it.

HellwayXue avatar Jul 03 '23 09:07 HellwayXue

> Same question. I can download the videos without annotations. Where can I get the text (caption, annotation, transcription) data? Thanks a lot

Hi, for ASR texts, please refer to #7 (https://github.com/microsoft/XPretrain/issues/7). For auxiliary captions, please download from this Azure Blob link: https://hdvila.blob.core.windows.net/dataset/hdvila_ofa_captions_db.zip?sp=r&st=2023-03-16T04:58:26Z&se=2026-03-01T12:58:26Z&spr=https&sv=2021-12-02&sr=b&sig=EYE%2Bj11VWfQ6G5dZ8CKlOOpL3ckmmNqpAtUgBy3OGDM%3D

HellwayXue avatar Jul 03 '23 09:07 HellwayXue

Thanks a lot!

> Hi, for ASR texts, please refer to #7 (https://github.com/microsoft/XPretrain/issues/7). For auxiliary captions, please download from this link: https://hdvila.blob.core.windows.net/dataset/hdvila_ofa_captions_db.zip?sp=r&st=2023-03-16T04:58:26Z&se=2026-03-01T12:58:26Z&spr=https&sv=2021-12-02&sr=b&sig=EYE%2Bj11VWfQ6G5dZ8CKlOOpL3ckmmNqpAtUgBy3OGDM%3D

jingli18 avatar Jul 03 '23 11:07 jingli18

> Hi, for ASR texts, please refer to #7 . For auxiliary captions, please download from this link: Azure Blob Link

@HellwayXue Thanks for providing the auxiliary captions. But how can I open the data.mdb files? I tried Access and Visual Studio but they did not work...

Spark001 avatar Aug 10 '23 08:08 Spark001

> Hi, we are integrating CLIP-ViP into Huggingface transformers, which should make it easier to call. Please keep an eye on it.

Hi @HellwayXue, any update on integration with HuggingFace? Thank you:)

MVPavan avatar Sep 27 '23 10:09 MVPavan

@MVPavan @yotammarton I've created a simple example here: https://github.com/eisneim/clip-vip_video_search

eisneim avatar Nov 16 '23 02:11 eisneim

Hi @MVPavan, can you please suggest what GPU configuration is required to run this model (just for inference)?

someshfengde avatar Jan 16 '24 13:01 someshfengde