xeroqin
Hi InternVideo2 team! Could you please share code showing how you extract the multi-modal features? I'd like to use the models to extract features from my own dataset. Thanks...
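When I run the project with start.py, I can upload a video and generate subtitles normally. When I use the API, however, the video upload always fails with the error "Expecting value: line 1 column 1 (char 0)".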
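"Expecting value: line 1 column 1 (char 0)" is the message Python's `json` module raises when it is asked to decode an empty string, or a body that does not start with JSON (for example an HTML error page from a proxy or a crashed server). Assuming the API client decodes the server's reply with `json.loads` or a wrapper around it (the project's actual client code is not shown here), a minimal sketch like the one below surfaces the raw body so you can see what the server really returned. `parse_api_response` is a hypothetical helper name, not part of the project.

```python
import json

def parse_api_response(body: str):
    """Decode a JSON response body (hypothetical helper).

    If the body is empty or not JSON, re-raise with the start of the raw
    body attached, instead of the bare "Expecting value" message.
    """
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        preview = body[:200] if body else "<empty body>"
        raise ValueError(
            f"API did not return JSON ({exc}); body starts with: {preview!r}"
        ) from exc
```

An empty body usually means the upload request itself was rejected or timed out before the server produced a response, so logging the HTTP status and raw body is a reasonable first step.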
Hi, after replacing the dataset with my own custom dataset, training prints the following:
[train] Unique sentence is 3483 , all num is 3486
Video number: 3486
Total Pairs: 3486
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x114200e40] moov atom not found
data/biology/video_split/2_8_12_(P12. 鸟(2))_6.mp4
data/biology/video_split/2_8_12_(P12. 鸟(2))_6.mp4
data/biology/video_split/2_8_12_(P12. 鸟(2))_6.mp4...
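"moov atom not found" comes from FFmpeg's MP4/MOV demuxer (the `[mov,mp4,...]` prefix in the log). The `moov` box holds the track and sample index, and it is typically missing when a file was truncated while being downloaded, split, or copied. A minimal sketch for finding such files before training, assuming the videos are ordinary ISO base media (MP4/MOV) files, walks the top-level boxes of each file and reports those with no `moov` box. The helper names `has_moov_atom` and `find_videos_without_moov` are hypothetical. The check only confirms that a top-level `moov` header exists; it does not prove the file decodes cleanly.

```python
import os
import struct

def has_moov_atom(path: str) -> bool:
    """Return True if the file has a top-level 'moov' box header."""
    file_size = os.path.getsize(path)
    with open(path, "rb") as f:
        offset = 0
        while offset + 8 <= file_size:
            f.seek(offset)
            header = f.read(8)
            if len(header) < 8:
                break
            # Each box starts with a 32-bit big-endian size and a 4-byte type.
            size, box_type = struct.unpack(">I4s", header)
            header_len = 8
            if size == 1:
                # size == 1: a 64-bit "largesize" follows the type field.
                ext = f.read(8)
                if len(ext) < 8:
                    break
                size = struct.unpack(">Q", ext)[0]
                header_len = 16
            elif size == 0:
                # size == 0: the box extends to the end of the file.
                size = file_size - offset
            if box_type == b"moov":
                return True
            if size < header_len:
                break  # corrupt header; stop scanning
            offset += size  # a truncated box jumps past EOF and ends the loop
    return False

def find_videos_without_moov(root: str,
                             exts=(".mp4", ".mov", ".m4a", ".3gp")) -> list:
    """List video files under `root` that lack a top-level 'moov' box."""
    bad = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(exts):
                full = os.path.join(dirpath, name)
                if not has_moov_atom(full):
                    bad.append(full)
    return sorted(bad)
```

For example, `find_videos_without_moov("data/biology/video_split")` would list the clips to re-cut or drop from the annotation file before training.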