SuperK
@feiyun1265 You should also use VGGish to extract the audio features, concatenate them with the image features, and then feed the result to the model.
https://github.com/tensorflow/models/tree/master/research/audioset If you are using YouTube-8M, you don't need it. The image feature has 1024 dimensions and the audio feature has 128 dimensions; you should use both.
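For what it's worth, here is a rough sketch of what "concatenate them" could look like per frame. It uses plain Python lists to stay dependency-free; in practice you would probably do this with NumPy or TensorFlow. The function names `concat_features` and `concat_video` are made up for illustration, and only the 1024 and 128 sizes come from the comment above.

```python
# Feature sizes mentioned in the comment above (1024-d image, 128-d audio).
IMAGE_DIM = 1024
AUDIO_DIM = 128


def concat_features(image_feat, audio_feat):
    """Join one frame's image and audio features into a single vector.

    Hypothetical helper: image values come first, audio values second.
    """
    if len(image_feat) != IMAGE_DIM:
        raise ValueError(f"expected {IMAGE_DIM} image values, got {len(image_feat)}")
    if len(audio_feat) != AUDIO_DIM:
        raise ValueError(f"expected {AUDIO_DIM} audio values, got {len(audio_feat)}")
    return list(image_feat) + list(audio_feat)


def concat_video(image_frames, audio_frames):
    """Concatenate features frame by frame for a whole video.

    Assumes both inputs hold the same number of frames, aligned in time.
    """
    if len(image_frames) != len(audio_frames):
        raise ValueError("image and audio must have the same number of frames")
    return [concat_features(i, a) for i, a in zip(image_frames, audio_frames)]


if __name__ == "__main__":
    # Dummy data: 3 frames of zeros (image) and ones (audio).
    images = [[0.0] * IMAGE_DIM for _ in range(3)]
    audios = [[1.0] * AUDIO_DIM for _ in range(3)]
    video = concat_video(images, audios)
    print(len(video), len(video[0]))
```

Each frame then becomes a 1152-dimensional vector (1024 + 128), which is what the model input would need to accept under this assumption.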
@SharoneDayan I think you should try freezing the model.
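Have you turned on all the compiler optimization options?
The minimum detection box is set to 80*80, with all compiler optimization options enabled.
Are you using the 620 model? Is compiler optimization set to the maximum level?
Then I can't think of anything else. It may be related to the compute device. I tested earlier on a server with a 3.2 GHz CPU.
Post the image you tested and I'll test it too.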

How long does it take you to test this image?