
How to get the video-level "weak" label

xiaoyiming opened this issue 7 years ago • 5 comments

Dear Mr. Gao, thank you so much for the great work. However, I met some problems when I implemented this code. As described in your article: "For the visual frames, we use an ImageNet pre-trained ResNet-152 network [34] to make object category predictions, and we max-pool over predictions of all frames to obtain a video-level prediction. The top labels (with class probability larger than a threshold = 0.3) are used as weak "labels" for the unlabeled video." However, when I use the pre-trained ResNet-152 network, I get only one category prediction larger than the threshold. How can I get multiple labels from the pre-trained ResNet-152 network? Should I train an object detection network, a multi-class multi-label network, or use some other solution? Thank you for your assistance. Best regards!

xiaoyiming · Nov 13 '18

Hi,

We didn't use all 1,000 ImageNet classes, only ~20 selected audio-related classes. We then re-normalize the class probabilities over these classes, so you can get multiple labels with class probability larger than the threshold. Also, 0.3 is just an empirical choice.

Thanks for your interest!

rhgao · Nov 13 '18
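For reference, here is a minimal sketch of the weak-label extraction rhgao describes above, assuming a PyTorch setup; the class-index list, threshold constant, and function name are placeholders for illustration, not the authors' actual code.

```python
import torch
import torchvision.models as models

# Hypothetical indices of the ~20 audio-related ImageNet classes
# (placeholder values, not the actual list used in the paper).
AUDIO_CLASS_INDICES = [401, 402, 420, 486, 541, 546, 558, 579, 642, 684, 776, 889]
THRESHOLD = 0.3  # empirical threshold mentioned in the paper

resnet = models.resnet152(pretrained=True).eval()

@torch.no_grad()
def video_weak_labels(frames):
    """frames: (num_frames, 3, 224, 224) tensor, already resized and normalized."""
    logits = resnet(frames)                          # (num_frames, 1000)
    probs = torch.softmax(logits, dim=1)             # per-frame class probabilities
    probs = probs[:, AUDIO_CLASS_INDICES]            # keep only the selected classes
    probs = probs / probs.sum(dim=1, keepdim=True)   # re-normalize over the subset
    video_probs, _ = probs.max(dim=0)                # max-pool over frames -> video level
    keep = (video_probs > THRESHOLD).nonzero(as_tuple=True)[0]
    return [AUDIO_CLASS_INDICES[i] for i in keep.tolist()]
```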

@rhgao Thanks for your reply! I will try it.

xiaoyiming · Nov 13 '18

Dear Mr. Gao, thank you so much for the great work. I met another problem when implementing this code. As described in your paper: "we collect a maximum of 3,000 basis vectors for each object category" and "In other words, we concatenate the basis vectors learnt for each detected object to construct the basis dictionary W(q). Next, in the NMF algorithm, we hold W(q) fixed, and only estimate activation H(q) with multiplicative update rules." However, what is the shape of the selected W(q)(j)? Is it also M x K (K = 25)? And how do you select K basis vectors from the 3,000 stored basis vectors?

xiaoyiming · Dec 01 '18

Hi, we use all of the collected basis vectors to initialize W, namely M x K with M = 3000 and K = 25. 3,000 is just a hyperparameter, and a larger number of basis vectors could potentially lead to better results.

rhgao · Dec 03 '18
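For anyone else reading, here is a minimal NumPy sketch of the NMF step discussed above: the concatenated basis dictionary W is held fixed and only the activations H are estimated with multiplicative updates. The KL-divergence update rule, the shapes, and the function name are assumptions for illustration, not code from the repository.

```python
import numpy as np

def estimate_activations(V, W, n_iter=100, eps=1e-8):
    """V: (F, T) magnitude spectrogram; W: (F, K) fixed basis dictionary.
    Returns H of shape (K, T) such that V is approximately W @ H."""
    K, T = W.shape[1], V.shape[1]
    H = np.abs(np.random.rand(K, T)) + eps            # non-negative random init

    for _ in range(n_iter):
        WH = W @ H + eps
        # Multiplicative update for H under the KL divergence, with W held fixed:
        # H <- H * (W^T (V / WH)) / (W^T 1)
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
    return H
```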

Thanks. Could you please share your train loss/mAP and val loss/mAP? My train loss is about 0.0001 and train mAP is about 0.72. My val loss is about 0.1 and val mAP is 0.65 after 300 iterations, with batchSize and Valsize the same as yours. Is that normal?

xiaoyiming · Dec 08 '18