Video-Classification-2-Stream-CNN

Where is the 2-stream model itself?

Open • sudonto opened this issue on Sep 15 '18 • 4 comments

Hi @wadhwasahil, @stillbreeze,

As stated in the project's title, I expected to see your model/network as 2 streams (2 inputs), but I only see the spatial and temporal models as separate networks (i.e. you did not merge them into one big model). Is this an unfinished project, or is it intentional?

Thank you.

sudonto • Sep 15 '18 15:09

The repo is a re-implementation of the two-stream CNN paper by Simonyan and Zisserman (https://papers.nips.cc/paper/5353-two-stream-convolutional-networks-for-action-recognition-in-videos.pdf).

The paper trains the two streams separately and performs late fusion of their softmax scores, either by averaging them or by training a linear SVM on the stacked, L2-normalized softmax scores. So there is no merging into a single bigger model.
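A minimal sketch of the two late-fusion options, assuming each stream has already produced softmax scores of shape (num_videos, num_classes) as NumPy arrays. The helper names (`fuse_by_averaging`, `svm_fusion_features`) and the example logits are hypothetical and not taken from this repo; for the SVM option only the input features are built, the SVM itself is not trained here.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def fuse_by_averaging(spatial_probs: np.ndarray, temporal_probs: np.ndarray) -> np.ndarray:
    """Late fusion option 1: mean of the two streams' class-score vectors."""
    return (spatial_probs + temporal_probs) / 2.0


def svm_fusion_features(spatial_probs: np.ndarray, temporal_probs: np.ndarray) -> np.ndarray:
    """Late fusion option 2 (input side only): L2-normalize each stream's
    softmax vector, then stack them side by side as features for a linear SVM."""
    s = spatial_probs / np.linalg.norm(spatial_probs, axis=1, keepdims=True)
    t = temporal_probs / np.linalg.norm(temporal_probs, axis=1, keepdims=True)
    return np.concatenate([s, t], axis=1)


# Hypothetical logits for 2 videos over 3 classes (not real model outputs).
spatial = softmax(np.array([[2.0, 1.0, 0.1], [0.5, 0.2, 3.0]]))
temporal = softmax(np.array([[0.3, 2.5, 0.1], [0.1, 0.4, 2.0]]))

fused = fuse_by_averaging(spatial, temporal)
predictions = fused.argmax(axis=1)
print(predictions)  # [1 2]
```

For the first video the spatial stream alone would pick class 0, but the averaged scores pick class 1 because the temporal stream is more confident; this is the kind of correction late fusion is meant to provide.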

stillbreeze • Sep 15 '18 18:09

Ah, I had misunderstood the term "late fusion". So which part of the code fuses the softmax outputs and averages the result? Also, is it possible to train the spatial and temporal networks jointly and still achieve the same result?

sudonto • Sep 16 '18 08:09

@sudonto Yes, you can train them jointly, but due to limited resources we had to train them separately.
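As a toy illustration of joint training, assume each stream is reduced to a linear softmax layer over hypothetical precomputed features (in the real network each stream is a full ConvNet, on RGB frames and on stacked optical flow respectively). In this sketch the two streams' logits are summed before a single softmax cross-entropy loss, so one gradient step updates both streams together. Logit-sum fusion is a choice made for the sketch only; it is not the paper's setup, and `train_jointly` and the synthetic data are hypothetical.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax with max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def train_jointly(x_spatial, x_temporal, labels, num_classes, lr=0.3, epochs=200, seed=0):
    """Toy joint training: two linear 'streams' whose logits are summed before
    one softmax cross-entropy loss, so every step updates both streams."""
    rng = np.random.default_rng(seed)
    w_spatial = rng.normal(scale=0.01, size=(x_spatial.shape[1], num_classes))
    w_temporal = rng.normal(scale=0.01, size=(x_temporal.shape[1], num_classes))
    targets = np.eye(num_classes)[labels]  # one-hot labels
    n = len(labels)
    losses = []
    for _ in range(epochs):
        probs = softmax(x_spatial @ w_spatial + x_temporal @ w_temporal)
        losses.append(float(-np.mean(np.log(probs[np.arange(n), labels] + 1e-12))))
        # d(mean cross-entropy)/d(fused logits) = (probs - targets) / n;
        # the same signal is back-propagated into both streams' weights.
        grad_logits = (probs - targets) / n
        w_spatial -= lr * x_spatial.T @ grad_logits
        w_temporal -= lr * x_temporal.T @ grad_logits
    return w_spatial, w_temporal, losses


# Hypothetical features: 90 samples, 3 classes, each stream a noisy view of the label.
rng = np.random.default_rng(1)
labels = rng.integers(0, 3, size=90)
onehot = np.eye(3)[labels]
x_spatial = 2.0 * onehot + rng.normal(scale=0.5, size=(90, 3))
x_temporal = 2.0 * onehot + rng.normal(scale=0.5, size=(90, 3))

w_s, w_t, losses = train_jointly(x_spatial, x_temporal, labels, num_classes=3)
train_acc = np.mean((x_spatial @ w_s + x_temporal @ w_t).argmax(axis=1) == labels)
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}, train accuracy {train_acc:.2f}")
```

With real ConvNets the same idea means holding both networks, their inputs, and their activations in memory for every step, which is where the resource cost of joint training comes from.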

wadhwasahil • Sep 17 '18 07:09

Thank you for the answer. So in which part of the code do you average the softmax scores to get the class accuracy? Sorry for asking so many questions.
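The thread does not say where this happens in the repo, so here is only a sketch of how a fused accuracy could be computed, assuming per-frame softmax scores are available for each stream with shape (num_videos, num_frames, num_classes). As in the paper's test-time protocol, frame scores are first averaged into one score vector per video; the two streams are then averaged and the argmax is compared with the labels. Because "class accuracy" could mean either overall accuracy or mean per-class accuracy, both are returned. Function names and data are hypothetical.

```python
import numpy as np


def video_scores(frame_probs: np.ndarray) -> np.ndarray:
    """Average per-frame softmax scores: (videos, frames, classes) -> (videos, classes)."""
    return frame_probs.mean(axis=1)


def fused_accuracy(spatial_frame_probs, temporal_frame_probs, labels):
    """Average the two streams' video-level scores, then score the argmax.

    Returns (overall accuracy, mean per-class accuracy)."""
    fused = (video_scores(spatial_frame_probs) + video_scores(temporal_frame_probs)) / 2.0
    predictions = fused.argmax(axis=1)
    overall = float(np.mean(predictions == labels))
    # Accuracy within each class present in `labels`, then averaged over classes.
    per_class = [np.mean(predictions[labels == c] == c) for c in np.unique(labels)]
    return overall, float(np.mean(per_class))


# Hypothetical scores: 3 videos, 2 sampled frames each, 2 classes.
spatial = np.array([
    [[0.9, 0.1], [0.7, 0.3]],
    [[0.6, 0.4], [0.4, 0.6]],
    [[0.2, 0.8], [0.4, 0.6]],
])
temporal = np.array([
    [[0.6, 0.4], [0.8, 0.2]],
    [[0.3, 0.7], [0.1, 0.9]],
    [[0.7, 0.3], [0.9, 0.1]],
])
labels = np.array([0, 1, 1])

overall, mean_class = fused_accuracy(spatial, temporal, labels)
print(overall, mean_class)  # overall = 2/3, mean per-class = 0.75
```

The two numbers differ whenever the classes are unbalanced or errors concentrate in one class, as in this example where every class-1 mistake lowers only the class-1 term.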

sudonto • Sep 17 '18 07:09