Video-Classification-2-Stream-CNN
Video-Classification-2-Stream-CNN copied to clipboard
Where is 2 stream itself?
Hi @wadhwasahil , @stillbreeze ,
As stated in the project's title, I suppose to see your model/network in 2 streams (2 inputs) but I only see spatial and temporal model in seperate network (i.e. you did not merge these models into 1 big model). Is it an unfinished project or is this your intention?
Thank you.
The repo is a re-implementation of the 2 stream CNN paper by Simonyan et al (https://papers.nips.cc/paper/5353-two-stream-convolutional-networks-for-action-recognition-in-videos.pdf).
The paper trains the streams individually and does late fusion with the softmax through averaging or learning an SVM on normalized features. So there's no merging into a bigger model.
Ah, so far I have been wrong in understanding the term of "late fusion". So, in which part of the code did you fuse with softmax then average the result? Also, is it possible to jointly train the network for both spatial and temporal and then achieve the same result?
@sudonto yes you can train them jointly but due to lack of resources we had to train them separately.
Thank you for the answer. So, in which part of the code did you compute the average of softmax score to get the class accuracy? Sorry for asking too much.