DGAM-Weakly-Supervised-Action-Localization
Real-time inference?
Hi, I read your paper — congratulations on your work. However, the inference part is unclear to me: at inference time, can this framework be used for online action detection? (i.e., given an input video stream, can the model predict frame-level labels as the frames arrive, at real-time speed?)
Thank you!
PS: if the optical-flow input features are removed (computing optical flow is a pre-processing step not suitable for real-time use), can the framework achieve real-time inference?
Yes. The attention module takes the feature of each frame and outputs a single-frame attention value. As for classification, the current pipeline takes the attention-weighted average of the features of all frames as input and outputs a video-level classification score, but the classifier can also be applied to a single frame's feature.
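To make the per-frame path concrete, here is a minimal sketch of what online inference could look like under that description: a per-frame attention head plus a classifier applied directly to each arriving frame feature. All names, dimensions, and the randomly initialized weights are assumptions for illustration, not the repository's actual code (which uses trained networks and I3D features).

```python
import numpy as np

rng = np.random.default_rng(0)
FEATURE_DIM = 1024   # assumed per-frame feature size (e.g. I3D RGB)
NUM_CLASSES = 20     # assumed number of action classes

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Randomly initialized weights stand in for trained parameters.
W_att = rng.standard_normal(FEATURE_DIM) * 0.01
W_cls = rng.standard_normal((NUM_CLASSES, FEATURE_DIM)) * 0.01

def process_frame(feat):
    """Online step: for each arriving frame feature, compute a scalar
    foreground attention and a frame-level class distribution."""
    att = sigmoid(W_att @ feat)      # attention weight in (0, 1)
    scores = softmax(W_cls @ feat)   # per-class probabilities for this frame
    return att, scores

# Simulated stream of frame features arriving one at a time.
stream = [rng.standard_normal(FEATURE_DIM) for _ in range(5)]
results = [process_frame(feat) for feat in stream]
```

Since each step only touches the current frame's feature, latency per frame is constant; the real bottleneck for a live stream would be the feature extraction (and optical flow, if used) upstream of this step.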
Can anyone tell me how to do real-time inference?