two-stream-action-recognition
two-stream-action-recognition copied to clipboard
real time prediction using webcam
hi, may i know whether this program can be modified to run in real time using webcam? real-time extracting frames and features and make prediction.
@hanako94 hello, did you find a solution for that ? could you even test it with your own videos ?
https://github.com/harvitronix/five-video-classification-methods you may try with this. i tried with my own data, it is able to predict. but the accuracy it not so satisfy.
Dear Hanako94, Harvitronix's 5-video-classify-methods says passing a video and getting a predicted class is not implemented yet. Is it true? Or it can already do demos, and the README is not updated accordingly? Or you modified it to do demos (by demo I mean passing a video and get a class prediction)? BR, JimmyYS
yup, i modified myself to accept mobile camera input (every 30 frames) and predict the outcome on phone
Dear Hanako, thx for your help! do you use this pre trained model:
saved_model = 'data/checkpoints/lstm-features.026-0.239.hdf5' ? as specified in :
five-video-classification-methods\demo.py
or you use other pre-trained model?
Can you share how you did the demo? Some guide? Hint? Or maybe commit patch?
BR, JimmyYS
i'm not using the model mentioned, i trained my own model, by using my own dataset. But my scope is very small, just to recognize 4 classes only. to modify the demo to read video input or web camera frame input and do real time prediction, you might need to spend some time to study the code written by Harvitronix and modify from there, especially on how he extracts frames from video, convert it and pass it for inference. i think for lstm and mlp is hard to make real time prediction because it takes time to extract feature from frames before do the prediction. lrcn will be faster, cause it use only frames for prediction (drawback is that the accuracy is not so satisfied)
Dear Hanako, Thanks for your generous advice, So far, I have read Andrej Karpathy's large scale video classification with conv neural network. There are at least two more paper I plan to study. I think Andrej's paper mentioned several interesting ideas, but, unfortunately, their source code is not open source.
I find some repo on github, but till now, I haven't found any pre-trained model yet: https://github.com/wangheda/youtube-8m https://github.com/jeffreyhuang1/two-stream-action-recognition https://github.com/harvitronix/five-video-classification-methods
BR, JimmyYS
hey @hanako94 Method from this repo requires optical flow as pre-processing data, to target at higher accuracy. If your classification category are limited, speed bottleneck is at OF calcuation in real time. More recent works have been done to eliminated OF calculation from pipelines to speed up and achieve end-to-end training, without sacrifice much of accuracy. For example Hidden two streams model is worth checking out.
Cheers!