3D-CNN-Gesture-recognition
Extraction of frames from the video.
Hello Anas, I am working on a similar project, but in mine I want to recognize continuous signs of sign language. I found your project helpful, and I am just a beginner in deep learning. I have sign-language data in the form of simple videos of gestures performed by volunteers, similar to the 20BN-Jester dataset. I want to know how you are extracting the frames from the videos. One more thing: in my case, if there is a video of a whole sentence that includes different sign gestures (the link below points to a similar sentence video, so you can get a good idea of what I mean), how can we extract the frames for all these signs and identify each one as an individual sign? I would be very thankful for your help.
https://www.psl.org.pk/signs/en/sentences/5c9634bb0625be0004d9217b
Hello @AsadZahed,
I think that you can extract video frames using ffmpeg, e.g.
ffmpeg -i video.webm -vf fps=12 -qscale:v 2 %5d.jpg
This extracts images from the video at 12 frames per second, using a zero-padded 0000x.jpg naming pattern (similar to what 20BN-Jester uses).
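If you have many videos, a small Python wrapper around the same command could help (a sketch, not from the repo; the function name and paths are assumptions, and the fps and jpg pattern simply mirror the command above):

```python
import os
import subprocess

def extract_frames(video_path, out_dir, fps=12):
    # Dump frames from one video with the ffmpeg command shown above.
    os.makedirs(out_dir, exist_ok=True)
    subprocess.run([
        "ffmpeg", "-i", video_path,
        "-vf", f"fps={fps}",
        "-qscale:v", "2",
        os.path.join(out_dir, "%5d.jpg"),  # same numbered-jpg pattern as above
    ], check=True)

extract_frames("video.webm", "frames/video")
```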
Hi, how did you create your own data_csv from video
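(A hedged sketch, not from the repo: assuming each video's frames live in a folder named by video id and you keep a dict mapping ids to gesture labels, you could write a Jester-style CSV like this. The folder layout, the `labels` dict, and the semicolon separator are assumptions based on the 20BN-Jester CSV files.)

```python
import csv
import os

def write_data_csv(frames_root, labels, out_csv):
    # labels: {video_id: gesture_name}, e.g. collected while recording volunteers
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f, delimiter=";")
        for video_id in sorted(os.listdir(frames_root)):
            writer.writerow([video_id, labels[video_id]])

# write_data_csv("frames", {"00001": "Swiping Left"}, "train.csv")
```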
Hello @AsadZahed, to get frames from a video, check the main-beginners-syntax.py file, block (# In[14]):

```python
import os

hm_frames = 30  # target number of frames per training clip

# unify the number of frames for each training clip
def get_unify_frames(path):
    # pick frames (sorted so they stay in temporal order)
    frames = sorted(os.listdir(path))
    frames_count = len(frames)
    # unify the number of frames
    if hm_frames > frames_count:
        # duplicate the last frame if the video is shorter than necessary
        frames += [frames[-1]] * (hm_frames - frames_count)
    elif hm_frames < frames_count:
        # if there are more frames, keep only the first hm_frames
        frames = frames[0:hm_frames]
    return frames
```
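(As a follow-up sketch, not from the repo: once the frame list is unified, you could stack the images into a fixed-shape clip for the 3D CNN. The OpenCV usage and the 64x64 size here are assumptions; `get_unify_frames` is the function from the snippet above.)

```python
import os
import cv2
import numpy as np

def load_clip(path, size=(64, 64)):
    # Read the unified frame list and stack it into shape (hm_frames, H, W, 3).
    clip = []
    for name in get_unify_frames(path):
        img = cv2.imread(os.path.join(path, name))
        clip.append(cv2.resize(img, size))
    return np.stack(clip)
```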
If you have several gestures in a single training video, you have to cut it manually or with a Python script, because before training the model cannot yet detect the gestures; after training, the model will be able to detect the different gestures within a single video.
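(A minimal sketch of the scripted option, assuming ffmpeg is installed and that fixed-length cuts are acceptable; real gesture boundaries would still need manual annotation, and the file names are hypothetical.)

```python
import subprocess

def split_video(src, seconds=2, out_pattern="clip%03d.webm"):
    # Cut src into equal-length segments with ffmpeg's segment muxer.
    # Fixed-length cuts are only a crude approximation of gesture boundaries.
    subprocess.run([
        "ffmpeg", "-i", src,
        "-f", "segment",
        "-segment_time", str(seconds),
        "-reset_timestamps", "1",
        out_pattern,
    ], check=True)

split_video("sentence_video.webm")
```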