Classifying videos
It seems useful (and not too difficult) to tag videos by extracting some frames and running the normal networks on them. One option would be something like this: https://jdhao.github.io/2021/12/25/ffmpeg-extract-key-frame-video/
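To make that concrete, here is a rough sketch of the keyframe approach from the linked post, written as a Python helper that only builds the ffmpeg argument list. The function name and output file pattern are made up for illustration. `-skip_frame nokey` asks the decoder to drop non-keyframes, and `-vsync vfr` stops ffmpeg from duplicating frames to fill the gaps (newer ffmpeg releases prefer `-fps_mode vfr`, so the flag may need adjusting).

```python
from pathlib import Path


def keyframe_command(video, out_dir):
    """Build an ffmpeg command that writes one JPEG per keyframe.

    Hypothetical helper: the name and the output pattern are illustrative only.
    """
    pattern = str(Path(out_dir) / "keyframe-%04d.jpg")
    return [
        "ffmpeg",
        "-skip_frame", "nokey",  # decoder option, so it must come before -i
        "-i", str(video),
        "-vsync", "vfr",         # do not duplicate frames between keyframes
        pattern,
    ]
```

The list could be passed to `subprocess.run(cmd, check=True)`, and the resulting JPEGs fed to the existing image classifiers.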
I might work on a PR if there's sufficient interest.
It would be great to be able to recognize videos too, and that approach shouldn't be too hard to implement and should be pretty quick to execute. +1 from me! ;)
We do have to agree on some specifications, though, mainly how to choose keyframes. Something like that would be the best, but I believe no free model of this kind is available (yet).
In the meantime, we could sample frames at some arbitrary time interval (and maybe adapt it to the full length of the video), and exclude images that are too similar or too dark. We could quite easily adapt all of that from the MediaDC plugin: https://github.com/andrey18106/mediadc/blob/main/lib/Service/python/dc_videos.py
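A minimal sketch of what I mean, not taken from MediaDC: all names and thresholds below are assumptions picked for illustration. Frames are modelled as flat sequences of 0-255 grayscale values so the logic stays independent of any image library.

```python
def sample_timestamps(duration, interval=10.0, max_frames=30):
    """Pick evenly spaced timestamps (seconds), capped at max_frames.

    Short videos get roughly one frame per `interval` seconds; for long
    videos the spacing is widened so the frame count stays bounded.
    """
    if duration <= 0:
        return []
    count = min(max_frames, max(1, int(duration // interval)))
    step = duration / count
    # Sample in the middle of each slice to avoid the very first/last frame.
    return [(i + 0.5) * step for i in range(count)]


def mean_brightness(pixels):
    return sum(pixels) / len(pixels)


def mean_abs_diff(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def keep_frames(frames, min_brightness=30, min_diff=10):
    """Drop frames that are too dark or too close to the last kept frame."""
    kept = []
    for frame in frames:
        if mean_brightness(frame) < min_brightness:
            continue  # too dark
        if kept and mean_abs_diff(frame, kept[-1]) < min_diff:
            continue  # too similar to the previous kept frame
        kept.append(frame)
    return kept
```

Comparing only against the last kept frame keeps this linear in the number of frames. A perceptual hash would probably be more robust than a raw pixel difference, but the thresholds would need tuning on real videos either way.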
@marcelklehr What do you think?
Good idea, indeed. My first intuition would be to classify a frame every X seconds and accumulate all the resulting labels. We already have ffmpeg installed.
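Roughly, the accumulation step could look like the sketch below. This is just an illustration in Python with made-up names, not the app's code; the `min_share` cutoff is an assumption, added so that a label seen in a single stray frame does not end up tagging the whole video.

```python
from collections import Counter


def accumulate_labels(frame_labels, min_share=0.2):
    """Merge per-frame labels into one label list for the whole video.

    frame_labels: one iterable of label strings per classified frame.
    min_share: a label must appear in at least this fraction of frames.
    """
    frame_labels = [set(labels) for labels in frame_labels]
    if not frame_labels:
        return []
    counts = Counter()
    for labels in frame_labels:
        counts.update(labels)  # each frame counts a label at most once
    needed = min_share * len(frame_labels)
    kept = [(label, n) for label, n in counts.items() if n >= needed]
    # Most frequent first, alphabetical on ties for a stable result.
    kept.sort(key=lambda item: (-item[1], item[0]))
    return [label for label, _ in kept]
```

Setting `min_share=0` would give the plain union of all labels, which is the simplest reading of "accumulate all the resulting labels".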
In addition to classifying stills from the video, there are also actual video recognition models, e.g. Google's MoViNet, which comes with a set of quite interesting labels.
That MoViNet model looks pretty great (love those recognition/time graphs), and I'm surprised how low the memory usage seems to be. It's made for actions only though, right? I guess we'll need both then :)
Reopening to track face and imagenet recognition for videos