HyperTag

Add semantic video search

Open · SeanPedersen opened this issue 3 years ago • 0 comments

First basic version: Partition the video into, e.g., 16 sections of equal duration and take one screenshot (frame) from each section. Embed each screenshot and use the average of these embeddings as the video embedding.
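A minimal sketch of the basic version, under stated assumptions: frame extraction and embedding are passed in as a callable `embed_frame_at(t)` that returns the embedding of the frame at `t` seconds (hypothetical; the issue names no frame extractor or embedding model, and a CLIP-style image encoder would be one option). Taking the midpoint of each section is one reasonable reading of "a screenshot per section", not something the issue specifies.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def section_timestamps(duration_s: float, n_sections: int = 16) -> List[float]:
    """Split [0, duration_s) into n_sections equal-length sections and return
    the midpoint of each one, i.e. where a single frame per section is grabbed."""
    if duration_s <= 0 or n_sections <= 0:
        raise ValueError("duration_s and n_sections must be positive")
    step = duration_s / n_sections
    return [step * (i + 0.5) for i in range(n_sections)]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise average of equally sized vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def video_embedding(
    duration_s: float,
    embed_frame_at: Callable[[float], Vector],  # hypothetical frame -> embedding hook
    n_sections: int = 16,
) -> Vector:
    """Basic video embedding: average of one frame embedding per section."""
    timestamps = section_timestamps(duration_s, n_sections)
    return mean_vector([embed_frame_at(t) for t in timestamps])
```

For a 160-second video, `section_timestamps(160.0)` yields 16 timestamps, 10 seconds apart, from 5.0 to 155.0. Because every frame contributes equally to the mean, short but distinctive scenes get diluted, which is the limitation the advanced version addresses.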

Advanced: Partition the video at a finer granularity, extracting frames e.g. every 5 seconds or a fixed high number of frames (100+). Compute an embedding for every extracted frame. Compute the distance in embedding space between each pair of consecutive frames to infer semantically coherent video sections, i.e. runs of similar frames whose consecutive distances stay below a threshold. Embed each section as the average of its coherent frames. The resulting list of section embeddings should be a good representation of the video, and each entry carries the section's start and end timestamps as metadata.
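A sketch of the advanced pipeline, assuming frames have already been embedded (the embedding model is not specified in the issue). Cosine distance and the default `threshold=0.2` are illustrative assumptions, not values from the issue; `sample_timestamps`, `segment_frames`, and `Section` are hypothetical names. A new section starts whenever the distance between two consecutive frame embeddings exceeds the threshold.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

Vector = List[float]


def sample_timestamps(duration_s: float, interval_s: float = 5.0) -> List[float]:
    """Timestamps every interval_s seconds, starting at 0 and before duration_s."""
    n = math.ceil(duration_s / interval_s)
    return [i * interval_s for i in range(n)]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; 0 for identical directions, 1 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cannot compare zero vectors")
    return 1.0 - dot / norm


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise average of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


@dataclass
class Section:
    start_s: float     # timestamp of the first frame in the section
    end_s: float       # timestamp of the last frame in the section
    embedding: Vector  # mean of the section's frame embeddings


def segment_frames(
    timestamps: Sequence[float],
    embeddings: Sequence[Vector],
    threshold: float = 0.2,  # assumed value; would need tuning per embedding model
) -> List[Section]:
    """Group consecutive similar frames into sections and average each group."""
    if not embeddings or len(timestamps) != len(embeddings):
        raise ValueError("need one embedding per timestamp")
    sections: List[Section] = []
    start = 0
    for i in range(1, len(embeddings) + 1):
        # Close the current section at the last frame, or when the next frame
        # is too far from the previous one in embedding space.
        if i == len(embeddings) or cosine_distance(embeddings[i - 1], embeddings[i]) > threshold:
            sections.append(
                Section(timestamps[start], timestamps[i - 1], mean_vector(embeddings[start:i]))
            )
            start = i
    return sections
```

With frames sampled at 0, 5, 10 and 15 seconds whose embeddings are `[1, 0]`, `[1, 0]`, `[0, 1]`, `[0, 1]`, this yields two sections, 0 to 5 s and 10 to 15 s. For the "fixed number of frames" variant, set `interval_s = duration_s / n_frames`. Comparing only consecutive frames keeps the cost linear in the number of frames, at the price of slow visual drift never triggering a split.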

SeanPedersen · Jan 08 '21 00:01