
Configurable method for generating thumbnails

Open piotrminkina opened this issue 1 year ago • 2 comments

Environment: Linux

Describe the solution you'd like:

In my large video library, which contains a lot of duplicates, VDF has helped me find a fair number of duplicate videos of inferior quality. Thank you for this tool! However, there are cases where two videos are, in my opinion, very similar, but VDF reports a similarity that is too low. This happens with videos whose durations are almost identical but whose quality differs. I don't know exactly where these 1-second differences in duration come from, but I do know they appear when the reduced-quality copy option is enabled in Google Photos.

[Screenshots: Settings | Scan result]

As you can see from the screenshots above, the thumbnails of similar videos differ. I suppose the differences come from the way thumbnails are generated from the video: either the method is not very accurate, or it depends to some extent on the length of the video. As a test, I opened all the files in VLC, used the Short forward jump function to set each video to 0:00, 0:10 and 0:30, and took the screenshots included below. As you can see, at first glance the corresponding frames do not differ. I didn't take screenshots of the last frame, as it was very difficult to reach in VLC.

Perhaps a solution to my problem would be to add an option to VDF for changing the method used to generate thumbnails from the videos? For example, instead of selecting the number of thumbnails, the user would select an interval in seconds at which a thumbnail is taken from the video. In that case, the number of thumbnails would depend on the length of the video, possibly with an additional limit on the maximum number of thumbnails generated.
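The interval-based proposal can be sketched as follows. This is a hypothetical helper (`interval_positions` is not part of VDF), assuming thumbnails start at 0:00 and are spaced `interval_s` seconds apart, stopping at the end of the video or after `max_thumbnails` positions, whichever comes first.

```python
def interval_positions(duration_s: float, interval_s: float, max_thumbnails: int) -> list[float]:
    """Hypothetical helper (not part of VDF): one thumbnail every interval_s seconds.

    Timestamps start at 0:00 and stop at the end of the video or after
    max_thumbnails positions, whichever comes first.
    """
    if duration_s <= 0 or interval_s <= 0 or max_thumbnails <= 0:
        return []
    positions = []
    i = 0
    # Multiply instead of accumulating, so float error does not build up.
    while i * interval_s < duration_s and len(positions) < max_thumbnails:
        positions.append(float(i * interval_s))
        i += 1
    return positions


# Two copies whose durations differ by one second share the same timestamps.
print(interval_positions(30, 10, 10))  # [0.0, 10.0, 20.0]
print(interval_positions(31, 10, 10))  # [0.0, 10.0, 20.0, 30.0]
```

Because the positions are absolute timestamps, a copy that is one second longer is still sampled at the same moments; only the number of thumbnails changes.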

| Filename | 0:00 | 0:10 | 0:30 |
| --- | --- | --- | --- |
| 20151031_142100.mp4 | Screenshot from 2023-01-02 21-28-19 | Screenshot from 2023-01-02 21-28-34 | Screenshot from 2023-01-02 21-28-50 |
| 20151031_142135.mp4 | Screenshot from 2023-01-02 21-29-10 | Screenshot from 2023-01-02 21-29-18 | Screenshot from 2023-01-02 21-29-28 |
| 20151216_164645.mp4 | Screenshot from 2023-01-02 21-29-42 | Screenshot from 2023-01-02 21-29-47 | Screenshot from 2023-01-02 21-30-00 |
| 20151216_164645_1.mp4 | Screenshot from 2023-01-02 21-30-09 | Screenshot from 2023-01-02 21-30-13 | Screenshot from 2023-01-02 21-30-17 |
| 20151222_092322.mp4 | Screenshot from 2023-01-02 21-30-28 | Screenshot from 2023-01-02 21-30-31 | Screenshot from 2023-01-02 21-30-36 |
| 20151222_092322_1.mp4 | Screenshot from 2023-01-02 21-30-48 | Screenshot from 2023-01-02 21-30-53 | Screenshot from 2023-01-02 21-30-57 |
| 20151227_104940.mp4 | Screenshot from 2023-01-02 21-31-12 | Screenshot from 2023-01-02 21-31-15 | Screenshot from 2023-01-02 21-31-20 |
| 20151227_114809.mp4 | Screenshot from 2023-01-02 21-31-29 | Screenshot from 2023-01-02 21-31-32 | Screenshot from 2023-01-02 21-31-35 |

Regards,
Piotr Minkina

piotrminkina, Jan 02 '23 21:01

Thumbnail generation does depend on the length of the video file. When you use VLC's go-to-position option, VLC may jump either to the exact frame or to the nearest frame. VDF uses ffmpeg to generate thumbnails, and ffmpeg may likewise seek either to the exact frame or to the nearest frame. As a result, there are cases where the frame shown by VLC and the frame extracted by ffmpeg differ.
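The difference between the two seek modes can be illustrated with a deliberately simplified model. The keyframe timestamps, the constant frame rate and the function names are assumptions for illustration only; real VLC and ffmpeg seeking depends on their own settings and on the container.

```python
import bisect
import math


def keyframe_seek(target_s: float, keyframes: list[float]) -> float:
    """Fast seek (simplified model): land on the last keyframe at or before target_s.

    keyframes must be a non-empty, sorted list of keyframe timestamps in seconds.
    """
    idx = bisect.bisect_right(keyframes, target_s) - 1
    return keyframes[max(idx, 0)]


def exact_seek(target_s: float, fps: float) -> float:
    """Accurate seek (simplified model): decode up to the frame that covers target_s.

    Assumes a constant frame rate; the small epsilon guards against float
    rounding that would otherwise land just below a whole frame index.
    """
    frame_index = math.floor(target_s * fps + 1e-9)
    return frame_index / fps


# Hypothetical keyframe layouts: the original has a keyframe every 2 s,
# the reduced-quality copy only every 5 s.
original = [0.0, 2.0, 4.0, 6.0, 8.0]
reduced = [0.0, 5.0, 10.0]
print(keyframe_seek(4.5, original))  # 4.0
print(keyframe_seek(4.5, reduced))   # 0.0
print(exact_seek(4.5, 30))           # 4.5
```

In this model, a fast seek to the same timestamp lands on different frames in two encodes of the same video, while an exact seek does not.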

> E.g. instead of selecting the number of thumbnails, then selecting a time every how many seconds to make a thumbnail from the video?

This wouldn't work for video files shorter than the interval you set. VDF does it this way because a) it works on ALL video files, regardless of length, and b) it allows you to rescan against previously scanned videos without VDF having to re-create the graybyte information. Unfortunately, this comes at the cost that VDF cannot detect duplicates whose content is shifted in time.
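The dependence on video length can be modelled as follows, assuming VDF places a fixed number of thumbnails at evenly spaced fractions of the duration. The formula and the name `relative_positions` are assumptions for illustration; VDF's actual spacing may differ.

```python
def relative_positions(duration_s: float, count: int) -> list[float]:
    """Hypothetical model (not VDF's actual code): place `count` thumbnails
    at evenly spaced fractions of the duration, skipping the very start and end."""
    return [duration_s * (i + 1) / (count + 1) for i in range(count)]


# Works for any length, even a 2-second clip.
print(relative_positions(2, 4))    # [0.4, 0.8, 1.2, 1.6]

# A 1-second difference in duration shifts every sample point,
# and later thumbnails drift further.
a = relative_positions(30, 4)
b = relative_positions(31, 4)
print([round(y - x, 2) for x, y in zip(a, b)])  # [0.2, 0.4, 0.6, 0.8]
```

Under this model every file gets the same number of thumbnails with no minimum length, which is what makes results comparable across scans; the trade-off is that a copy that is one second longer, or content shifted in time, is sampled at slightly different moments.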

0x90d, Jan 09 '23 08:01

I understand how it works now. I am not proposing to replace the current method of generating thumbnails, but to add another method, selectable in the settings, for specific cases. Just as changing the number of thumbnails requires rescanning the collection, changing the thumbnail generation method would require such a rescan.

I also propose that the algorithm optionally add one thumbnail from the first x seconds of the video and one from the last x seconds, where x is configurable. That way, if I choose to generate thumbnails every 30 seconds and the video is only 25 seconds long, it would still produce two thumbnails for comparison.
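One possible reading of the start/end proposal is sketched below, assuming the edge thumbnails are taken `edge_s` seconds after the start and `edge_s` seconds before the end, on top of the interval thumbnails. The function and parameter names are hypothetical and are not VDF settings.

```python
def proposed_positions(duration_s: float, interval_s: float,
                       max_thumbnails: int, edge_s: float) -> list[float]:
    """Hypothetical sketch of the proposal (not part of VDF).

    Regular thumbnails every interval_s seconds (capped at max_thumbnails),
    plus one edge_s seconds after the start and one edge_s seconds before the end.
    """
    if duration_s <= 0:
        return []
    positions = set()  # a set drops duplicates where an edge meets a regular slot
    if interval_s > 0:
        i = 1
        while i * interval_s < duration_s and i <= max_thumbnails:
            positions.add(float(i * interval_s))
            i += 1
    # Clamp the edge thumbnails to the middle of very short videos.
    positions.add(float(min(edge_s, duration_s / 2)))
    positions.add(float(max(duration_s - edge_s, duration_s / 2)))
    return sorted(positions)


print(proposed_positions(25, 30, 10, 1))  # [1.0, 24.0]
print(proposed_positions(95, 30, 10, 1))  # [1.0, 30.0, 60.0, 90.0, 94.0]
```

Here the cap applies only to the interval thumbnails, so even a video shorter than the interval still yields two thumbnails, matching the 25-second example.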

From my own experience, to fully prune the duplicates from my collection I had to change the settings repeatedly anyway in order to detect more of them. If VDF had found all the duplicates in my collection in one go, I would have been overwhelmed by how many videos I had to review to decide which were actually duplicates. I often found false positives.

piotrminkina, Jan 09 '23 19:01