stash-box
Add phash duration cutoff option
Phashes of short clips can cause issues: short clips generally don't contain enough detail for the phash to uniquely identify them, so the phashes clump together, along with their associated md5/oshash fingerprints. An example is https://stashdb.org/scenes/e46fa5b8-7c3a-4db6-8522-17abeb26bfbb#fingerprints
I'm not sure what the best cutoff is, but I think at least 1-2 minutes, depending on content.
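To make the request concrete, here is a rough sketch of what such a cutoff could look like. It is written in Python purely for illustration and is not stash-box code: the `Fingerprint` type, the `PHASH_MIN_DURATION_SECONDS` setting, and both function names are made up, and the 60-second default is just a placeholder from the 1-2 minute range mentioned above.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical config value, in seconds. Not an existing stash-box option.
PHASH_MIN_DURATION_SECONDS = 60


@dataclass(frozen=True)
class Fingerprint:
    """Illustrative stand-in for a submitted fingerprint."""
    algorithm: str  # "PHASH", "MD5" or "OSHASH"
    hash: str
    duration: int   # clip duration in seconds


def usable_for_matching(fp: Fingerprint,
                        min_phash_duration: int = PHASH_MIN_DURATION_SECONDS) -> bool:
    """Return False only for phashes of clips shorter than the cutoff."""
    if fp.algorithm.upper() != "PHASH":
        # md5/oshash identify an exact file, so duration does not matter here.
        return True
    return fp.duration >= min_phash_duration


def filter_fingerprints(fps: Iterable[Fingerprint],
                        min_phash_duration: int = PHASH_MIN_DURATION_SECONDS) -> List[Fingerprint]:
    """Drop short-clip phashes and keep everything else, preserving order."""
    return [fp for fp in fps if usable_for_matching(fp, min_phash_duration)]
```

With a cutoff of 0 nothing is filtered, so the option could default to the current behaviour and each instance could pick its own value.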
Not sure what caused the mess in the example, but on the FansDB side we have been dealing with very short durations (as short as 1-2 seconds) and we haven't experienced anything like this (at a scale of close to 30k scenes).
An alternative would be to modify the hash algorithm so it deals better with a lack of unique details, as proposed in https://github.com/stashapp/stash/pull/4074.