stash
[Feature] Ability to report incorrect Scenes or Fingerprint from Stash to StashDB
Problem Statement: StashDB is a well-curated, well-managed source of record that keeps getting bigger and better. However, there are many instances where users blindly submit incorrect scenes (which get approved) and, worse, PHASH fingerprints that are not checked at all. This is currently managed via a spreadsheet, and that is not sustainable.
Proposed Solution: Ability to report/submit incorrect scenes and fingerprints from Stash directly to StashDB, that could go through a review similar to new scene submission.
Alternatives: The manual spreadsheet is neither scalable nor sustainable with StashDB growing so fast.
Additional context: Similar request to edit a scene in Stash-box is already submitted here
Definitely an absolute must, as there are many incorrect fingerprints and this needs addressing ASAP. As Stash grows bigger, the scrape functionality needs to be reliable.
Another proposal: something on the StashDB end could track submissions so that admins can see the worst offenders for incorrect fingerprints, which could possibly lead to those users being blocked from submitting in future.
May seem overkill to some but wrong fingerprints are a PITA
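To make the "worst offenders" idea concrete, here is a minimal sketch of the tally an admin view might be built on. It assumes reviewers have already confirmed which fingerprints were wrong; the function name, the `(submitter, fingerprint)` record shape, and the `min_reports` cutoff are all hypothetical and not part of Stash or stash-box.

```python
from collections import Counter


def worst_offenders(confirmed_reports, min_reports=3):
    """Rank submitters by how many of their fingerprints were confirmed wrong.

    confirmed_reports: iterable of (submitter_id, fingerprint) pairs.
    Hypothetical record shape, assumed for illustration only.
    """
    counts = Counter(submitter for submitter, _ in confirmed_reports)
    # most_common() sorts by count, highest first
    return [(s, n) for s, n in counts.most_common() if n >= min_reports]


reports = [("alice", "h1"), ("alice", "h2"), ("alice", "h3"), ("bob", "h4")]
print(worst_offenders(reports))
```

Whether a count like this should lead to a block is a policy decision; the sketch only produces the ranking.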
OK, so how do I phrase this. Individuals obtain their source materials from multiple different sources. Some of it has been slightly modified from the original source materials.
This leads to several different hashes being attached to the same scene.
There are the P2P community-curated sources, scene-curated sources (srrxxx), and official downloads direct from the content websites. Then there are uploads to mass-consumption free sites that may have been modified, which leads to different hashes.
I'm not arguing one method or another for obtaining materials.
A rough idea.
I think part of a "solution" to dealing with incorrect hash information is to integrate checks that help validate that source material matches where you got it from. If multiple people get source materials from the same location, all the hashes should match. If they don't, you probably have a corrupted file.
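As a rough illustration of that check, the sketch below groups submissions by their claimed source and lists the users whose hash differs from the most common one. It only makes sense for exact hashes (md5 or oshash), not for perceptual hashes; the function name and the `(source, user, file_hash)` tuple shape are assumptions made for the example.

```python
from collections import Counter, defaultdict


def find_outliers(submissions):
    """Return {source: [users whose hash differs from the majority hash]}.

    submissions: iterable of (source, user, file_hash) tuples.
    Hypothetical shape, assumed for illustration only.
    """
    by_source = defaultdict(list)
    for source, user, file_hash in submissions:
        by_source[source].append((user, file_hash))

    outliers = {}
    for source, entries in by_source.items():
        # The most frequently submitted hash is treated as the reference
        majority_hash, _ = Counter(h for _, h in entries).most_common(1)[0]
        mismatched = [u for u, h in entries if h != majority_hash]
        if mismatched:
            outliers[source] = mismatched
    return outliers


subs = [
    ("release-1", "u1", "aaa"),
    ("release-1", "u2", "aaa"),
    ("release-1", "u3", "bbb"),  # probably a corrupted or altered file
    ("release-2", "u4", "ccc"),
]
print(find_outliers(subs))
```

A majority is only a hint: with very few submissions, or a tie, the "reference" hash is arbitrary.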
Say, for example, you have a source file and you generate an SRR file that matches srrxxx. Stash could run these checks, verify that the file matches that database, flip a bit that tags the matching hash as "scene verified", and submit this to StashDB. The more times the same "scene verified" tag gets submitted, the more likely it is to be correct, so on StashDB the hash should get the "Scene Verified" tag. Then, on import and tagging of materials, you could choose to use hashes verified for specific sources.
A box could be checked for content gathered from "official" sites. In this case the source is claimed to be an official download from an official site. I'm not sure what kind of verification could assure this is what the user says it is, beyond crowdsourcing a vote on the tag.
I think part of the issue with so many hashes is that there are multiple official downloads in multiple qualities and aspect ratios, and there is no way to see on StashDB which quality, aspect ratio, etc. these hashes are associated with.
So having all the hashes carry additional information (4K, 1080p, 720p, scene, P2P), voting on it (by the number of times it has been submitted), and integrating additional checks for sources may help.
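A minimal sketch of what "hashes with additional information, voted on by submission count" could look like. The `LabeledFingerprint` record and both helper functions are hypothetical; this is not how StashDB stores fingerprints.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledFingerprint:
    """Hypothetical record: a hash plus the labels proposed above."""
    phash: str
    resolution: str  # e.g. "4K", "1080p", "720p"
    source: str      # e.g. "official", "scene", "P2P"


def tally(submissions):
    """Each identical submission counts as one vote."""
    return Counter(submissions)


def best_for(votes, resolution, source):
    """Return (fingerprint, vote_count) with the most votes for the labels, or None."""
    candidates = [
        (fp, n) for fp, n in votes.items()
        if fp.resolution == resolution and fp.source == source
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[1])


votes = tally(
    [LabeledFingerprint("aaa", "1080p", "scene")] * 3
    + [LabeledFingerprint("bbb", "1080p", "scene")]
    + [LabeledFingerprint("ccc", "4K", "official")]
)
print(best_for(votes, "1080p", "scene"))
```

Counting raw submissions means one user could vote many times; a real system would presumably count distinct submitters instead.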
Say, for example, I have a file that I know is from the scene, and I have an SRR file for it in the same directory as the extracted file. Stash should be able to detect that SRR file, verify that the extracted file matches what the SRR says it is, and verify that the release exists on srrxxx with an associated account. It could then flag for everyone else, via a vote on the hash, that my hash was verified.
Torrent files should have similar checks using their built-in checksums, and even NZB files should leverage their internal checksums, with the result flagged in Stash's database as verified.
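For the torrent case, the built-in checksums are the per-piece SHA-1 digests in a v1 torrent's `info` dictionary (`piece length` and the concatenated 20-byte `pieces` string). The sketch below checks a single-file payload against them. It assumes the torrent has already been parsed elsewhere and that the file fits in memory; `verify_pieces` is a hypothetical helper, not an existing Stash function.

```python
import hashlib

SHA1_LEN = 20  # each entry in the v1 "pieces" string is a 20-byte SHA-1 digest


def verify_pieces(data: bytes, piece_length: int, pieces: bytes) -> bool:
    """Check single-file content against a v1 torrent's piece hashes.

    data: full file contents (assumed small enough to hold in memory)
    piece_length: the "piece length" value from the info dictionary
    pieces: the concatenated SHA-1 digests from the info dictionary
    """
    if piece_length <= 0 or len(pieces) % SHA1_LEN != 0:
        return False
    expected = [pieces[i:i + SHA1_LEN] for i in range(0, len(pieces), SHA1_LEN)]
    actual = [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]
    return actual == expected


# Build a fake "torrent" for a 100-byte payload, then verify it.
payload = b"x" * 100
piece_len = 32
piece_hashes = b"".join(
    hashlib.sha1(payload[i:i + piece_len]).digest()
    for i in range(0, len(payload), piece_len)
)
print(verify_pieces(payload, piece_len, piece_hashes))
print(verify_pieces(payload[:-1] + b"y", piece_len, piece_hashes))
```

Multi-file torrents hash the files as one concatenated stream, so a real check would need to handle pieces that span file boundaries.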
It may drastically cut down on bad hashes, as I suspect there's a source out there that individuals are getting files from which changes scene names etc. to avoid being taken down. That leads to people getting files they think are certain scenes when they're something entirely different.
Seems overly complicated, and ignores the reality. There are many reasons to have different hashes: sizes, encoding, re-encoding, metadata changes. And that's all based on the original source. Add in the less-than-legal sources (scene release groups, tube sites, torrents, Usenet, etc.) and the number of hashes increases, and the illicit copies may often outweigh the "legit" hashes by sheer volume.
I would suggest steering clear of all of this. Either I get a metadata result that makes sense to me when I submit a hash query (phash preferred, far more matches; oshash/md5 less preferred, fewer matches), or I get a clearly wrong result and I should be able to flag it as incorrect. If more than a few people flag it, then it's a mistaken identity and should be avoided but not removed. (That also avoids poisoning of the hashes by those who seek to do so intentionally, whether through submission of bad data or deletion of good data.)
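The flag-and-avoid idea can be sketched in a few lines. The class name, method names, and the threshold of 3 are assumptions for illustration; the key property is that a flagged match is hidden from results but never deleted.

```python
from collections import defaultdict


class MatchFlags:
    """Hypothetical store of 'incorrect match' flags per (fingerprint, scene)."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        # (fingerprint, scene_id) -> set of users who flagged the match
        self._flags = defaultdict(set)

    def flag(self, user, fingerprint, scene_id):
        # A set means each user counts at most once per match
        self._flags[(fingerprint, scene_id)].add(user)

    def is_avoided(self, fingerprint, scene_id):
        flagged_by = self._flags.get((fingerprint, scene_id), ())
        return len(flagged_by) >= self.threshold

    def filter_matches(self, fingerprint, scene_ids):
        """Drop avoided matches from a result list; nothing is deleted."""
        return [s for s in scene_ids if not self.is_avoided(fingerprint, s)]


flags = MatchFlags(threshold=2)
flags.flag("u1", "abc", "scene-1")
flags.flag("u1", "abc", "scene-1")  # repeat flag from the same user is ignored
flags.flag("u2", "abc", "scene-1")
print(flags.filter_matches("abc", ["scene-1", "scene-2"]))
```

Because the flags are kept separately from the fingerprint data, a wave of bad-faith flags can be reviewed and cleared without any good data having been lost.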
not planned