[Feature] Tag files that fail during the generation task so they can be avoided in future runs.
Is your feature request related to a problem? Please describe.
I'm currently attempting to import about 27 TB of files collected over a period of almost 30 years. Not all of the video files I have archived are 'intact', in the sense that current-day ffmpeg can't process them without erroring when generating the supporting files, so they remain in the generate 'queue' whenever I restart the task (i.e. when I need the computer doing the work for something other than Stash, I generally stop the task to get my CPU cycles back and restart it when I'm done). While this has worked in general, now that I'm reaching the 20 TB 'hashed' mark, the number of files that have to be re-re-re-examined just to determine that, yep, they still can't be read properly, has reached the point where it takes an hour or two before 'new' files are even being looked at.
Describe the solution you'd like
It would be really nice if there was an option to tag scenes/files when ffmpeg errors out, with the type of generation that was attempted and failed (e.g. [phash generation failed], [vtt sprite generation failed], etc.). This would allow me to narrow down which files were actually causing problems and determine whether there was a way to 'fix' them to stop the errors.
Additionally, it would be extra nice if the batch generation task could be set up to be selective, so that such tagged files are skipped during the task, not wasting further time on them until they've been reviewed and 'cleared'.
Describe alternatives you've considered
If the batch generation task can't be made conditional on tags, then making it folder-based could also work, so that a combination of plugins and tagging could move the 'broken' files to a specific folder that gets skipped.
I'm currently looking at implementing this.
It would be wise to do it for all generation types, since multiple generation types can fail: previews, hashes, sprites, etc.
I'd create a new table in the database with 2 columns:

Failed_Generations: Type, Hash

The required() of each specific generator would return false if generation has failed for this specific type and hash. There would also be a means to retry failed generations, since they could succeed later on, like with a new version of ffmpeg. The table would be cleaned when a file's hash changes or during the cleanup task. Haven't got there yet.
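A minimal sketch of that design, using an in-memory map to stand in for the proposed database table (all names here are illustrative, not Stash's actual code):

```go
package main

import "fmt"

// failedKey mirrors the two columns of the proposed
// Failed_Generations table: one row per (type, hash) that failed.
type failedKey struct {
	Type string // generation type, e.g. "phash", "sprite", "preview"
	Hash string // the file's checksum
}

type failedGenerations map[failedKey]bool

// markFailed records that one generation type failed for one hash.
func (f failedGenerations) markFailed(genType, hash string) {
	f[failedKey{genType, hash}] = true
}

// required reports whether a generator should run for this file:
// false when a previous attempt of the same type already failed.
func (f failedGenerations) required(genType, hash string) bool {
	return !f[failedKey{genType, hash}]
}

// clear removes the failure record so a retry can run, e.g. after
// upgrading ffmpeg. A changed file hash also stops matching old rows.
func (f failedGenerations) clear(genType, hash string) {
	delete(f, failedKey{genType, hash})
}

func main() {
	failed := failedGenerations{}
	failed.markFailed("phash", "abc123")

	fmt.Println(failed.required("phash", "abc123"))  // false: skip this one
	fmt.Println(failed.required("sprite", "abc123")) // true: other types still run

	failed.clear("phash", "abc123")
	fmt.Println(failed.required("phash", "abc123")) // true: retry allowed
}
```

Because the key includes the generation type, a sprite failure doesn't block phash generation for the same file, and clearing a single row re-enables just that one retry.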
The problem of bad files re-generating every time is an issue I'm aware of and encounter myself. I have yet to settle on an appropriate design for handling them.
> I'd create a new table in the database with 2 columns: Failed_Generations: Type, Hash. The required() of each specific generator would return false if generation has failed for this specific type and hash. There would also be a means to retry failed generations, since they could succeed later on, like with a new version of ffmpeg. The table would be cleaned when a file's hash changes or during the cleanup task. Haven't got there yet.
I'm not really keen on the design mentioned above - I suspect that this issue speaks of a larger issue with the way we generate the auxiliary files, and a more holistic approach will be needed. Introducing this change will be a band-aid approach and likely make it more difficult to develop a cleaner solution.
A shorter-term iterative solution would be to introduce a plugin hook trigger when generation fails. This would allow creation of a plugin to accept the failed file details and (for example) tag the scene/image/gallery with a configurable tag. A plugin might also generate a dummy file to be used in place of the actual generator output so that the generate task doesn't try to re-generate from bad source files.
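The hook approach could be sketched roughly as below. Everything here is hypothetical: no generation-failure trigger exists in Stash yet, and the payload fields and registration functions are illustrative, not the actual plugin API.

```go
package main

import "fmt"

// GenerateFailureInput is a hypothetical hook payload carrying the
// failed file's details to plugins; field names are illustrative.
type GenerateFailureInput struct {
	SceneID int
	Path    string
	Task    string // e.g. "phash", "sprite", "preview"
	Error   string
}

// hook is the plugin-facing callback type for this sketch.
type hook func(GenerateFailureInput)

var onGenerateFailure []hook

// registerHook adds a plugin callback for the failure trigger.
func registerHook(h hook) {
	onGenerateFailure = append(onGenerateFailure, h)
}

// reportFailure is what a generator would call on error: every
// registered plugin receives the failed file's details.
func reportFailure(in GenerateFailureInput) {
	for _, h := range onGenerateFailure {
		h(in)
	}
}

func main() {
	// Example plugin behaviour: record the scene as "tagged" (a real
	// plugin would call the GraphQL API to attach a configurable tag).
	var tagged []int
	registerHook(func(in GenerateFailureInput) {
		tagged = append(tagged, in.SceneID)
		fmt.Printf("tagging scene %d: %s failed on %s\n", in.SceneID, in.Task, in.Path)
	})

	reportFailure(GenerateFailureInput{
		SceneID: 42, Path: "/media/bad.mp4", Task: "sprite", Error: "ffmpeg exit 1",
	})
	fmt.Println(tagged) // [42]
}
```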
Brainstorming here:
> I'm not really keen on the design mentioned above - I suspect that this issue speaks of a larger issue with the way we generate the auxiliary files, and a more holistic approach will be needed
That was a thought I had, but I did not want to propose an invasive modification.
Furthermore, there's a benefit in the simplicity of how it's currently done, since it refers directly to a file by hash. There is no database table in between the code and the filesystem that could get out of sync and require a recurrent maintenance task.
With this in mind, having a table to list faulty hashes as mentioned above seemed a sound approach to me.
It would allow regeneration without effort.
> A shorter-term iterative solution would be to introduce a plugin hook trigger when generation fails. This would allow creation of a plugin to accept the failed file details and (for example) tag the scene/image/gallery with a configurable tag
This solution could work, but the failure would not be automatically cleared if the file is modified/replaced. It would, however, have the benefit of being easily searchable, since it's in tags. A caveat: tags are easily replaced when identifying/scraping.
> A plugin might also generate a dummy file.
I did not want to go down that road, since there would be no easy way to retry the failed items. Furthermore, having a dummy file instead of a "not found" did not appeal to me as a clean solution. A generation failure might work fine later because:
- The file was replaced/modified, e.g. a partial download finished, or the user replaced the file on purpose.
- A new ffmpeg version or different generation flags work later on.
What do you think?
How about the following:
- if a generation task fails, then a zero-length file will be generated instead
- functions that consume generated files will need to check for zero-length files and treat them as file not found
- the presence of the zero-length files will prevent re-generation from occurring by default
- if a file is replaced, then it will be regenerated since the hash will have changed - this is existing behaviour
- in the event that the user wants to retry generating files, they can use the overwrite option, manually remove zero-length files, and/or we could add a generate option to retry generation, which would detect zero-length files and rerun generation
I still think a hook to trigger on generation failure is worth doing but it's not essential for this.
Seems like an interesting solution. I don't have much time right now. I'll take a look in the future if I get some time!
Related to #710. There is a plugin that achieves this: https://github.com/stg-annon/StashScripts/tree/main/plugins/findFileErrors