Migrate scene covers to filesystem
Adds a `data` directory. This directory is intended to store user data that we don't want to store in the database. It defaults to the `data` subdirectory of the directory containing `config.yml`.

Scene cover generation during scanning now writes to files in the `data` directory, and setting the cover via scene update or identify will also write to this location.
Data file storage works as follows:
- covers for ids < 1000 are stored as `data/scenes/<id>_cover.jpg`
- covers for 1000 <= ids < 1000000 are stored under subdirectory `data/1000`, then the id is broken up into three-digit chunks like for thumbnails. eg id 123456 becomes `data/1000/123/123456_cover.jpg`
- covers for ids >= 1000000 are stored under subdirectory `data/1000000`, then the id is broken up like before: eg id 123456789 becomes `data/1000000/123/456/123456789_cover.jpg`
- ids over 1000000000 are supported the same way, but we're unlikely to need it
Added a default scene image for when a cover is not found.
The migration process involves user intervention. A schema migration renames the `scenes_cover` table to indicate that it is deprecated, but the process to move existing cover data is not done automatically.
Added two migrations to the tasks page:
- The first moves existing covers in the `generated/screenshots` directory to the `data` directory, then overwrites those with any covers found in the `scenes_cover` table.
- The second drops the `scenes_cover` table and runs a vacuum to reclaim disk space.
As part of the refactoring for this PR, `file.Deleter` is removed in favour of `fsutil.FSTransaction`, which supports deleting, adding and modifying files, with rollback support.
Related to #2271. Resolves #2303.
This PR should also hopefully fix a latent bug that keeps coming up around the file naming hash setting. It will now default the file naming hash to `oshash` and will not calculate MD5s if these settings aren't present. The post-migration check for the original `oshash` introduction will now only be run if the system was migrated from a schema version prior to 12. This check remains largely the same as it was, but it overwrites the config values rather than only setting the default.
Was there a specific reason to separate the < 1000 and 1000 <= id < 1000000 cases rather than use the same schema for all numbers? With minor changes we could use the same schema used for the thumbnails (with depth=2 and length=2/3). All that needs to change is to zero-pad the scene id if needed and read from right to left instead of left to right. Sample code here
I wanted to limit the number of files in a directory. With a separate schema, we end up with a maximum of 1001 directory entries per directory: the `[x..]000` to `[x..]999` files (up to 1000), plus the `1000[0..]` subdirectory.

The problem with your suggestion is that I believe it is unintuitive to read from right to left when looking for a given id. I'd prefer to use left to right and be consistent with the other intra-dir algorithm.
I have thought about this further and it's probably not that bad to just combine the schemas. That is, drop the `1000[0..]` folders and put everything in the root directory.
So, to make it work as follows:
- covers for ids < 1000 are stored as `data/scenes/<id>_cover.jpg`
- covers for ids 1000 - 1999 are stored in `data/scenes/001/<id>_cover.jpg`
- covers for ids 2000 - 2999 are stored in `data/scenes/002/<id>_cover.jpg`, and so on
- covers for ids 999000 - 999999 are stored in `data/scenes/999/<id>_cover.jpg`
- covers for ids 1000000 - 1000999 are stored in `data/scenes/001/000/<id>_cover.jpg`
- covers for ids 1001000 - 1001999 are stored in `data/scenes/001/001/<id>_cover.jpg`, and so on

This does mean that a given subdirectory will have up to 2000 file entries.
For example, subdirectory `data/scenes` will have the following entries:
- `[000-999]_cover.jpg` - 1000 files
- `[001-999]` - 999 subdirectories

`data/scenes/001` would have:
- `[1000-1999]_cover.jpg` - 1000 files
- `[000-999]` - 1000 subdirectories

I think this is intuitive enough, and I think a maximum of 2000 file entries is fairly reasonable.
The last iteration looks good to me (I wouldn't expect us to need the 1000000 - ... case anyway).

I would prefer covers for ids < 1000 to be stored as `data/scenes/000/<id>_cover.jpg` instead of `data/scenes/<id>_cover.jpg`, but that's just me.
From a quick first look I found another issue. When doing a full import (import with reset set to true), the database is reset while the data dir isn't. Since the full import will recreate the database from scratch with new ids, the `data` dir must be nuked.
I'm a coder/programmer myself, and because of that I'm really interested to know: how is this a good change, the Migrate scene covers to filesystem?

What are the pros of migrating scene covers to the filesystem instead of just saving them in the database?

One that I can think of is a smaller `stash-go.sqlite` file size, or the ability to set a different drive/disk path for data folders.
> What are the pros of migrating scene covers to the filesystem instead of just saving them in the database?
It is likely because of issue #2271, i.e. keeping the sqlite database size in check. Personally, I prefer to have everything that can't be recreated in the database. It's easier to understand, to back up, or to migrate to a new location.

I am also not sure how much of a problem the database size is for users. SQLite can handle databases gigabytes in size, especially in WAL mode. But that's on desktop / server; on smaller and/or older devices that may be another story.

Overall, I would like to see a mixed approach in the future:
- `scene_cover_custom` (blob) => custom image by the user => stored in the database
- `scene_cover_image_id` (int) => cover image by selecting an image in stash
- `scene_cover_video_ts` (float) => cover image by timestamp of the video, created cover.jpg stored in `generated`
Replaced with #3187