Migrate scene covers to filesystem

Open WithoutPants opened this issue 2 years ago • 6 comments

Adds a data directory. This directory is intended to store user data that we don't want to store in the database. It defaults to the data subdirectory in the directory where config.yml is.

Scene cover generation during scanning now writes to files in the data directory, and setting the cover via scene update or identify will also write to this location.

Data file storage works as follows:

  • covers for ids < 1000 are stored as data/scenes/<id>_cover.jpg
  • covers for 1000 <= ids < 1000000 are stored under subdirectory data/1000, then the id is broken up into three-digit chunks like for thumbnails, e.g. id 123456 becomes data/1000/123/123456_cover.jpg.
  • covers for ids >= 1000000 are stored under subdirectory data/1000000, then the id is broken up like before, e.g. id 123456789 becomes data/1000000/123/456/123456789_cover.jpg
  • ids over 1000000000 are supported the same way, but we're unlikely to need it

Added a default scene image for when a cover is not found.

The migration process requires user intervention. A schema migration renames the scenes_cover table to indicate that it is deprecated, but existing cover data is not moved automatically.

Added two migrations to the tasks page:

The first moves existing covers in the generated/screenshots directory to the data directory, then overwrites those with any covers found in the scenes_cover table.

The second drops the scenes_cover table and runs a vacuum to reclaim disk space.

As part of the refactoring for this PR, file.Deleter is removed in favour of fsutil.FSTransaction, which supports deleting, adding and modifying files, with rollback support.

Related to #2271 Resolves #2303

WithoutPants avatar Apr 07 '22 02:04 WithoutPants

This PR should also fix a latent bug that keeps coming up around the file naming hash setting. It will now default the file naming hash to oshash and not calculate MD5s if these settings aren't present. The post-migration check for the original oshash introduction will now only be run if the system was migrated from a schema version prior to 12. This check remains largely the same as it was, but it overwrites the config values rather than only setting the default.

WithoutPants avatar Apr 07 '22 02:04 WithoutPants

Data file storage works as follows:

  • covers for ids < 1000 are stored as data/scenes/<id>_cover.jpg
  • covers for 1000 <= ids < 1000000 are stored under subdirectory data/1000, then the id is broken up into three-digit chunks like for thumbnails. eg id 123456 becomes data/1000/123/123456_cover.jpg.
  • covers for ids >= 1000000 are stored under subdirectory data/1000000, then the id is broken up like before: eg id 123456789 becomes data/1000000/123/456/123456789_cover.jpg
  • ids over 1000000000 are supported the same way, but we're unlikely to need it

Was there a specific reason to separate the < 1000 and 1000 to 1000000 ranges instead of using the same schema for all numbers? With minor changes we can use the same schema used for the thumbnails (with depth=2 and length=2/3). All that needs to change is to zero-pad the scene id if needed and read it from right to left instead of left to right.

bnkai avatar Apr 07 '22 18:04 bnkai

Was there a specific reason to separate the < 1000 and 1000 to 1000000 ranges instead of using the same schema for all numbers? With minor changes we can use the same schema used for the thumbnails (with depth=2 and length=2/3). All that needs to change is to zero-pad the scene id if needed and read it from right to left instead of left to right.

I wanted to limit the number of files in a directory. With a separate schema, we end up with a maximum of 1001 entries per directory: the files for ids [x..]000 to [x..]999, plus the 1000[0..] directory.

The problem with your suggestion is that I believe it is unintuitive to read from right to left when looking for a given id. I'd prefer to use left to right and be consistent with the other intra-dir algorithm.

I have thought about this further and it's probably not that bad to just combine the schemas. That is, drop the 1000[0..] folders and put everything in the root directory.

So it would work as follows:

  • covers for ids < 1000 are stored as data/scenes/<id>_cover.jpg
  • covers for ids 1000 - 1999 are stored in data/scenes/001/<id>_cover.jpg
  • covers for ids 2000 - 2999 are stored in data/scenes/002/<id>_cover.jpg, and so on
  • covers for ids 999000 - 999999 are stored in data/scenes/999/<id>_cover.jpg
  • covers for ids 1000000 - 1000999 are stored in data/scenes/001/000/<id>_cover.jpg
  • covers for ids 1001000 - 1001999 are stored in data/scenes/001/001/<id>_cover.jpg and so on.

This does mean that a given subdirectory will have up to 2000 file entries.

For example, subdirectory data/scenes will have the following entries:

  • [000-999]_cover.jpg - 1000 files
  • [001-999] - 999 subdirectories

data/scenes/001 would have:

  • [1000-1999]_cover.jpg - 1000 files
  • [000-999] - 1000 subdirectories

I think this is intuitive enough, and I think a maximum of 2000 file entries is fairly reasonable.
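The combined layout above can be sketched in Go roughly as follows. This is only an illustration of the scheme as described in this comment, not the code in the PR: the function name coverPath and the root argument are hypothetical, and it uses path.Join to keep forward slashes in the output, where real code would more likely use filepath.Join.

```go
package main

import (
	"fmt"
	"path"
	"strconv"
)

// coverPath is a hypothetical helper that maps a scene id to its cover path.
// The id is divided by 1000 and the quotient is split into zero-padded
// three-digit directory names, most significant chunk first. Ids below 1000
// have a zero quotient and therefore live directly under root.
func coverPath(root string, id int) string {
	var dirs []string
	for q := id / 1000; q > 0; q /= 1000 {
		// Prepend so that the most significant chunk ends up first.
		dirs = append([]string{fmt.Sprintf("%03d", q%1000)}, dirs...)
	}

	elems := append([]string{root}, dirs...)
	elems = append(elems, strconv.Itoa(id)+"_cover.jpg")
	return path.Join(elems...)
}

func main() {
	for _, id := range []int{42, 1234, 999999, 1000000, 1001999} {
		fmt.Println(coverPath("data/scenes", id))
	}
}
```

With this sketch, id 1234 maps to data/scenes/001/1234_cover.jpg and id 1000000 maps to data/scenes/001/000/1000000_cover.jpg, matching the list above.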

WithoutPants avatar Apr 08 '22 22:04 WithoutPants

The last iteration looks good to me (I wouldn't expect us to need the 1000000+ case anyway). I would prefer that covers for ids < 1000 are stored as data/scenes/000/<id>_cover.jpg instead of data/scenes/<id>_cover.jpg, but that's just me.

From a quick first look I found another issue. When doing a full import (import with reset set to true), the database is reset while the data dir isn't. Since the full import recreates the database from scratch with new ids, the data dir must be cleared as well.

bnkai avatar Apr 09 '22 16:04 bnkai

I'm a coder/programmer myself, and because of that I'm really interested to know: how is migrating scene covers to the filesystem a good change? What are the pros of migrating scene covers to the filesystem instead of just saving them in the database? The ones I can think of are a smaller stash-go.sqlite file and the ability to set a different drive/disk path for the data folders.

TgSeed avatar Jul 15 '22 14:07 TgSeed

What are the pros of migrating scene covers to the filesystem instead of just saving them in the database?

It is likely because of issue #2271, i.e. to keep the sqlite database size in check. I personally prefer to have everything that can't be recreated in the database. It's easier to understand, to back up, and to migrate to a new location.

I am also not sure how much of a problem the database size is for users. SQLite can handle databases gigabytes in size, especially in WAL mode. But that's on desktop or server; on smaller and/or older devices that may be another story.

Overall, I would like to see a mixed approach in the future:

scene_cover_custom (blob) => custom image by the user => stored in the database
scene_cover_image_id (int) => cover image by selecting an image in stash
scene_cover_video_ts (float) => cover image by timestamp of the video, created cover.jpg stored in generated

JoeScylla avatar Jul 22 '22 11:07 JoeScylla

Replaced with #3187

WithoutPants avatar Nov 25 '22 06:11 WithoutPants