Request: use clustered directory structure for cache

Open • hak0 opened this issue 11 months ago • 1 comment

Thank you once again for creating such an excellent piece of software! I truly appreciate the effort and thought that has gone into its development. I do, however, have a question regarding the indexing performance of cached transcoded files.

Currently, all transcoded files are placed in the transcoding folder:

  • album covers are saved in transcoding/covers
  • songs and podcasts are saved in transcoding/audio

But each of these is a single flat directory, and as the number of files grows, read performance can degrade on filesystems that handle very large directories poorly. I think we can add a subfolder structure to mitigate this problem.

For album art like al-1234-512.png, we can take the prefix al-12 as the folder name, so the cached file is saved as al-12/al-1234-512.png:

(In handlers_raw.go)

	cachePath := filepath.Join(
		c.cacheCoverPath,
		id.String()[:min(5, len(id.String()))], // First 5 characters as the subfolder name, or the whole ID if shorter (min is a builtin since Go 1.21)
		fmt.Sprintf("%s-%d.%s", id.String(), size, coverCacheFormat),
	)

Similarly, for transcoded songs with a key like 000e4dc149737ce71867ad753bd2bb79, we can use the first two characters (00) as the first-level folder name and the next two (0e) as the second-level folder name. Assuming hex keys, this gives a two-level tree of 256*256=65536 leaf folders, which can hold 256*256*256=16777216 transcoded songs without performance degradation (each folder contains at most 256 entries on average).

(In transcoder_caching.go)

path := filepath.Join(
    t.cachePath,          // Base path
    key[:2],              // First-level directory (first 2 characters of `key`)
    key[2:4],             // Second-level directory (next 2 characters of `key`)
    key,                  // Actual file name
)

I would greatly appreciate it if you could consider my suggestion. Thank you for your time and attention!

hak0 • Jan 18 '25 01:01

thank you for the nice words! this makes a lot of sense to me

should be an easy enough change to implement

though doing it in a backwards compatible way might be tricky (so as not to invalidate people's large caches). we could

  • check the old and new path format when finding a cache hit
  • or, write a migration to move all the files to the new format on startup

sentriz • Jan 18 '25 03:01