Implement deduplication

Open · PotcFdk opened this issue 6 years ago • 1 comment

While any video belongs to exactly one channel, we also support playlists, so a video can easily end up belonging to several different profiles. However, each profile should be able to define its own format (see #11), so it's not enough to just check for identical video IDs: two profiles might impose different requirements on the data to be stored.

One way to deal with this would be to deduplicate only if the format is identical, which might be good enough, seeing as "maximum video and audio quality" and perhaps separate audio-only profiles are the most likely use cases for this project.
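
A minimal sketch of that rule, assuming each stored video can be described by a profile name, a video ID, and a format string. The `Entry` type, the profile names, and the format strings below are hypothetical placeholders, not part of the project:

```python
from collections import defaultdict
from typing import NamedTuple


class Entry(NamedTuple):
    profile: str   # hypothetical profile name
    video_id: str  # video ID as reported by the site
    fmt: str       # hypothetical format string configured for the profile


def group_duplicates(entries):
    """Group entries by (video_id, fmt) and keep only groups with 2+ profiles."""
    groups = defaultdict(list)
    for entry in entries:
        groups[(entry.video_id, entry.fmt)].append(entry.profile)
    return {key: profiles for key, profiles in groups.items() if len(profiles) > 1}


entries = [
    Entry("channel-a", "abc123", "bestvideo+bestaudio"),
    Entry("playlist-x", "abc123", "bestvideo+bestaudio"),
    Entry("audio-only", "abc123", "bestaudio"),
]
duplicates = group_duplicates(entries)
```

Here only the first two entries would be deduplicated. The audio-only copy of the same video ID is left alone because its format differs.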

Also, there are multiple ways of deduplicating.

  • The easiest would be to just store one real copy and symlink (or hardlink) everything else. Downside: Simply backing up a profile directory might not back up 100% of the data, because duplicate videos might be stored outside of it.
  • A more complex and less portable way would be to use file system features. For example, btrfs is a copy-on-write (CoW) file system, and we could use the FIDEDUPERANGE ioctl to let identical files share storage and save disk space.
  • ?
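
To make the backup concern concrete, here is a rough sketch, assuming a POSIX system whose temporary directory supports both hardlinks and symlinks. The directory layout (`store`, `profile`, `backup`) and file names are made up for illustration. It backs up a profile directory, then removes the shared store to simulate restoring from the backup alone:

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as root:
    store = os.path.join(root, "store")      # hypothetical shared location
    profile = os.path.join(root, "profile")  # hypothetical profile directory
    backup = os.path.join(root, "backup")
    os.makedirs(store)
    os.makedirs(profile)

    real = os.path.join(store, "video.mkv")
    with open(real, "wb") as f:
        f.write(b"video data")

    # The profile only references the real copy.
    os.link(real, os.path.join(profile, "hard.mkv"))
    os.symlink(real, os.path.join(profile, "sym.mkv"))

    # Back up the profile directory only, keeping symlinks as symlinks.
    shutil.copytree(profile, backup, symlinks=True)

    # Simulate restoring from the backup without the shared store.
    shutil.rmtree(store)

    # os.path.exists() follows symlinks, so a dangling link reports False.
    hard_ok = os.path.exists(os.path.join(backup, "hard.mkv"))
    sym_ok = os.path.exists(os.path.join(backup, "sym.mkv"))
```

In this sketch the hardlinked entry is copied as a regular file with its data, while the symlinked entry ends up dangling. Other backup tools may treat links differently.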

PotcFdk · Apr 29 '19 21:04

Downside: Simply backing up a profile directory might not backup 100 % of the data because duplicate videos might be stored outside of it.

This doesn't hold true for the hardlink case. Unless there's a better idea, my preference is to support

  1. CoW-copies
  2. hardlinks
  3. symlinks

in descending order of preference, depending on which of those are supported by the filesystem and which ones we have the permissions for.
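
The fallback order could be sketched roughly as follows, assuming Linux and Python. The CoW step uses the `FICLONE` ioctl (a whole-file clone) instead of `FIDEDUPERANGE`, purely to keep the example short. The helper names `reflink`, `symlink`, and `link_duplicate` are hypothetical, and a failure of one method is taken to mean "not supported or not permitted here":

```python
import fcntl
import os
import tempfile

# Linux ioctl request number for FICLONE, as defined in <linux/fs.h>.
FICLONE = 0x40049409


def reflink(src, dst):
    """Create dst as a CoW clone of src (needs e.g. btrfs or XFS with reflink)."""
    with open(src, "rb") as s, open(dst, "xb") as d:
        try:
            fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
        except OSError:
            os.unlink(dst)  # do not leave an empty file behind
            raise


def symlink(src, dst):
    """Symlink dst to the absolute path of src."""
    os.symlink(os.path.abspath(src), dst)


def link_duplicate(src, dst):
    """Try CoW clone, then hardlink, then symlink; return the method used."""
    errors = []
    methods = (("reflink", reflink), ("hardlink", os.link), ("symlink", symlink))
    for name, method in methods:
        try:
            method(src, dst)
            return name
        except OSError as exc:
            errors.append(f"{name}: {exc}")
    raise OSError("; ".join(errors))


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original.mkv")
    duplicate = os.path.join(tmp, "duplicate.mkv")
    with open(original, "wb") as f:
        f.write(b"video data")
    method = link_duplicate(original, duplicate)
    with open(duplicate, "rb") as f:
        same_content = f.read() == b"video data"
```

Which method ends up being used depends on the file system under the temporary directory. On file systems without reflink support, the sketch would be expected to fall through to a hardlink.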

PotcFdk · Apr 29 '19 21:04