youtube-sync
Implement deduplication
While every video belongs to exactly one channel, we also support playlists, so there can be many cases where a video belongs to several different profiles. However, each profile should be able to define a format (see #11), so it's not enough to just check for identical video ids: different profiles might impose different requirements on the data to be stored.
One way to deal with this would be to deduplicate only if the format is identical. That might be good enough, seeing as "maximum video and audio quality" and perhaps separate audio-only profiles are the most likely use cases for this project.
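A minimal sketch of that rule, in Python (the project's implementation language isn't stated here, so this is illustrative only): two requests count as duplicates only when both the video id and the format match. The `Download` record, the `plan_downloads` helper and the example format strings are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Download:
    # Hypothetical record: one profile asking for one video in one format.
    profile: str
    video_id: str
    format: str  # identical strings are treated as identical formats


def plan_downloads(requests):
    """Group requests so each (video_id, format) pair is fetched only once.

    Returns a dict mapping (video_id, format) -> sorted list of profiles
    that need that exact file. Same video id with a different format is
    kept separate, so it is never deduplicated.
    """
    groups = defaultdict(set)
    for req in requests:
        groups[(req.video_id, req.format)].add(req.profile)
    return {key: sorted(profiles) for key, profiles in groups.items()}


requests = [
    Download("music", "abc123", "bestaudio"),
    Download("lectures", "abc123", "bestvideo+bestaudio"),
    Download("favourites", "abc123", "bestvideo+bestaudio"),
]
plan = plan_downloads(requests)
for (video_id, fmt), profiles in plan.items():
    print(video_id, fmt, profiles)
```

Here "lectures" and "favourites" share one file, while the audio-only "music" profile gets its own copy of the same video.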
Also, there are multiple ways of deduplicating.
- The easiest approach would be to store one real copy and symlink (or hardlink) everything else. Downside: simply backing up a profile directory might not back up 100% of the data, because the real copy of a deduplicated video might be stored outside of it.
- A more complex and less portable approach would be to use filesystem features. For example, btrfs is CoW, so we could use the FIDEDUPERANGE ioctl to save disk space.
- ?
> Downside: Simply backing up a profile directory might not back up 100% of the data because duplicate videos might be stored outside of it.
This doesn't hold true for the hardlink case: a hardlink is a full directory entry for the same file, so a backup of the profile directory still contains the data. Unless there's a better idea, my preference is to support
- CoW copies
- hardlinks
- symlinks
in descending order of preference, depending on which of them the filesystem supports and which we have permission to use.
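The fallback order (CoW copy, then hardlink, then symlink) can be sketched in Python as below, assuming Linux for the CoW step. Note that this swaps in the FICLONE ioctl rather than FIDEDUPERANGE, since here the copy is being created fresh instead of deduplicating two existing files; FICLONE works on e.g. btrfs, or XFS with reflink enabled. The `link_duplicate` and `_reflink` names are hypothetical.

```python
import errno
import fcntl
import os

# Linux FICLONE ioctl, _IOW(0x94, 9, int); newer Pythons expose fcntl.FICLONE
FICLONE = getattr(fcntl, "FICLONE", 0x40049409)


def _reflink(src, dest):
    """Create dest as a copy-on-write clone of src (fails if unsupported)."""
    with open(src, "rb") as s, open(dest, "wb") as d:
        fcntl.ioctl(d.fileno(), FICLONE, s.fileno())


def link_duplicate(src, dest):
    """Materialise dest as a deduplicated copy of src.

    Tries a CoW clone, then a hardlink, then a symlink, and returns the
    name of the method that worked. Raises the last OSError if none did.
    """
    # Refuse to touch an existing dest, so cleanup below only removes
    # files this function created itself.
    if os.path.lexists(dest):
        raise FileExistsError(errno.EEXIST, os.strerror(errno.EEXIST), dest)
    attempts = [
        ("reflink", _reflink),
        ("hardlink", os.link),
        ("symlink", lambda s, d: os.symlink(os.path.abspath(s), d)),
    ]
    last_error = None
    for name, method in attempts:
        try:
            method(src, dest)
            return name
        except OSError as exc:
            last_error = exc
            # A failed reflink leaves an empty dest file behind
            if os.path.lexists(dest):
                os.remove(dest)
    raise last_error
```

Hardlinks only work within a single filesystem (cross-device attempts fail with `EXDEV`), which is one reason the symlink fallback is still worth keeping.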