
Feature Request: Prevent Duplicate Uploads Using File Hash

Open keonramses opened this issue 1 month ago • 2 comments

Is your feature request related to a problem? Please describe.

Cloudreve currently prevents duplicate uploads only when the filename is identical. However, if a user renames a file (e.g., video1.mp4 → vacation.mp4), Cloudreve treats it as a completely new upload, even when the underlying file content is exactly the same.

This leads to:

  • Duplicate files wasting disk space
  • Confusion when organizing media libraries
  • Inefficient storage usage for users who upload large media files
  • Difficulty detecting duplicate content in multi-user environments

Hash-based detection would be especially useful for media libraries (photos, videos, audio), where files are often renamed before uploading.

Describe the solution you'd like

I would like Cloudreve to support duplicate detection using file hashes, similar to how Immich detects duplicate photos/videos.

Proposed behavior:

  1. Cloudreve calculates a hash (MD5/SHA256/etc.) of each uploaded file.
  2. If the hash already exists in the user's storage (or globally, depending on config):
    • Cloudreve should detect the duplicate even if the filename is different.
  3. Provide a configurable action:
    • Sitewide setting (affects all users)
    • Per-user/group setting (toggle in user config)
  4. When a user tries to upload a renamed duplicate:
    • Show a prompt/options:
      • Cancel upload (recommended for saving space)
      • Proceed anyway (allow duplicate if the user really wants it)

This gives administrators/users fine-grained control over how strict duplication control should be.
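The per-user flow in steps 1-4 can be sketched in Go (Cloudreve's implementation language). This is a minimal illustration, not Cloudreve's actual code; the `hashContent` and `checkDuplicate` names and the in-memory index are hypothetical stand-ins for whatever hash column the real database would use:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashContent returns the hex-encoded SHA-256 digest of a file's bytes.
// (A real implementation would stream from disk rather than buffer in memory.)
func hashContent(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// checkDuplicate reports whether an upload's content hash already exists in
// the user's index, regardless of filename. If it is new, the hash is
// recorded so later uploads of the same bytes are flagged.
func checkDuplicate(index map[string]string, name string, data []byte) (dup bool, existing string) {
	h := hashContent(data)
	if orig, ok := index[h]; ok {
		return true, orig // same bytes already stored under a different name
	}
	index[h] = name
	return false, ""
}

func main() {
	index := map[string]string{} // hash -> original filename, per user
	content := []byte("same video bytes")

	dup, _ := checkDuplicate(index, "video1.mp4", content)
	fmt.Println("first upload duplicate?", dup)

	// Renamed copy of the identical content is caught by hash, not filename.
	dup, orig := checkDuplicate(index, "vacation.mp4", content)
	fmt.Println("renamed upload duplicate?", dup, "original:", orig)
}
```

At the prompt stage, "Proceed anyway" would simply skip the index check (or store a second entry), while "Cancel" aborts before any bytes hit the backend, which is where the I/O savings come from.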

Describe alternatives you've considered

  • Relying on filename matching — does not work if the file is renamed.
  • Using external deduplication at the filesystem level (ZFS/Btrfs) — only helps at the block level and does not prevent UI-level duplicates.
  • Manually checking duplicates — not practical for large media libraries or multi-user setups.

None of these solve the root problem inside Cloudreve’s upload logic.

Additional context

Immich and some cloud storage tools (Syncthing, Seafile, Dropbox Smart Sync) already use hash-based duplicate detection to avoid wasteful uploads. For self-hosted setups with large media collections, this is extremely helpful.

This feature would:

  • Greatly reduce wasted storage
  • Improve user experience
  • Make Cloudreve more competitive for media-heavy deployments
  • Reduce backend I/O load caused by unnecessary repeated uploads

If needed, I can help test the feature or provide sample cases. Thank you.

keonramses avatar Nov 17 '25 02:11 keonramses

Ref: https://github.com/cloudreve/cloudreve/issues/2411 (in Chinese).

YUDONGLING avatar Nov 17 '25 02:11 YUDONGLING

@YUDONGLING, thank you for the reference. I was not aware of the instant transfer feature, and this request was not made with it in mind, although it would be beneficial since it also saves bandwidth and space. My request is narrower: a way for the user to ensure duplicates are not uploaded into their own storage. I believe this can be implemented without the instant transfer feature, which may be more complicated since it would reference file hashes for files stored on the entire server rather than checking hashes for the user only.

keonramses avatar Nov 17 '25 04:11 keonramses