DownloaderForReddit

MD5 hash

Open mindjek07 opened this issue 2 years ago • 3 comments

Is your feature request related to a problem? Please describe. Duplicate images

Describe the solution you'd like Store MD5 hash data of every image

mindjek07 avatar Jul 01 '22 07:07 mindjek07

I can't believe this is not what the author meant by "avoid duplicates". I ended up with tons of duplicated images simply because they have different titles. This makes the program kind of useless for me. I hope you can add this in the future.

ghost avatar Apr 18 '23 08:04 ghost

Avoid duplicates actually works by storing downloaded URLs and not re-downloading content at a URL that has previously been downloaded. It has nothing to do with the title.
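For readers following along, URL-based duplicate avoidance can be sketched roughly as below. This is only an illustration of the idea, not the project's actual code: the `UrlHistory` class and the example URLs are hypothetical, and a real implementation would presumably persist the URLs in a database rather than an in-memory set.

```python
class UrlHistory:
    """Hypothetical helper: remembers URLs that were already downloaded."""

    def __init__(self):
        self._seen = set()

    def should_download(self, url: str) -> bool:
        """Return True the first time a URL is seen, False afterwards."""
        if url in self._seen:
            return False
        self._seen.add(url)
        return True


history = UrlHistory()
urls = [
    "https://example.com/a.jpg",
    "https://example.com/a.jpg",  # same URL again: skipped
    "https://example.com/b.jpg",  # new URL: downloaded, even if the image is identical
]
results = [history.should_download(u) for u in urls]
```

Under this scheme the same image posted at two different URLs is downloaded twice, which would explain the duplicates reported above.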

This issue is not as simple as it appears. Most image/video host sites do not make an MD5 hash, or any hash for that matter, available before content is downloaded. So the content must be downloaded, then hashed, then compared to previously downloaded and hashed content, then deleted if it is found to be a duplicate. This is a feature that I plan to implement in future versions, but it is far from the ideal duplicate avoidance that most users would expect to be possible.
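As a rough sketch of the download, hash, compare, delete sequence described above (assuming the file has already been saved to disk; `md5_of_file`, `keep_if_new`, and the in-memory `known_hashes` set are hypothetical names for illustration, not part of DownloaderForReddit):

```python
import hashlib
import os


def md5_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large videos are not read into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def keep_if_new(path, known_hashes):
    """Keep a freshly downloaded file only if its content hash is new.

    Returns True if the file was kept, False if it was a duplicate and
    has been deleted. `known_hashes` is a set of hex digests (assumed to
    be loaded from wherever previous hashes are stored).
    """
    file_hash = md5_of_file(path)
    if file_hash in known_hashes:
        os.remove(path)  # duplicate content: discard the new copy
        return False
    known_hashes.add(file_hash)
    return True
```

Note that the bandwidth is spent either way: the duplicate is only detected after the download finishes. A byte-level hash like this also only catches exact copies, so a re-encoded or resized version of the same image would still be kept.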

MalloyDelacroix avatar Apr 19 '23 15:04 MalloyDelacroix

I used to use https://github.com/shadowmoose/RedditDownloader and I'm not sure whether it downloads the images in order to tell if they are in fact duplicates. Maybe it does...

Edit: Actually it does, you're correct: https://github.com/shadowmoose/RedditDownloader/blob/62a98c658b5759a2acdbbfa7a58cd6e842aaf71f/redditdownloader/processing/post_processing.py#L17

ghost avatar Apr 20 '23 06:04 ghost