BASC-Archiver
[Suggestion] Make a list of already downloaded files in a thread so as not to download them again
I routinely run a dupe check, which once freed up as much as 9 GB, and it is weird that the archiver can't detect duplicates itself.
That's odd. My best guess is that these are identical images posted in different threads. 4chan itself obviously doesn't let you repost the same image; with archiving, however, you can end up with 40 copies of the same image because it was reposted in 40 threads on different days.
This is an interesting thing to think about: whether it's worth looking into something along the lines of hard or symbolic links, or something similar. It would need us to store a list of files and at least one or two hashes of each. Will definitely look into it, thanks for making the issue!
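A minimal sketch of the hash-plus-hard-link idea, for illustration only; this is not BASC-Archiver code. The names `file_md5` and `link_duplicates` are hypothetical, and the sketch assumes the whole archive sits on one filesystem that supports hard links.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def link_duplicates(root):
    """Replace duplicate files under `root` with hard links to the first copy seen.

    Hypothetical helper: assumes every file under `root` is on the same
    filesystem, since hard links cannot cross filesystems.
    """
    seen = {}    # MD5 digest -> path of the first file found with that content
    linked = []  # paths that were replaced by a hard link
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a predictable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # leave symbolic links alone
            original = seen.setdefault(file_md5(path), path)
            if original == path or os.path.samefile(original, path):
                continue  # first copy, or already linked to it
            # Create the link under a temporary name, then swap it into place,
            # so the duplicate is never missing from disk.
            temp_path = path + ".dedupe-tmp"
            os.link(original, temp_path)
            os.replace(temp_path, path)
            linked.append(path)
    return linked
```

Because each duplicate keeps its own path, the saved thread pages would still resolve their image references. A real implementation would probably compare file sizes or a second hash before trusting an MD5 match.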
@HASJ, I'm interested in that dupe check. Are you using MD5s and looping through each file, deleting matches?
@jcook14 Exactly, and deleting the oldest matches, using an oldie but goodie app, DoubleKiller.
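For anyone who wants to script the same kind of check, here is a rough sketch of the loop described above. It is not how DoubleKiller works internally; the function names are made up for illustration, and "oldest" is assumed to mean oldest modification time.

```python
import hashlib
import os
from collections import defaultdict


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def remove_older_duplicates(root, dry_run=True):
    """Group files under `root` by MD5 and drop all but the newest of each group.

    Hypothetical helper. With dry_run=True (the default) nothing is deleted;
    the function only reports which files would be removed.
    """
    groups = defaultdict(list)  # MD5 digest -> every path with that content
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            groups[md5_of(path)].append(path)

    removed = []
    for paths in groups.values():
        if len(paths) < 2:
            continue  # unique file, nothing to do
        paths.sort(key=os.path.getmtime)  # oldest first
        for old_path in paths[:-1]:       # keep only the newest copy
            if not dry_run:
                os.remove(old_path)
            removed.append(old_path)
    return removed
```

Run it with the default `dry_run=True` first and review the returned list. Note that deleting files this way leaves the thread HTML pointing at paths that no longer exist.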
@HASJ Awesome app, thanks! I will integrate this into my board ripper. I am curious whether you have addressed re-linking the deleted thumbs/pics in the relevant thread HTML file (if you keep the markup structure, that is).