
[FEATURE] Stop download when first downloaded ID found?

Open • Cassian-Andor opened this issue 1 year ago • 0 comments

  • [x] I am requesting a feature.
  • [x] I am running the latest version of BDfR
  • [x] I have read the Opening an issue

Description

I am currently setting up a way to download a given subreddit and then subsequently download newer posts every once in a while (say, weekly). I know I can pass a limit option, but that is not deterministic: some subreddits have more posts than others, so I could pass a "10 items" limit, find that the subreddit had 15 new posts since I last checked, and miss 5. I could increase the limit, but there is never any certainty.

Since there is no way to specify a "from date" (I have read issues where people suggested it before, but the idea was dismissed) that would let me say "download everything since the last time I ran this", I was wondering whether a flag could be added that stops downloading once it finds an ID it should skip.

My idea is that if I have a file downloaded_ids like this:

ID_10
ID_09
ID_08
ID_07
ID_06

And then I have a subreddit like this (sorted by date, newest first):

ID_13
ID_12
ID_11
ID_10
...

When I get to ID_10, a flag like --stop-after-first-found-id (the name is just a suggestion) would tell the program to stop: there is no point in processing ID_10, ID_09, and so on, since all of those were already downloaded. If I always download sorted by date, there is no reason to re-process everything, even if the API returns 1000 items in memory.
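
In shell terms, the behaviour I am after would look roughly like the sketch below. Note that list_posts_newest_first and download are hypothetical placeholders for whatever BDFR does internally when it lists and fetches posts; the only point is the early break.

#!/bin/bash
# Hypothetical sketch, not BDFR code: list_posts_newest_first and download
# stand in for the tool's internal listing and download steps.
while read -r post_id; do
	# Stop at the first ID already in downloaded_ids: every older post
	# must have been handled by a previous run.
	if grep -qxF "$post_id" downloaded_ids; then
		break
	fi
	download "$post_id"
done < <(list_posts_newest_first "$subreddit")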

My problem is that it takes forever to run this for each subreddit. yt-dlp, for example, has the same concept of skip IDs, and it takes about 5 seconds to skip a channel with hundreds of videos. BDFR, on the other hand, takes forever to skip even a single subreddit.
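
For comparison, this is all it takes with yt-dlp: --download-archive keeps a file of already-downloaded IDs, and --break-on-existing stops the listing at the first ID found in that archive (the channel URL below is just an example).

# Record IDs in archive.txt and stop as soon as a video
# already present in the archive is encountered.
yt-dlp --download-archive archive.txt --break-on-existing \
	"https://www.youtube.com/@SomeChannel/videos"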

This is what I currently use to download, in case there is a way to do what I want without adding a new flag. The rm -f of the log files is there because it bothers me to have N log files per subreddit; I don't need those.

#!/bin/bash

download_subreddit () {
	# Clear out any logs left over from a previous run
	# (including rotated copies such as _log.txt.1)
	rm -f "./${1}_log.txt".*
	rm -f "./${1}_log.txt"
	# Download into ./<subreddit>, skipping IDs already recorded in the exclude file
	bdfr download "./${1}" --log "./${1}_log.txt" --exclude-id-file "./${1}_ids.txt" --no-dupes --submitted --file-scheme "{REDDITOR}_{TITLE}_{POSTID}" --folder-scheme "" --subreddit "${1}"
	# Append this run's successfully downloaded IDs to the exclude file
	./extract_successful_ids.sh "./${1}_log.txt" >> "./${1}_ids.txt"
	rm -f "./${1}_log.txt".*
}

download_subreddit <SUBREDDIT_NAME>
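
To cover several subreddits, the function can then simply be called in a loop (the names below are only examples):

for sub in pics wallpapers; do
	download_subreddit "$sub"
done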

Cassian-Andor • Apr 19 '23 11:04