tumblr-crawler
Add timestamps to mark the start and end time of the download
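I suggest adding two timestamps: one for the start date and one for the end date of the posts to download. For example: on March 1, 2017 I downloaded all posts of a Tumblr blogger (51 GiB) and kept only a few of the images/videos (1 GiB). Now I want to download the posts between March 1, 2017 and April 1, 2017, but I don't want to download the posts from before March 1, 2017 again (disk space is limited, after all).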
Well, it's a good idea.
Currently, the script does not re-download files that already exist. Files that you have deleted, however, will be downloaded again.
A convenient way to support this scenario is to write a file containing the timestamp into the corresponding folder once downloading finishes. You can then delete any files you don't want. When you run the download again to fetch newer posts, it will start from the last finished timestamp if the timestamp file exists, so the deleted files will not be downloaded again. If the timestamp file does not exist, or you delete it, the script will behave as it does now.
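The timestamp-file idea above could look roughly like the following. This is only a sketch, not the script's actual code: the file name `.last_download_timestamp` and the helpers `read_last_timestamp`, `write_last_timestamp`, and `should_download` are hypothetical, and it assumes each post carries a Unix timestamp that can be compared against the saved value.

```python
import os
import time

# Hypothetical file name; the real script may choose something else.
TIMESTAMP_FILENAME = ".last_download_timestamp"


def read_last_timestamp(folder):
    """Return the saved Unix timestamp, or None if no usable file exists."""
    path = os.path.join(folder, TIMESTAMP_FILENAME)
    try:
        with open(path, "r") as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        # Missing or corrupt file: fall back to the old behaviour.
        return None


def write_last_timestamp(folder, timestamp=None):
    """Record when the download finished (defaults to the current time)."""
    if timestamp is None:
        timestamp = int(time.time())
    path = os.path.join(folder, TIMESTAMP_FILENAME)
    with open(path, "w") as f:
        f.write(str(timestamp))
    return timestamp


def should_download(post_timestamp, last_timestamp):
    """Download everything if there is no saved timestamp, else only newer posts."""
    return last_timestamp is None or post_timestamp > last_timestamp
```

With this layout, deleting the timestamp file is enough to get a full download again, and files removed by hand stay removed because their posts are older than the saved timestamp.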
Passing a time interval to the script before downloading, however, is not as easy to support: the current usage would have to change, along with the argument parsing.
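For illustration, the argument-parsing change might look something like this. It is a sketch under assumptions, not the script's current interface: the `--since` and `--until` flags, the positional `sites` argument, and the helpers `parse_date`, `build_parser`, and `in_interval` are all hypothetical, and dates are interpreted as midnight UTC.

```python
import argparse
from datetime import datetime, timezone


def parse_date(text):
    """Parse YYYY-MM-DD as midnight UTC and return a Unix timestamp."""
    try:
        dt = datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        raise argparse.ArgumentTypeError("expected YYYY-MM-DD, got %r" % text)
    return int(dt.replace(tzinfo=timezone.utc).timestamp())


def build_parser():
    # Hypothetical command line; the existing usage is different.
    parser = argparse.ArgumentParser(description="Download posts in a date range")
    parser.add_argument("sites", nargs="*", help="blog names to download")
    parser.add_argument("--since", type=parse_date, default=None,
                        help="only posts on or after this date (YYYY-MM-DD)")
    parser.add_argument("--until", type=parse_date, default=None,
                        help="only posts before this date (YYYY-MM-DD)")
    return parser


def in_interval(post_timestamp, since, until):
    """True if the post falls in [since, until); a missing bound is open."""
    if since is not None and post_timestamp < since:
        return False
    if until is not None and post_timestamp >= until:
        return False
    return True
```

For the example in the request, `--since 2017-03-01 --until 2017-04-01` would select only the posts from March 2017, and omitting both flags keeps every post.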