RFC: Database to store download state & metadata

Open xWTF opened this issue 1 year ago • 2 comments

Background

pixivFANBOX recently removed many posts, and several creators I support have moved to Fantia, so I urgently need fully automated downloading and organization. Currently, fantiadl doesn't store the download state for each post. It sends a request for every post and every content file (even for files that already exist), which slows the download process considerably and makes it unsuitable for periodic cron jobs.

Solution?

This PR adds a sqlite3 database that holds the download state for Fantia post and post_content entries. It also records each URL that has been downloaded. This is currently done for images only: Fantia stores images on S3 and uses a UUID as the image name, so I assume it's safe to treat a URL as already downloaded once it has been seen.

Post contents and URLs that are present in the database are skipped, which speeds up the download and reduces the number of requests sent. A post is marked as "complete" once all of its contents are accessible and downloaded, which further reduces unnecessary requests.

The PR also fixes a few small mistakes in the perform_download method and changes incomplete_filename to the full filename with a .tmp suffix, which most sync drives ignore by default.

BTW, the database functionality is completely optional. It is enabled only when you provide a path with the --db parameter.
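To make the skip logic above concrete, here is a minimal sketch of how such a state database could look. It is not the code from this PR: the `DownloadState` class, the table names (`posts`, `post_contents`, `urls`), and all column names are assumptions made for illustration only.

```python
import sqlite3

# Hypothetical schema: every table and column name here is an assumption
# for illustration, not the structure proposed in the PR.
SCHEMA = """
CREATE TABLE IF NOT EXISTS posts (
    id       INTEGER PRIMARY KEY,
    title    TEXT,
    complete INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS post_contents (
    id      INTEGER PRIMARY KEY,
    post_id INTEGER NOT NULL REFERENCES posts(id)
);
CREATE TABLE IF NOT EXISTS urls (
    url TEXT PRIMARY KEY
);
"""


class DownloadState:
    """Tracks which posts, post contents, and URLs were already downloaded."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.executescript(SCHEMA)

    def is_post_complete(self, post_id):
        row = self.conn.execute(
            "SELECT complete FROM posts WHERE id = ?", (post_id,)
        ).fetchone()
        return bool(row and row[0])

    def mark_post(self, post_id, title, complete):
        # "with self.conn" commits on success and rolls back on error.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO posts (id, title, complete) VALUES (?, ?, ?)",
                (post_id, title, int(complete)),
            )

    def is_content_downloaded(self, content_id):
        row = self.conn.execute(
            "SELECT 1 FROM post_contents WHERE id = ?", (content_id,)
        ).fetchone()
        return row is not None

    def mark_content(self, content_id, post_id):
        with self.conn:
            self.conn.execute(
                "INSERT OR IGNORE INTO post_contents (id, post_id) VALUES (?, ?)",
                (content_id, post_id),
            )

    def is_url_downloaded(self, url):
        row = self.conn.execute(
            "SELECT 1 FROM urls WHERE url = ?", (url,)
        ).fetchone()
        return row is not None

    def mark_url(self, url):
        with self.conn:
            self.conn.execute("INSERT OR IGNORE INTO urls (url) VALUES (?)", (url,))
```

A downloader built on this sketch would check `is_post_complete` before requesting a post, check `is_content_downloaded` and `is_url_downloaded` before fetching individual files, and call the matching `mark_*` method only after a file has been written successfully. Passing `":memory:"` as the path is a quick way to try it without creating a file.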

Request for comment

So far I'm only storing the metadata required for my own automated organization solution. Since the table structure will be relatively hard to upgrade later, please comment if you think the DB structure needs changes before this PR gets merged.

xWTF, May 27 '23 05:05