python-scraperlib

Add S3 based optimization cache support

Open satyamtg opened this issue 5 years ago • 1 comments

We use kiwix_storagelib to implement the S3 based optimization cache in the scrapers, but this leads to redundant code in each of them. We always store a version of the file, along with the optimizer version, as metadata on the cached object, so this would be better implemented once in scraperlib. As a start, we could have a caching module with 3 functions (or maybe a class with equivalent methods). The 3 primary things we need are:

  • download_from_cache()
  • upload_to_cache()
  • check_credentials()

There are several ways to implement this, but it should at least do the following:

  • Compare optimizer_version
  • Compare file_version
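
To make the proposal concrete, here is a minimal sketch of what a class-based version might look like. Everything except the three method names and the two metadata keys is an assumption made for illustration: `OptimizationCache` and `InMemoryStorage` are hypothetical names, and the in-memory storage only stands in for a real S3 bucket so that the example runs on its own. It does not use the storage library's actual API.

```python
from typing import Dict, Optional, Tuple


class InMemoryStorage:
    """Hypothetical stand-in for an S3 bucket: key -> (data, metadata)."""

    def __init__(self, credentials_ok: bool = True) -> None:
        self.credentials_ok = credentials_ok
        self.objects: Dict[str, Tuple[bytes, Dict[str, str]]] = {}


class OptimizationCache:
    """Sketch of a cache that validates entries by their version metadata."""

    def __init__(self, storage: InMemoryStorage, optimizer_version: str) -> None:
        self.storage = storage
        self.optimizer_version = optimizer_version

    def check_credentials(self) -> bool:
        # A real implementation would probe the bucket; here we read a flag.
        return self.storage.credentials_ok

    def upload_to_cache(self, key: str, data: bytes, file_version: str) -> None:
        # Always store both versions as metadata next to the optimized file.
        metadata = {
            "optimizer_version": self.optimizer_version,
            "file_version": file_version,
        }
        self.storage.objects[key] = (data, metadata)

    def download_from_cache(self, key: str, file_version: str) -> Optional[bytes]:
        # Return the cached bytes only if both versions match, else None.
        entry = self.storage.objects.get(key)
        if entry is None:
            return None
        data, metadata = entry
        if metadata.get("optimizer_version") != self.optimizer_version:
            return None
        if metadata.get("file_version") != file_version:
            return None
        return data
```

A scraper would call `download_from_cache()` first and fall back to optimizing the file and calling `upload_to_cache()` when it gets `None` back.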

Optionally, we could check the file's upload date and discard the cached copy if it is older than a specified amount of time. If we go for a class based approach, we can also explore ways to improve performance.
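
The optional age check could be as small as the helper below. This is only a sketch: `is_expired` and its parameters are hypothetical names, and it assumes the upload date is available as a timezone-aware `datetime`.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_expired(
    uploaded_at: datetime,
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> bool:
    """Return True if a cached file is older than max_age and should be discarded."""
    # `now` can be injected for testing; it defaults to the current UTC time.
    if now is None:
        now = datetime.now(timezone.utc)
    return now - uploaded_at > max_age
```

A cached entry for which this returns `True` would be treated like a cache miss and optimized again.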

satyamtg, Jul 08 '20 10:07

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

stale[bot], Sep 06 '20 11:09