Maintain a record of download dates and URLs in cache

Open deeenes opened this issue 5 years ago • 3 comments

Add a database (tsv, json or sqlite) to the cache where we maintain a record of the last download date and the URL of each file in the cache. The resources can carry this information in pypath and we should provide it in the web service.
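As a rough illustration of the sqlite option, here is a minimal sketch of such a record. The table name `downloads`, its columns, and the helper functions are all hypothetical and not part of pypath; the cache file name is used as the primary key.

```python
import sqlite3
from datetime import datetime, timezone


def open_records(path=':memory:'):
    """Open (or create) the hypothetical download record database."""
    con = sqlite3.connect(path)
    con.execute(
        'CREATE TABLE IF NOT EXISTS downloads ('
        'cache_file TEXT PRIMARY KEY, '
        'url TEXT NOT NULL, '
        'downloaded TEXT NOT NULL)'
    )
    return con


def record_download(con, cache_file, url, when=None):
    """Store the URL and download date, overwriting any older record."""
    when = when or datetime.now(timezone.utc).isoformat()
    con.execute(
        'INSERT OR REPLACE INTO downloads VALUES (?, ?, ?)',
        (cache_file, url, when),
    )
    con.commit()


def last_download(con, cache_file):
    """Return (url, downloaded) for a cache file, or None if unknown."""
    return con.execute(
        'SELECT url, downloaded FROM downloads WHERE cache_file = ?',
        (cache_file,),
    ).fetchone()
```

A web service table could then look up the download date of each resource with `last_download`. A tsv or json file would work just as well for a small cache; sqlite mainly helps if several processes write to the record at once.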

deeenes, Oct 03 '19 16:10

Hi @deeenes and @Nic-Nic,

I want to learn more about pypath. The best way to learn it is to dig into the source code and contribute. Could you help me understand how to tackle the cache? I am happy to develop this request if you have no objections.

Regards, Tung

ntung, Feb 21 '20 01:02

Hi Tung,

Thanks, this is a splendid idea and very nice of you!

Briefly, the cache is in ~/.pypath/cache by default, and the name of each cache file is an md5sum of the URL and the GET and POST parameters. It is generated by the pypath.share.curl.Curl class. This class looks ugly and messy, as we wrote it a very long time ago. I think its code is difficult, but not impossible, for a new person to read. The Curl class checks whether cache use is enabled and whether the cache file exists; if so, it loads the file from the cache, otherwise it downloads the data and saves it to the cache. We could add a method to maintain a JSON file in the cache which, for each key, contains the last download date and maybe other meta information, and then add methods to other parts of the module to interact with this data, e.g. the web service tables could contain the download date for each database.
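To make the idea concrete, here is a minimal standalone sketch of the flow described above. It is not the actual Curl code: the function names, the `metadata.json` file name, and the exact way the URL and parameters are combined into the md5 key are assumptions for illustration only.

```python
import hashlib
import json
import time
from pathlib import Path


def cache_key(url, get=None, post=None):
    """Hypothetical key: md5 of the URL plus the GET and POST parameters."""
    raw = json.dumps([url, get or {}, post or {}], sort_keys=True)
    return hashlib.md5(raw.encode('utf-8')).hexdigest()


def fetch(url, download, cache_dir, get=None, post=None, use_cache=True):
    """Return the content for `url`, reading from the cache when possible.

    `download` is any callable that takes the URL and returns bytes;
    it stands in for the real download machinery.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    key = cache_key(url, get, post)
    cache_file = cache_dir / key
    meta_file = cache_dir / 'metadata.json'  # assumed file name

    # Cache hit: load the file and leave the metadata untouched.
    if use_cache and cache_file.exists():
        return cache_file.read_bytes()

    # Cache miss (or cache disabled): download and save to the cache.
    data = download(url)
    cache_file.write_bytes(data)

    # Record the URL and the download date (UTC) under the cache key.
    meta = json.loads(meta_file.read_text()) if meta_file.exists() else {}
    meta[key] = {
        'url': url,
        'downloaded': time.strftime('%Y-%m-%dT%H:%M:%S', time.gmtime()),
    }
    meta_file.write_text(json.dumps(meta, indent=2))
    return data
```

Other parts of the module could then read `metadata.json` and look up the download date by cache key. Keeping the record keyed by the same md5sum as the cache file means no extra mapping is needed.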

This is just a quick overview, I am happy to answer if you have more questions.

Best,

Denes

deeenes, Feb 21 '20 02:02

Hi @deeenes,

Thank you for giving a brief explanation!

I will look into that class and might have some questions.

Best, Tung

ntung, Feb 21 '20 18:02