pymaid icon indicating copy to clipboard operation
pymaid copied to clipboard

Improvement: write cache to disk

Open schlegelp opened this issue 5 years ago • 3 comments

Instead of keeping cached responses in memory, we could use a persistent, dictionary-like object saved to disk via e.g. the shelve module.

Need to figure out how this could work with temporary file to make sure it's deleted upon exit.

schlegelp avatar Jan 10 '20 10:01 schlegelp

Diskcache seems to be suitable contender.

schlegelp avatar Jan 10 '20 11:01 schlegelp

I've been using diskcache a bit recently, seems to work! diskcache is also thread-safe, where I think the current cache isn't (got an exception earlier from what I believe was trying to clear some cache space while iterating over it in another thread).

clbarnes avatar Nov 07 '22 18:11 clbarnes

Looping back to this: requests-cache seems ideal. You just wrap your requests.Session with a requests_cache.CachedSession (which can then be wrapped by a requests_futures.FuturesSession), which has various backends including a dict-based one like the existing implementation, and SQLite, which can be in-memory or persist to a file.

By default it only caches code 200 responses, which means we probably wouldn't need cache.undo_on_error any more, and automatically handles expiry etc.. It can also scrub auth headers from the stored requests. It doesn't handle restricting cache size, but that's probably less of a problem if it's writing to disk, and you can set a filter function which would prevent it from caching large data anyway (e.g. images). It may allow cutting down a lot of the current caching code.

clbarnes avatar Mar 17 '23 18:03 clbarnes