pymaid
pymaid copied to clipboard
Improvement: write cache to disk
Instead of keeping cached responses in memory, we could use a persistent, dictionary-like object saved to disk via e.g. the shelve module.
Need to figure out how this could work with temporary file to make sure it's deleted upon exit.
Diskcache seems to be suitable contender.
I've been using diskcache a bit recently, seems to work! diskcache is also thread-safe, where I think the current cache isn't (got an exception earlier from what I believe was trying to clear some cache space while iterating over it in another thread).
Looping back to this: requests-cache seems ideal. You just wrap your requests.Session with a requests_cache.CachedSession (which can then be wrapped by a requests_futures.FuturesSession), which has various backends including a dict-based one like the existing implementation, and SQLite, which can be in-memory or persist to a file.
By default it only caches code 200 responses, which means we probably wouldn't need cache.undo_on_error any more, and automatically handles expiry etc.. It can also scrub auth headers from the stored requests. It doesn't handle restricting cache size, but that's probably less of a problem if it's writing to disk, and you can set a filter function which would prevent it from caching large data anyway (e.g. images). It may allow cutting down a lot of the current caching code.