untangle lru_cache and persistent cache
The current lru_cache code is complicated because it mixes several intentions. Let's untangle them.
The two goals are 1) an in-memory memoization layer to reduce remote data fetches; and 2) a persistent caching layer, primarily so that tests do not require network access.
Consequences of mixing these concerns are:
- We can't use the lru_cache that ships with the standard library (functools.lru_cache, included in Python 3.x)
- Configuration is confusing. uta.connect() requires a cache mode that's different from the lru_cache mode, and neither checks whether the supplied value is valid.
- As implemented, the hdp interface is also entangled in caching.
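A single enumerated cache mode, validated where it is read, would address the configuration problem. A minimal sketch; `CacheMode`, its member names, and `parse_cache_mode` are hypothetical and do not correspond to existing settings:

```python
from enum import Enum


class CacheMode(Enum):
    """Hypothetical persistent-cache modes; names and values are illustrative only."""
    NONE = "none"    # no persistent cache; always query the data source
    LEARN = "learn"  # read from the cache; on a miss, fetch and record the result
    USE = "use"      # read from the cache only; a miss is an error


def parse_cache_mode(value):
    """Convert a user-supplied string to a CacheMode, rejecting anything else."""
    try:
        return CacheMode(value)  # Enum lookup by value
    except ValueError:
        valid = ", ".join(m.value for m in CacheMode)
        raise ValueError(f"invalid cache mode {value!r}; expected one of: {valid}") from None
```

Failing early with the list of accepted values means a typo in configuration is reported at startup instead of silently selecting some default behavior.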
Outcomes that I'd like to see:
- the interface module should be about the interface only; it shouldn't mix in caching, etc.
- use existing caching and persistence tools where possible
- separate caching from the actual data provider (uta, cdot, etc)
- portable cache file format (across Python versions and platforms)
- enable caches to be used as sole-source for data for testing
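A portable persistent cache that can also act as the sole data source could be as simple as a JSON file. A minimal sketch, assuming results can be keyed by strings and serialized as JSON; `JsonFileCache` and its methods are hypothetical. When constructed without a `fetch` callable, a miss raises `KeyError` instead of reaching the network, which is the behavior tests would want:

```python
import json
import os


class JsonFileCache:
    """Hypothetical JSON-file-backed cache for data-provider results.

    JSON is plain UTF-8 text, so the same file can be read by any Python version
    on any platform. Keys must be strings and values JSON-serializable
    (tuples come back as lists).
    """

    def __init__(self, path, fetch=None):
        self.path = path
        self.fetch = fetch  # called on a miss; None makes the file the sole data source
        self.data = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as fh:
                self.data = json.load(fh)

    def get(self, key):
        if key in self.data:
            return self.data[key]
        if self.fetch is None:
            raise KeyError(f"{key!r} is not cached and no data source is configured")
        value = self.data[key] = self.fetch(key)  # record the fetched result
        return value

    def save(self):
        # sort_keys produces a stable file that diffs cleanly under version control
        with open(self.path, "w", encoding="utf-8") as fh:
            json.dump(self.data, fh, sort_keys=True, indent=1)
```

A test suite would open the committed cache file with `fetch=None`, so any request not already recorded fails loudly rather than silently going to the network.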
Also: Investigate whether we can pin the pickle protocol version to 2 so that the same cache works for Python 2 and 3.
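Pinning the protocol is a single argument to `pickle.dumps` (or `pickle.dump`). A short sketch; the record is made-up illustrative data:

```python
import pickle

record = {"ac": "NM_000000.0", "exons": [(0, 100), (100, 250)]}  # illustrative data only

# Pin the protocol explicitly instead of relying on pickle.DEFAULT_PROTOCOL,
# which differs between Python releases. Protocol 2 is readable by Python 2.3+ and Python 3.
blob = pickle.dumps(record, protocol=2)

# Protocol 2+ pickles begin with the PROTO opcode (0x80) followed by the version byte.
assert blob[:2] == b"\x80\x02"
assert pickle.loads(blob) == record
```

This pins only the serialization format. Python 2 `str` values loaded under Python 3 may still need the `encoding` argument of `pickle.loads`, and if the cache is stored with `shelve`, the file also depends on the platform's `dbm` backend, so the pinned protocol alone may not make it portable across platforms.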
Outcomes that I'd like to see:
- the interface module should be about the interface only; it shouldn't mix in caching, etc.
- use existing caching and persistence tools where possible
- separate caching from the actual data provider (current implementation achieves this)
- same cache file for all Python versions
- enable caches to be used as sole-source for data
- support using NCBI gff3 alignment files directly
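Reading NCBI GFF3 alignment files directly would start with parsing feature lines. A sketch following the nine-column layout of the GFF3 specification; `parse_gff3_line` is hypothetical, the sample accession is made up, and how NCBI encodes alignment details (for example in `Target` or `Gap` attributes) should be checked against the actual files:

```python
from urllib.parse import unquote

GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")


def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict.

    Returns None for blank lines, comments, and directives (lines starting with '#').
    """
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    rec = dict(zip(GFF3_COLUMNS, fields))
    rec["start"], rec["end"] = int(rec["start"]), int(rec["end"])  # 1-based, inclusive
    attrs = {}
    if rec["attributes"] != ".":  # "." means no attributes
        for pair in rec["attributes"].split(";"):
            if pair:
                tag, _, value = pair.partition("=")
                # reserved characters are percent-encoded, so decode only after splitting
                attrs[unquote(tag)] = unquote(value)
    rec["attributes"] = attrs
    return rec
```

A provider built on this would group records by transcript (e.g. the first token of `Target`) and could then sit beneath the same caching layers as any other source.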
Ideas:
- Provide distinct LRU (memory) and PersistentCache classes that implement the interface. Use these to provide layers, where nested layers are invoked for cache misses.
- Stick to the mix-in model, where sequence fetching, gene data, and transcript data might come from pluggable sources
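The layering idea can be sketched with classes that share one `get(key)` method, each delegating misses to the layer inside it. A minimal sketch; the class names follow the idea above but their interfaces are assumptions, and `Source` is a hypothetical stand-in for a real provider such as UTA or cdot:

```python
from collections import OrderedDict


class Source:
    """Innermost layer: hypothetical wrapper around the actual data provider."""

    def __init__(self, fetch):
        self.fetch = fetch

    def get(self, key):
        return self.fetch(key)


class LRUCache:
    """In-memory layer: keeps up to maxsize recent results, delegates misses to `inner`."""

    def __init__(self, inner, maxsize=128):
        self.inner = inner
        self.maxsize = maxsize
        self._data = OrderedDict()

    def get(self, key):
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]
        value = self.inner.get(key)  # miss: ask the next layer down
        self._data[key] = value
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict the least recently used entry
        return value


class PersistentCache:
    """Persistent layer (a plain dict here; a file-backed store in practice)."""

    def __init__(self, inner=None, store=None):
        self.inner = inner  # None means this cache is the sole source of data
        self.store = {} if store is None else store

    def get(self, key):
        if key in self.store:
            return self.store[key]
        if self.inner is None:
            raise KeyError(key)
        value = self.store[key] = self.inner.get(key)
        return value


# Usage (fetch_from_remote is hypothetical): memory -> persistent -> provider
# hdp = LRUCache(PersistentCache(Source(fetch_from_remote)), maxsize=1000)
```

Because every layer exposes the same method, layers can be stacked, reordered, or omitted without touching the provider, and a `PersistentCache` with no inner layer serves as the sole data source. The in-memory layer could also be `functools.lru_cache`; the explicit class just makes the delegation visible.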
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been stalled for 7 days with no activity.
This issue was closed by stalebot. It has been reopened to give more time for community review. See biocommons coding guidelines for stale issue and pull request policies. This resurrection is expected to be a one-time event.