listenbrainz-server
Spotify metadata cache (WIP)
This PR is very much a work in progress, but I could use a sanity check to see if anyone spots anything badly wrong with my approach for building a Spotify metadata cache.
The idea is that, to warm up this giant cache, we seed all the known Spotify artist ids as a set of "pending ids" to be fetched. As these artists are fetched and stored in couchdb, we discover new artist ids and add them to the list of pending ids. Once 1000 pending ids have been collected, they are written to a special document in couchdb.
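The batching step above could look something like the following sketch. All names here (`PendingIdBatcher`, `store_document`, the batch size constant) are assumptions for illustration, not the PR's actual code:

```python
# Sketch of batching discovered artist ids into couchdb documents.
# The 1000-id batch size comes from the description above; everything
# else (class and callback names) is hypothetical.
PENDING_BATCH_SIZE = 1000


class PendingIdBatcher:

    def __init__(self, store_document):
        # store_document is a callback that writes one document to couchdb
        self.store_document = store_document
        self.pending_ids = []

    def add(self, artist_id):
        self.pending_ids.append(artist_id)
        if len(self.pending_ids) >= PENDING_BATCH_SIZE:
            # Flush a full batch of 1000 ids as a single "pending ids" document
            self.store_document({"pending_ids": self.pending_ids[:PENDING_BATCH_SIZE]})
            self.pending_ids = self.pending_ids[PENDING_BATCH_SIZE:]
```

Keeping the flush threshold in one place makes it easy to tune later if 1000-id documents turn out to be too big or too small for couchdb.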
This program, which will run indefinitely, keeps an internal queue of artists to look up next. If that queue runs dry, the next pending-ids document is fetched from couchdb, its ids are inserted into the internal queue, and the document is deleted from the db. The main loop continually fetches artists (by fetching all of their albums and those albums' tracks) and writes the results to couchdb.
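The main loop described above can be sketched roughly as follows. The function names passed in (`fetch_next_pending_doc`, `delete_doc`, `fetch_artist`, `store_artist`) are placeholders for the real couchdb and Spotify API calls, and `max_iterations` is only there so the sketch can terminate in tests:

```python
from collections import deque


def run_cache_loop(fetch_next_pending_doc, delete_doc, fetch_artist, store_artist,
                   max_iterations=None):
    """Sketch of the indefinite main loop: drain an internal queue of artist
    ids, refilling it from pending-id documents in couchdb when it runs dry.
    All four callbacks are hypothetical stand-ins for the real I/O."""
    queue = deque()
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        if not queue:
            doc = fetch_next_pending_doc()
            if doc is None:
                break  # nothing pending; a real daemon would sleep and retry
            queue.extend(doc["pending_ids"])
            delete_doc(doc)  # consumed, so remove it from the db
        artist_id = queue.popleft()
        # In the real program this fetches the artist plus all of their
        # albums and those albums' tracks from Spotify.
        artist_data = fetch_artist(artist_id)
        store_artist(artist_id, artist_data)
        iterations += 1
```

Deleting the pending-ids document only after its ids are safely in the in-memory queue means a crash mid-batch loses at most one batch of ids, which seems acceptable for a cache warmer.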
In order to keep the pending_ids from exploding, there is a recent_ids dict that tracks ids that have either been recently fetched or recently added to the pending list, so that we can skip them for the time being. Right now the recent_ids dict has no pruning in place, so this is something to keep an eye on.
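A minimal sketch of that dedupe dict, with a simple age-based prune bolted on to show one way the missing pruning could work. The class name, the timestamp-per-id layout, and the 24-hour cutoff are all assumptions, not what the PR implements:

```python
import time


class RecentIdTracker:
    """Sketch of the recent_ids idea: remember ids seen recently so they are
    not re-queued, and forget them again after max_age_seconds (the pruning
    the PR notes is still missing)."""

    def __init__(self, max_age_seconds=24 * 3600):
        self.max_age = max_age_seconds
        self.seen = {}  # artist_id -> timestamp when it was last seen

    def should_queue(self, artist_id, now=None):
        """Return True (and remember the id) only if it was not seen recently."""
        now = time.monotonic() if now is None else now
        if artist_id in self.seen:
            return False
        self.seen[artist_id] = now
        return True

    def prune(self, now=None):
        """Drop entries older than max_age so the dict cannot grow forever."""
        now = time.monotonic() if now is None else now
        self.seen = {i: t for i, t in self.seen.items() if now - t < self.max_age}
```

Calling `prune()` once per refill of the internal queue would bound the dict's size without adding much overhead to the hot loop.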
Actually using this data is beyond the scope of this PR. We have several months before we've collected enough data.
TODO: Clean up the couchdb additions, add docstrings, and add more guts to the README.
Hello @mayhem! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
- In the file `listenbrainz/db/couchdb.py`:
  - Line 135:1: E302 expected 2 blank lines, found 1
  - Line 158:1: E302 expected 2 blank lines, found 1
- In the file `listenbrainz/mbid_mapping/mapping/spotify_cache.py`:
  - Line 23:12: E713 test for membership should be 'not in'
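For reference, E713 flags a membership test written as `not x in y`; the fix is the idiomatic `not in` operator. The two forms are equivalent, so this is purely a style change (the names below are illustrative, not the flagged line itself):

```python
recent_ids = {"a", "b"}
artist_id = "c"

# Flagged by pycodestyle E713:
#   if not artist_id in recent_ids: ...

# Idiomatic, equivalent form:
if artist_id not in recent_ids:
    print("queue it")
```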