Cache everything
This issue connects #316, #445, #522 and maybe #348
Now that we're in the process of making iterative improvements, the one I feel probably has the most impact on user experience is caching all of the requests we make. This means caching playlists first and foremost, but also saved tracks, saved albums, and other lists. I think this should be our next big feature push. @SO9010
However, we'll need some mechanism for invalidating these caches when a playlist has changed and we need to query for the new state.
Some things (most likely) won't change:
- Album covers for albums (not daylists/mixes)
- Lyrics
- User's list of playlists
We can add a manual refresh shortcut (or maybe an icon as well) for when these need to be queried again.
For the rest of the queries that we make I suggest we:
- Have their cache invalidate 1 hour after the last refresh
- Allow this duration to be configured in the settings (so some users who only want to manually refresh for changes can do so)
- Keep track of all changes made from the client side (so there's no desync)
Of particular note, we will have to write a custom async function to allow the user to query all of their saved songs. My suggestion here: if a user goes to their saved tracks and nothing has been cached yet, we add a button that lets them do a full retrieval of their saved songs. This would be a somewhat long async task, but it would run in the background, looping over pages of the user's liked songs at the maximum page size the API allows. We would need some indication that these tracks are being retrieved and of when retrieval finishes. I'm not sure whether that's a modal or some other UI element, but since it's such a long task, it's probably worth implementing for users with large libraries.
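As a rough sketch of that retrieval loop (the pagination logic only; `fetch_page` and `on_progress` are hypothetical stand-ins for the real API call and the UI progress hook, and 50 is the saved-tracks endpoint's per-request maximum):

```rust
// Hedged sketch of the background task: page through the user's saved
// tracks 50 at a time and report progress so the UI can show an
// indicator. `fetch_page` stands in for the real network call.
fn fetch_all_saved_tracks(
    mut fetch_page: impl FnMut(usize, usize) -> Vec<String>, // (offset, limit) -> track ids
    mut on_progress: impl FnMut(usize),                      // drives the UI indicator
) -> Vec<String> {
    const PAGE_SIZE: usize = 50; // the API's per-request maximum for saved tracks
    let mut all = Vec::new();
    let mut offset = 0;
    loop {
        let page = fetch_page(offset, PAGE_SIZE);
        let got = page.len();
        all.extend(page);
        on_progress(all.len());
        if got < PAGE_SIZE {
            break; // a short page means we've reached the end of the library
        }
        offset += PAGE_SIZE;
    }
    all
}
```

In the real implementation this would run on a background task so the UI stays responsive, with `on_progress` feeding whatever indicator we settle on.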
This will probably require considerable reworking of most of the requests we make. But as long as we wrap these requests in a single function that handles storage, retrieval, and validation in a unified way, it should stay manageable. I also believe this gives us an opportunity to reconsider how the cache is stored and whether we can make improvements.
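That "single function" could look roughly like this minimal, in-memory sketch; `CachedStore` and its methods are illustrative names and assumptions, not existing PSST code:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Hypothetical sketch: one entry point that handles storage, retrieval,
// and TTL-based invalidation for any cached value.
struct CachedStore<V> {
    ttl: Duration, // configurable in settings, e.g. 1 hour by default
    entries: HashMap<String, (Instant, V)>,
}

impl<V: Clone> CachedStore<V> {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    // Returns the cached value if still fresh, otherwise calls `fetch`,
    // stores the result, and returns it.
    fn get_or_fetch<E>(
        &mut self,
        key: &str,
        fetch: impl FnOnce() -> Result<V, E>,
    ) -> Result<V, E> {
        if let Some((stored_at, value)) = self.entries.get(key) {
            if stored_at.elapsed() < self.ttl {
                return Ok(value.clone());
            }
        }
        let value = fetch()?;
        self.entries.insert(key.to_string(), (Instant::now(), value.clone()));
        Ok(value)
    }

    // Manual refresh: drop the entry so the next access re-queries.
    fn invalidate(&mut self, key: &str) {
        self.entries.remove(key);
    }
}
```

The real thing would persist to disk like the existing cache directories do, but the shape of the API (get-or-fetch plus explicit invalidate) is the point.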
@jacksongoode This sounds great so far!
Have you considered integrating lazy loading and virtualized scrolling to reduce the number of tracks that need to be loaded and cached? This would improve the program's load time and reduce cache size.
I believe this would require changes to the queue system, but it would allow users to load an endless number of tracks. We could also add a setting for users to define how many pages are fetched at once.
Additionally, we could introduce a priority-based caching system, where larger, long-term caches are used for saved albums and playlists.
For playlists, we could display an older cached version while requesting the latest from Spotify. If changes are detected, we could download the first 50 tracks and progressively cache the rest.
We could get users to set which playlists they want to have prioritised too.
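The show-stale-then-revalidate idea might look like this sketch, assuming we can use the playlist's snapshot_id (or even just its track count) as a cheap change fingerprint; `PlaylistCache` and `refresh_if_changed` are made-up names:

```rust
// Sketch of stale-while-revalidate for playlists: show the cached copy
// immediately, compare a cheap fingerprint in the background, and only
// re-download tracks when it differs.
struct PlaylistCache {
    snapshot_id: String, // fingerprint of the last version we cached
    tracks: Vec<String>,
}

// Returns true if the cache was updated, false if it was already current.
fn refresh_if_changed(
    cached: &mut PlaylistCache,
    remote_snapshot: &str,
    fetch_tracks: impl FnOnce() -> Vec<String>,
) -> bool {
    if cached.snapshot_id == remote_snapshot {
        return false; // cache is current, keep showing it
    }
    cached.tracks = fetch_tracks();
    cached.snapshot_id = remote_snapshot.to_string();
    true
}
```

In practice `fetch_tracks` would pull the first 50 tracks straight away and then progressively cache the rest, as described above.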
So I think the key points that we address should include:
- Cache Essentials:
Cache playlists, saved tracks, and albums. Use manual refresh options (e.g., a button or icon) to update data when necessary.
- Cache Expiry and Flexibility:
Set default cache expiry at 1 hour, and then we could have a comparative check to see if anything has changed, but make this duration configurable in the app's settings for users who prefer manual refreshes.
- Long Async Queries:
For longer tasks, like retrieving a user's saved songs, run background async tasks and provide a UI indication (e.g., a modal or progress bar) to show when this process is complete. Or point 4.
- Lazy Loading and Virtualized Scrolling:
To improve loading speed and minimize cache size, implement lazy loading and virtualized scrolling, allowing users to progressively load large lists of tracks.
- Queue System and Progressive Updates:
For playlists, display the cached version first and asynchronously check for updates. If changes are detected, refresh the first 50 tracks and progressively update the rest.
As someone who's implemented generic caching for Spotify at the request level for a Python project, maybe my experience is helpful. I really regret doing it this way; it was easier and it does work, but it would have been better to cache at the item level (or maybe a hybrid). The way I did it, I've got multiple copies of track data depending on the context, e.g. PlaylistTrackObject, SimplifiedTrackObject, TrackObject, and then also depending on how it was requested (single item, multiple grouped together, etc.). Mostly I respect Spotify's cache headers, sometimes I use them as a guide, but then other times I ignore them entirely (I consider images and basic track metadata as static, while playlists must be dynamic).
If/when I redo it, I'll organise it around the items rather than the requests, with essentially a database of Spotify URIs. For what it's worth, I wouldn't be against writing (or helping to write) such a thing in Rust since using Rust from Python is supposedly pretty straightforward these days.
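For what that item-level organisation might look like, here's a hedged sketch of a URI-keyed item store where a more detailed view upgrades a simplified one but never the reverse (all names here are illustrative, not from any existing codebase):

```rust
use std::collections::HashMap;

// Illustrative sketch of item-level caching: every response, whatever
// endpoint it came from, is broken into items keyed by Spotify URI, so
// there is exactly one record per track/album/playlist.
#[derive(Clone, PartialEq, Debug)]
struct CachedItem {
    uri: String,    // e.g. "spotify:track:..."
    name: String,
    detailed: bool, // whether the fuller view (e.g. from /tracks) is stored
}

struct ItemDb {
    items: HashMap<String, CachedItem>,
}

impl ItemDb {
    fn new() -> Self {
        Self { items: HashMap::new() }
    }

    // Merge a view of an item: a detailed view upgrades a simplified one,
    // but a simplified view never overwrites detail we already have.
    fn upsert(&mut self, item: CachedItem) {
        match self.items.get(&item.uri) {
            Some(existing) if existing.detailed && !item.detailed => {}
            _ => {
                self.items.insert(item.uri.clone(), item);
            }
        }
    }

    fn get(&self, uri: &str) -> Option<&CachedItem> {
        self.items.get(uri)
    }
}
```

The open questions above (per-view expiry, eviction, invalidation on failure) would layer on top of something like this, which is exactly where it starts getting complicated.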
I did some looking at how the Spotify API does it. What it first does is load only the first 25 tracks, probably just so the user can decide whether or not they want to listen to the song; then it grabs the rest of the songs.
We could perhaps also use the same API call that Spotify uses, as we essentially have that set up already for the home section. (We should probably add an enum for what type of API call we want to use.)
The call that it uses initially is:
Base URL: https://api-partner.spotify.com/pathfinder/v1/query
Operation Name: fetchPlaylist
Variables:
- uri: spotify:playlist:37i9dQZF1DZ06evO1DHoaY (this is just an example)
- offset: 0
- limit: 25
Extensions:
- persisted query
The next call is the same as before, just with offset: 25 and limit: 100.
This would greatly reduce the amount of time it takes to request the data, due to the larger return window. See here for the JSON return of the last call.
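The two-step pattern above (a quick 25-track page first, then larger pages for the remainder) could be sketched like this; `fetch_playlist_page` is a hypothetical stand-in for the fetchPlaylist call:

```rust
// Sketch of the two-step loading pattern: grab 25 tracks so the user can
// start interacting right away, then pull the remainder in 100-track pages.
fn load_playlist(
    total: usize, // total track count, known from the playlist metadata
    mut fetch_playlist_page: impl FnMut(usize, usize) -> Vec<String>, // (offset, limit)
) -> Vec<String> {
    let mut tracks = fetch_playlist_page(0, 25); // fast first paint
    let mut offset = tracks.len();
    while offset < total {
        let page = fetch_playlist_page(offset, 100); // bigger window for the rest
        offset += page.len();
        tracks.extend(page);
    }
    tracks
}
```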
@kingosticks Thank you! I'm pretty sure that PSST already does what you're suggesting. Currently, there are 4 main directories in the cache folders: Track, Episode, Audio and Key. When the user plays a track from any playlist or album it loads the track using this function:
```rust
fn load_track(
    item_id: ItemId,
    session: &SessionService,
    cache: &CacheHandle,
) -> Result<Track, Error> {
    if let Some(cached_track) = cache.get_track(item_id) {
        Ok(cached_track)
    } else {
        let track = Track::fetch(session, item_id)?;
        if let Err(err) = cache.save_track(item_id, &track) {
            log::warn!("failed to save track to cache: {:?}", err);
        }
        Ok(track)
    }
}
```
This means that there is only one copy of the track data for every playlist and album. Please correct me if this system is not what you were suggesting :)
So, unfortunately, I am unable to get the API which Spotify uses to work properly; it just returns nothing no matter what I try!
Also, we should add a cache for the offset or scroll location the user is at. This would make going back through history much smoother, particularly noticeable with the show-lyrics button.
> This means that there is only one copy of the track data for every playlist and album. Please correct me if this system is not what you were suggesting :)
Yes! I had problems when I initially tried to implement that approach, because when I came to load an album, or particularly a long playlist, I could end up with many single-track requests and would hit Spotify's rate limiting. So I had to group track requests together and then ungroup the responses afterwards, which is doable. And then different endpoints return different views of the tracks, so I had to check whether the cached track data I already had (maybe from their /playlist endpoint) was actually sufficient for what I now needed (maybe more detailed info from their /tracks endpoint). Those multiple views will then likely have multiple expiry times that need taking into account. And when requests fail, do you invalidate all item views, or just the one you were updating? How would I go about evicting old data if I wasn't sure what it was being used for? Must I track that too, and per view as well? For getting playlists-with-tracks, this would sometimes end up being more requests than just getting everything from the playlist endpoint in the first place. It was getting complicated and very Spotify-specific (which at the time I wanted to avoid since I have other APIs to use), so I just tossed it all and went with a far simpler request-based caching approach. And here I am regretting it. I didn't even bother implementing cache eviction in the end; nobody has complained about that yet...
But it sounds like you've managed to avoid my issues (real or just in my head). But I am curious how you avoid the rate-limiting when requesting just single tracks.
I'm wondering if you requested all of the song data along with the playlist. I'm pretty sure PSST doesn't do this, but only loads two or three songs ahead in the queue. That may be where the issue you were facing came from.
PSST currently invalidates just the song which has failed to load as this is separate from the UI aspect.
Sorry if I didn't understand your questions!
No, not at all. I just wanted to provide my experience in case it helped form what you were doing. Sorry for the noise.
Oh! Thank you very much, I'm sorry that I misinterpreted you! :P
Also relevant is #501
@jacksongoode we should start moving on this more soon. Any more ideas?
Thoughts on optimizing playback
So psst does the shuffling in psst core by randomizing the positions of songs, but the whole Vec of songs needs to exist. I think that is what needs to change.
We could possibly change our approach so that we only work through the positions of the tracks to shuffle.
The most compelling idea to me is that we should have the positions of songs calculated in advance, and then work through those songs.
So for example, if we have a playlist of 232 songs and we want to shuffle-play: the first 5 songs to be played are chosen as positions 3, 18, 200, 4, 122. We then take those numbers and feed them to the playback function. For playing sequentially we do something similar. This can of course be done with Spotify's paging API. However, this means the queue system is no longer in psst core, which renders the psst CLI dead.
Or we could have a hashmap of all of those songs, but rather than loading all of them, we have a PotentialSong type with a method we can call to get the ID. This way we can order them, shuffle them, and do whatever in a similar manner to how it's already done. I don't mind this at all. PotentialSong would hold whatever is needed to make the request for the song. This would also mean we could rewrite the origin detection to be based off this PotentialSong, which we could return using a function in psst core, thereby addressing #583
Edit: I think this one makes the most sense. Also because we can then invalidate playlist caches via the number of songs.
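A sketch of how the PotentialSong idea and position-based shuffling could fit together (the names and the tiny built-in RNG are assumptions for the sketch, not psst internals):

```rust
// Hypothetical sketch: the queue holds cheap handles, and the full track
// data is only fetched when a handle is actually played.
#[derive(Clone)]
struct PotentialSong {
    id: String,
}

impl PotentialSong {
    fn id(&self) -> &str {
        &self.id
    }

    // Fetch (or look up in the cache) the full track only when needed.
    fn resolve(&self, fetch: impl FnOnce(&str) -> String) -> String {
        fetch(&self.id)
    }
}

// Shuffling then becomes an operation on positions, not on loaded tracks.
fn shuffled_order(len: usize, seed: u64) -> Vec<usize> {
    // Tiny deterministic Fisher-Yates shuffle driven by an LCG, so the
    // sketch has no external dependencies; psst would use its existing RNG.
    let mut order: Vec<usize> = (0..len).collect();
    let mut state = seed;
    for i in (1..len).rev() {
        state = state.wrapping_mul(6364136223846793005).wrapping_add(1);
        let j = (state >> 33) as usize % (i + 1);
        order.swap(i, j);
    }
    order
}
```

Playback would walk `shuffled_order(playlist_len, seed)` and call `resolve` on each PotentialSong as it comes up, so only a few tracks ahead of the queue ever need to be loaded.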
Lazy loading thoughts
I wonder how we could do lazy loading with scrolling too. We would probably also want to unrender some tracks when scrolled past a certain amount, to conserve RAM.
We probably need another variable on the playlist page which contains the page number. Then we need to modify our page-loading variable to take an initial offset.
Edit: We should mix this with the API that Spotify uses.
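One way to track which pages should stay rendered, as a sketch (the page size and the one-page window either side are assumptions, not settled design):

```rust
// Sketch: from the scroll position, work out which pages must stay
// rendered and which can be unrendered to save RAM. Keeps the current
// page plus one page either side.
fn pages_to_keep(scroll_index: usize, page_size: usize, total_pages: usize) -> (usize, usize) {
    let current = scroll_index / page_size;
    let first = current.saturating_sub(1);
    let last = (current + 1).min(total_pages.saturating_sub(1));
    (first, last) // pages outside [first, last] can be unrendered
}
```

As the user scrolls, pages entering the window get fetched (from cache or the API) and pages leaving it get dropped from the view.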
@jacksongoode what do you think?