Fix lrclib lyrics
Fixes #5102
LRCLib lyrics backend fixes
Bug Fixes
- Fixed fetching lyrics from
lrclibsource. If lyrics for a specific album, artist, and title combination are not found, the plugin now searches for the artist and title and picks the most relevant result, scoring them by- Duration similarity to the target item
- Availability of synced lyrics
- Updated the default
sourcesconfiguration to prioritizelrclibover other sources for faster and more reliable results.
Code Improvements
- Added type annotations to
fetchmethod in all backends. - Introduced
LRCLyricsandLRCLibItemclasses to encapsulate lyrics data and improve code structure. - Enhanced error handling and logging enchancements to the
LRCLibbackend. These will be added to the rest of the backends in a separate PR.
Tests
- Added new tests to cover the updated functionality and error handling scenarios.
To Do
- [x] Documentation. (If you've added a new command-line flag, for example, find the appropriate page under
docs/to describe it.) - [x] Changelog. (Add an entry to
docs/changelog.rstto the bottom of one of the lists near the top of the document.) - [x] Tests. (Very much encouraged but not strictly required.)
The build on win32 is failing to install reflink because it's only supported until Python 3.7.
I will address this in a separate PR and rebase this one accordingly once the fix is merged.
Note: this issue popped up now because I added a new requests-mock dependency which invalidated cached dependencies.
Hey, LRCLIB author here 👋. It would be great if you could make the /search API a fallback for the /get API (instead of replacing it entirely) when no result is found. The /get API is more performant on LRCLIB's side, whereas the /search API is much slower.
Also, the /get API has been updated recently, and the album_name parameter is no longer a hard requirement.
If you have any ideas for further improvements to LRCLIB's API, feel free to let me know!
Hey, LRCLIB author here 👋. It would be great if you could make the
/searchAPI a fallback for the/getAPI (instead of replacing it entirely) when no result is found. The/getAPI is more performant on LRCLIB's side, whereas the/searchAPI is much slower.Also, the
/getAPI has been updated recently, and thealbum_nameparameter is no longer a hard requirement.If you have any ideas for further improvements to LRCLIB's API, feel free to let me know!
Hi @tranxuanthang, thanks for popping in! Absolutely, that's no problem at all. This should make most lyrics queries even speedier on our side.
Now that we're on this topic, I think it may be a good idea to also add caching: for example, if we're getting lyrics for two separate files
- Artist - Title (Some Remix)
- Artist - Title
Ideally we should only ask for Artist - Title lyrics once when Artist - Title (Some Remix) is not found.
@tranxuanthang thanks for a reliable and performant API!
@snejus Will this PR also address the issue that I attempted to fix in #5089? Looks like quite a refactor, so I can't quite tell (I'm not much of a Pythoneer).
Snap, I wish I'd seen your PR earlier!
It does, however I'm not sure when will it get reviewed and merged in, so let's get your PR merged in first!
Also note that the lyrics tests / configuration issue you experienced while working on that PR has been fixed here: https://github.com/beetbox/beets/pull/5452
Okay, sounds like a plan to me!
This is probably more suitable for your other PR (#5474), but I'll mention this here. In the comments of my PR, I noted that there's opportunity for a bigger change regarding synced lyrics and fallbacks. If I recall correctly, with how things currently are, it's first come, first serve with regards to choosing lyrics from the enabled backends. In other words, if a plain-only backend fetches lyrics before we try a backend that supports synced lyrics, the plain ones will be used. That's not great if we prefer synced lyrics.
The idea for solving this that I had was:
- if synced lyrics are preferred, always query synced-supporting backends first;
- query the backends in order of preference (the order they're defined in
sourcesin the config, but synced ones first ifsynced: yes); - if a synced source returns synced lyrics, use those;
- if a synced source returns plain lyrics, check other synced sources;
- if there are no synced lyrics, just use plain ones.
Is anything like this in your plans? Thoughts on it in general? Seeing how you're working a lot on this plugin, I think you'd be well suited to tackle this.
Okay, sounds like a plan to me!
This is probably more suitable for your other PR (#5474), but I'll mention this here. In the comments of my PR, I noted that there's opportunity for a bigger change regarding synced lyrics and fallbacks. If I recall correctly, with how things currently are, it's first come, first serve with regards to choosing lyrics from the enabled backends. In other words, if a plain-only backend fetches lyrics before we try a backend that supports synced lyrics, the plain ones will be used. That's not great if we prefer synced lyrics.
The idea for solving this that I had was:
- if synced lyrics are preferred, always query synced-supporting backends first;
- query the backends in order of preference (the order they're defined in
sourcesin the config, but synced ones first ifsynced: yes);- if a synced source returns synced lyrics, use those;
- if a synced source returns plain lyrics, check other synced sources;
- if there are no synced lyrics, just use plain ones.
Is anything like this in your plans? Thoughts on it in general? Seeing how you're working a lot on this plugin, I think you'd be well suited to tackle this.
I like the concept of this! Though we need to be realistic here: lrclib is the only provider that supports synced lyrics 😅
Well at least this simplifies engineering a solution, since we never need to jump between the backends.
Now if we zoom into LRCLib, you will find that the results are currently scored this way:
@cached_property
def dist(self) -> tuple[bool, float]:
"""Distance/score of the given lyrics item.
Return a tuple with the following values:
1. Absolute difference between lyrics and target duration
2. Boolean telling whether synced lyrics are available.
Best lyrics match is the one that has the closest duration to
``target_duration`` and has synced lyrics available.
"""
return self.duration_dist, not self.synced
It prioritizes lyrics that have duration closer to the music file. Presence of synced lyrics comes secondary.
Though now that I think of this, I think we may be able to swap these two around:
return not self.synced, self.duration_dist
And thus prioritize results that have synced lyrics available. I guess the danger is what if we receive some wrong synced lyrics where the duration is vastly different?.
Results that have mismatching durations actually get ignored early on as the algorithm is fairly strict about what's allowed: it tolerates ±5% difference:
DURATION_DIFF_TOLERANCE = 0.05
@cached_property
def is_valid(self) -> bool:
"""Return whether the lyrics item is valid.
Lyrics duration must be within the tolerance defined by
:attr:`DURATION_DIFF_TOLERANCE`.
"""
return (
abs(self.duration - self.target_duration)
<= self.target_duration * self.DURATION_DIFF_TOLERANCE
)
This means that when we calculate dist above, the result duration is guaranteed to be
acceptable - so we may be able to get away with prioritising results that have synced lyrics from that point on 😉
From what I've seen so far the change above works very well:
I'm also seeing that this fixes some of the incorrect lyrics that the LRCLib /get endpoint returned.
I like the concept of this! Though we need to be realistic here: lrclib is the only provider that supports synced lyrics 😅
So far. 🙃
Regarding the algorithm, I think if you first disqualify matches that are too short/long and then rank by synced-ness, that makes sense. That's all good.
The issue I'm thinking of is more for when there are multiple synced lyric providers. Imagine there's LCRLib and bilRCL, both can provide synced lyrics. As it is now, if I understand it correctly, if LRCLib is first in the sources list and only has plain lyrics for a song, the plugin will apply the plain lyrics, despite the fact that bilRCL might have synced lyrics. What I'm suggesting is to have the approach you have made here for LCRLib and expand it for all backends.
- If you want synced lyrics, and the first source that supports synced lyrics has them, use them (in all cases - if they match the duration and other parameters).
- If the first source only has plain lyrics, check the next source that supports lyrics.
- If no synced source has synced lyrics, use the plain lyrics from the first source.
- Sources are checked in order of preference as specified by the order of the
sourcesconfig field.
The issue I'm thinking of is more for when there are multiple synced lyric providers. Imagine there's LCRLib and bilRCL, both can provide synced lyrics. As it is now, if I understand it correctly, if LRCLib is first in the
sourceslist and only has plain lyrics for a song, the plugin will apply the plain lyrics, despite the fact that bilRCL might have synced lyrics.
I see! You're correct, the plugin now picks any kind of lyrics returned by the first source that returns something.
What I'm suggesting is to have the approach you have made here for LCRLib and expand it for all backends.
This makes sense - I am myself after synced lyrics if possible, so if there was another source that provides them, I would indeed try to engineer a way to retrieve them.
- If you want synced lyrics, and the first source that supports synced lyrics has them, use them (in all cases - if they match the duration and other parameters).
- If the first source only has plain lyrics, check the next source that supports lyrics.
- If no synced source has synced lyrics, use the plain lyrics from the first source.
- Sources are checked in order of preference as specified by the order of the
sourcesconfig field.
This looks like a good approach. This of course depends on knowing in advance which sources provide synced lyrics, so that we can try them first.
- If no synced source has synced lyrics, use the plain lyrics from the first source.
I'd say we may even want to have a separate source priority list for non-synced lyrics: for example, I found that I prefer lyrics from LRCLib only if they are synced. Otherwise, I'd rather get them from Genius where they've been reviewed by the community.
Alternatively, if users want synced lyrics, we firstly query backends that provide them (respecting their order in the sources configuration), and then query the rest if synced weren't found. This way, if I have the following configuration:
lyrics:
sources: [genius, lrclib, tekstowo, bilrcl]
synced: yes
The plugin tries LRCLib and bilRCL first and returns the first valid synced lyrics, if found. If not, it tries Genius, then (cached) plain lyrics from LRCLib, then Tekstowo, and finally bilRCL.
I think your suggestions make perfect sense! If you were to implement it like that, I wouldn't be mad at all. Though this probably warrants a bigger discussion with more maintainers, I would imagine. At any rate, I'll leave this idea with you, since you're working a lot on this plugin and you're way better suited for the task, if you choose to accept it, than I am.
@edgars-supe for now I will keep it as it is, but I noted this idea for the future when another source providing synchronised lyrics pops up!