beets
beets copied to clipboard
Querying improvements / removing 'slow' queries
Description
Fixes #4360 #3515.
This is a WIP pull request to act as a basis for further discussions regarding improvements to queries. See #4360 for more details. There may be more issues that have to do with this functionality that I am unaware of, so feel free to link any relevant discussions.
What's been done at this point:
- [x] Faster regex queries,
beet list album::album
- Added db function that does matching directly
- [x] Faster album path queries,
beet list -a path::some/path
- Track-level metadata is queried directly through a join
- [x] Faster flexible attributes queries, both albums and tracks,
beet list play_count:10
- The corresponding attribute table is also immediately accessible through a join
- [x] (Fast) Ability to list albums with track-level (and vice-versa) db field queries,
beet list -a title:something
-
items
andalbums
tables are joined so that data is available for filtering
-
- [x] (Fast) Ability to list tracks with album-level flexible field queries,
beet list artpath:cover
- Since track data contains
album_id
, it's fairly straightforward to joinalbum_attributes
into track queries
- Since track data contains
- [x] Pass current tests
- Except 3
lslimit
plugin tests that have to do with the '<' query prefix. I had a look at its implementation and found that it's fairly fragile and complicated to adjust. Therefore, for now I added a flag-l --limit
which accepts a value for the limit. See 734883fa. If it looks sensible I reckon it could be moved to another PR. -
bareasc
andplaylist
plugins got slightly adjusted to enable fast db-level queries as well
- Except 3
To Do
- [x] (Fast) Ability to list albums with track-level flexible field queries,
beet list -a title:something
- It's kind of done but it caused a couple of tests to fail, so I left it out for now
- [ ] Documentation. (If you've add a new command-line flag, for example, find the appropriate page under
docs/
to describe it.) - [ ] Changelog. (Add an entry to
docs/changelog.rst
near the top of the document.) - [x] Tests. (Encouraged but not strictly required.)
@wisp3rwind Your comments have been addressed. I have also added some explanatory comments to the _fetch
method, hopefully they will help to clarify what's going on in there!
Furthermore, now albums can be filtered using track-level queries.
Included a few simple tests as well. Except for the changelog entry, the functionality is complete at this point, so I'm marking the PR as ready for review. @sampsyo we'll need your eyes on this :)
Hi @snejus, looking great! I have essentially no time to review this (or anything for beets) at the moment, so it will take me some time to get around to it. It's on top of my beets todo list!
For now, I've only looked at the lslimit
changes. It's great to get rid of the brittle implementation in the HeadQuery
and use SQL LIMIT
instead, we should absolutely do this!
How exactly are the tests failing?
If we can't fix them easily by changes local to the limit
plugin, I think we should still resolve this issue in one way or the other now: Otherwise, we risk to "temporarily" disable the <
query prefix, and never add it again. Two approaches could be
-
What you did here, but make it permanent instead of considering it to be a stopgap solution: That is, remove the
<
query prefix entirely, and add a--limit
argument to relevant commands. The downside is that we'd usually want to mark the prefix as deprecated for a while and issue a warning, before finally removing it. Maybe, we could at least turn theHeadQuery
into what is essentially aTrueQuery
, but displays a warning on first invocation? Doing this would still require appropriate communication in the changelog, etc., since we'd nevertheless have to remove the prefix on short notice. Would that be acceptable? -
We come up with a way to keep the
<
prefix working, which will involve supporting it in query parsing indbcore.queryparse.parse_sorted_query
. This function would need to return something like(query, sort, limit)
instead of the current(query, sort)
. One question is whether to enable this by default, or still use thelimit
"plugin" to enable this prefix, in order to keep exact backwards compatibility.
@sampsyo (or anyone!) do you have any strong opinion here? This is only about commit 1 and 3 (734883fa9af0320b1f00bc97a286a3d3da8df803 and b9666ee9977d148ba6c467388414b21b9380d3d2) of this PR.
In any case, would it be possible for you (once we have decided what to do exactly) to split this into a separate preparatory PR, and rebase this PR on top of it? I think that would simplify review a bit.
Hi @wisp3rwind, thanks for your response! See below for the failed tests:
FAILED test/test_limit.py::LimitPluginTest::test_prefix - AssertionError: 10 != 5
FAILED test/test_limit.py::LimitPluginTest::test_prefix_when_correctly_ordered - AssertionError: 10 != 5
FAILED test/test_limit.py::LimitPluginTest::test_prefix_when_incorrectly_ordred - AssertionError: 10 != 0
Let me know if you want to know the specifics of why it is the case.
Regarding your suggestions,
-
This would probably not be ideal, though luckily this plugin has not been released yet - it got added a month later, after
1.6.0
got released, therefore a deprecation notice may not be needed. However, we also know that a fair bit of users install themaster
branch, so this change would be relevant to them. -
I'd be very happy to go ahead with this solution. The implementation, as you suggested, is fairly straightforward. From the user's perspective, ability to limit the results using the same
list
command is great. Personally, my main use case is- I find that a query yields too many results
- Press up
- Append
-l10
- Press enter
Some users may have already gotten used to be using the prefix, so supporting it going forward makes sense.
Regarding the lslimit
command, we could maybe print a notice like Limiting is now supported in the list command, please use it instead
before we release the next beets version. We could remove this when it's about to be released, since limit
functionality didn't exist in 1.6.0
anyways.
Let me know your thoughts.
To confirm, I'll be happy to open a separate pull request for this.
@snejus This looks useful for something that I using. If I am using something like query = MatchQuery("plex_ratingkey", track.ratingKey, fast=False)
to match on the flexible attribute plex_ratingkey
, will it work as is? Right now, the query is slow, which I am hoping your PR seeks to address.
@arsaboo it should indeed work.
You may as well try checking out this PR and attempt to, say, list tracks using your filter: beet ls plex_ratingkey:xyz
. I'd be happy to hear whether there's any change in performance :)
Alright, sorry for the long silence, but I'll have a little more time in the next weeks :)
Let me know if you want to know the specifics of why it is the case.
Ah, ok, the entire FieldQuery.match ->
*Query.value_match` code path is dead now, isn't it, since it all pushed down to SQL?
I had completely missed that lslimit
has not been released; so let's just go ahead, remove it and add the functionality to beet's core :)
A deprecation notice on loading the plugin would indeed be nice. We already have (at least) two plugins that do so; see gmusic.py
and freedesktop.py
.
There's in fact a rationale for keeping the query prefix, see the docs https://github.com/beetbox/beets/pull/4190/files What's more, LIMIT
ing properly would also solve the issue that's mentioned there about query ordering.
About the implementation, a few initial thoughts:
-
--limit/--head
: Easy enough, just factor out the limit keyword argument and implementation from this PR. -
--tail
: Seems more complicated: This makes little sense unless a sort other than the defaultNullSort
is used. Then, we would need to reverse the sort in the SQL,LIMIT
the results, and then reverse the result set again. Reversing the SQL seems somewhat tricky: maybe it could be done by adding areverse
kwargs todbcore.query.*Sort.order_clause
. Or, is there an easier option to formulate this directly in SQL? @patrick-nicholson how important is the--tail
command to your usecase? Maybe, it would be an option not to implement it for now to limit the complexity of the PR. - the query prefix: I guess we would need a notion of "query modifiers" or "query options", such as a limit, that would be returned by
dbcore.queryparse.parse_sorted_query
.
A semi-related issue is https://github.com/beetbox/beets/issues/750
@wisp3rwind I don't think --tail
is all that necessary (at least on *nix) if sorting is correct, --head
results can be reversed.
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Thanks for a reminder, will try reviving it in the coming days.
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
We should keep this active
I've been intending to come back to this but have been too busy recently. This PR crosses my mind every weekend and I know I will be coming back to this soon :)
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
We should keep this active....