beets Querying improvements / removing 'slow' queries

Querying improvements / removing 'slow' queries

Open snejus opened this issue 2 years ago • 17 comments

Description

Fixes #4360 #3515.

This is a WIP pull request to act as a basis for further discussions regarding improvements to queries. See #4360 for more details. There may be more issues that have to do with this functionality that I am unaware of, so feel free to link any relevant discussions.

What's been done at this point:

[x] Faster regex queries, beet list album::album
- Added db function that does matching directly
[x] Faster album path queries, beet list -a path::some/path
- Track-level metadata is queried directly through a join
[x] Faster flexible attributes queries, both albums and tracks, beet list play_count:10
- The corresponding attribute table is also immediately accessible through a join
[x] (Fast) Ability to list albums with track-level (and vice-versa) db field queries, beet list -a title:something
- items and albums tables are joined so that data is available for filtering
[x] (Fast) Ability to list tracks with album-level flexible field queries, beet list artpath:cover
- Since track data contains album_id, it's fairly straightforward to join album_attributes into track queries
[x] Pass current tests
- Except 3 lslimit plugin tests that have to do with the '<' query prefix. I had a look at its implementation and found that it's fairly fragile and complicated to adjust. Therefore, for now I added a flag -l --limit which accepts a value for the limit. See 734883fa. If it looks sensible I reckon it could be moved to another PR.
- bareasc and playlist plugins got slightly adjusted to enable fast db-level queries as well

To Do

[x] (Fast) Ability to list albums with track-level flexible field queries, beet list -a title:something
- It's kind of done but it caused a couple of tests to fail, so I left it out for now
[ ] Documentation. (If you've add a new command-line flag, for example, find the appropriate page under docs/ to describe it.)
[ ] Changelog. (Add an entry to docs/changelog.rst near the top of the document.)
[x] Tests. (Encouraged but not strictly required.)

Jun 05 '22 12:06 snejus

@wisp3rwind Your comments have been addressed. I have also added some explanatory comments to the _fetch method, hopefully they will help to clarify what's going on in there!

Furthermore, now albums can be filtered using track-level queries.

Jun 30 '22 08:06 snejus

Included a few simple tests as well. Except for the changelog entry, the functionality is complete at this point, so I'm marking the PR as ready for review. @sampsyo we'll need your eyes on this :)

Jun 30 '22 11:06 snejus

Hi @snejus, looking great! I have essentially no time to review this (or anything for beets) at the moment, so it will take me some time to get around to it. It's on top of my beets todo list!

Jul 06 '22 22:07 wisp3rwind

For now, I've only looked at the lslimit changes. It's great to get rid of the brittle implementation in the HeadQuery and use SQL LIMIT instead, we should absolutely do this!

How exactly are the tests failing?

If we can't fix them easily by changes local to the limit plugin, I think we should still resolve this issue in one way or the other now: Otherwise, we risk to "temporarily" disable the < query prefix, and never add it again. Two approaches could be

What you did here, but make it permanent instead of considering it to be a stopgap solution: That is, remove the < query prefix entirely, and add a --limit argument to relevant commands. The downside is that we'd usually want to mark the prefix as deprecated for a while and issue a warning, before finally removing it. Maybe, we could at least turn the HeadQuery into what is essentially a TrueQuery, but displays a warning on first invocation? Doing this would still require appropriate communication in the changelog, etc., since we'd nevertheless have to remove the prefix on short notice. Would that be acceptable?
We come up with a way to keep the < prefix working, which will involve supporting it in query parsing in dbcore.queryparse.parse_sorted_query. This function would need to return something like (query, sort, limit) instead of the current (query, sort). One question is whether to enable this by default, or still use the limit "plugin" to enable this prefix, in order to keep exact backwards compatibility.

@sampsyo (or anyone!) do you have any strong opinion here? This is only about commit 1 and 3 (734883fa9af0320b1f00bc97a286a3d3da8df803 and b9666ee9977d148ba6c467388414b21b9380d3d2) of this PR.

In any case, would it be possible for you (once we have decided what to do exactly) to split this into a separate preparatory PR, and rebase this PR on top of it? I think that would simplify review a bit.

Jul 10 '22 20:07 wisp3rwind

Hi @wisp3rwind, thanks for your response! See below for the failed tests:

FAILED test/test_limit.py::LimitPluginTest::test_prefix - AssertionError: 10 != 5
FAILED test/test_limit.py::LimitPluginTest::test_prefix_when_correctly_ordered - AssertionError: 10 != 5
FAILED test/test_limit.py::LimitPluginTest::test_prefix_when_incorrectly_ordred - AssertionError: 10 != 0

Let me know if you want to know the specifics of why it is the case.

Regarding your suggestions,

This would probably not be ideal, though luckily this plugin has not been released yet - it got added a month later, after 1.6.0 got released, therefore a deprecation notice may not be needed. However, we also know that a fair bit of users install the master branch, so this change would be relevant to them.
I'd be very happy to go ahead with this solution. The implementation, as you suggested, is fairly straightforward. From the user's perspective, ability to limit the results using the same list command is great. Personally, my main use case is
- I find that a query yields too many results
- Press up
- Append -l10
- Press enter

Some users may have already gotten used to be using the prefix, so supporting it going forward makes sense.

Regarding the lslimit command, we could maybe print a notice like Limiting is now supported in the list command, please use it instead before we release the next beets version. We could remove this when it's about to be released, since limit functionality didn't exist in 1.6.0 anyways.

Let me know your thoughts.

To confirm, I'll be happy to open a separate pull request for this.

Jul 12 '22 08:07 snejus

@snejus This looks useful for something that I using. If I am using something like query = MatchQuery("plex_ratingkey", track.ratingKey, fast=False) to match on the flexible attribute plex_ratingkey, will it work as is? Right now, the query is slow, which I am hoping your PR seeks to address.

Aug 08 '22 14:08 arsaboo

@arsaboo it should indeed work.

You may as well try checking out this PR and attempt to, say, list tracks using your filter: beet ls plex_ratingkey:xyz. I'd be happy to hear whether there's any change in performance :)

Aug 13 '22 05:08 snejus

Alright, sorry for the long silence, but I'll have a little more time in the next weeks :)

Let me know if you want to know the specifics of why it is the case.

Ah, ok, the entire FieldQuery.match -> *Query.value_match` code path is dead now, isn't it, since it all pushed down to SQL?

I had completely missed that lslimit has not been released; so let's just go ahead, remove it and add the functionality to beet's core :)

A deprecation notice on loading the plugin would indeed be nice. We already have (at least) two plugins that do so; see gmusic.py and freedesktop.py.

There's in fact a rationale for keeping the query prefix, see the docs https://github.com/beetbox/beets/pull/4190/files What's more, LIMITing properly would also solve the issue that's mentioned there about query ordering.

About the implementation, a few initial thoughts:

--limit/--head: Easy enough, just factor out the limit keyword argument and implementation from this PR.
--tail: Seems more complicated: This makes little sense unless a sort other than the default NullSort is used. Then, we would need to reverse the sort in the SQL, LIMIT the results, and then reverse the result set again. Reversing the SQL seems somewhat tricky: maybe it could be done by adding a reverse kwargs to dbcore.query.*Sort.order_clause. Or, is there an easier option to formulate this directly in SQL? @patrick-nicholson how important is the --tail command to your usecase? Maybe, it would be an option not to implement it for now to limit the complexity of the PR.
the query prefix: I guess we would need a notion of "query modifiers" or "query options", such as a limit, that would be returned by dbcore.queryparse.parse_sorted_query.

Aug 19 '22 19:08 wisp3rwind

A semi-related issue is https://github.com/beetbox/beets/issues/750

Aug 19 '22 19:08 wisp3rwind

@wisp3rwind I don't think --tail is all that necessary (at least on *nix) if sorting is correct, --head results can be reversed.

Aug 19 '22 22:08 patrick-nicholson

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Dec 21 '22 00:12 stale[bot]

Thanks for a reminder, will try reviving it in the coming days.

Dec 21 '22 01:12 snejus

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Oct 15 '23 16:10 stale[bot]

We should keep this active

Oct 15 '23 18:10 RollingStar

I've been intending to come back to this but have been too busy recently. This PR crosses my mind every weekend and I know I will be coming back to this soon :)

Oct 17 '23 09:10 snejus

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Mar 17 '24 13:03 stale[bot]

We should keep this active....

Mar 17 '24 14:03 arsaboo

beets beets copied to clipboard

Querying improvements / removing 'slow' queries

Description

To Do

beets
beets copied to clipboard