Use SQL to query flex fields, and related Album/Item data
Description
This PR depends on #4741 and #4744 which convert some queries to be SQL-compatible.
In short, it enables filtering flexible attributes and related items or album through SQL. This speeds up flexible attributes lookup and enables queries like
beet list album_flex:something # list all items under the albums that have album_flex:something
beet list -a item_flex:something # list all albums which have at least one item with item_flex:something
This is largely the same logic that is behind #4362. I realized that the original PR ended up too complicated with too much of not-so-strictly-required refactorings which makes it hard to review. This PR skips the noise and contains only the bits that are required.
Note: I will be adding basic tests for the new behaviour within the next days. It'd be good to hear what sort of expectations people have here - happy to test them too!
Right before submitting the PR I stumbled upon a limitation: a query with more than one flexible attribute filter gives empty results.
-
Completely forgot to take this into account - the way
itemsand, sayitem_attributesare currently joined yields a row for eachitemsanditem_attributescombination, something like (simplified)items.iditem_attributes.keyitem_attributes.value1 skip_count10 1 play_count5 When we try to filter using a single flexible attribute, one of these rows will match (if found, of course) so it works as expected.
If we try to filter by two, we obviously fail. This can be tackled by (1) aggregating the attribtues or (2) performing
EXISTScheck within a subquery. Will test whichever is more performant and will add it in the next commit.
To Do
- [ ] Documentation. (If you've add a new command-line flag, for example, find the appropriate page under
docs/to describe it.) - [ ] Changelog. (Add an entry to
docs/changelog.rstnear the top of the document.) - [x] Tests. (Encouraged but not strictly required.)
Neato!! This is very exciting; I'm really glad we (as a project) are finally getting around to getting serious about making flexattr queries fast. When we built flexattr support long ago, I was pretty happy with the "slow query" mechanism as a way to make functionality progress without worrying about performance, but I always knew we'd want to come back and make them actually work in SQL through some combination of query cleverness and schema updates. We're doing that at last!
Thanks to your blog post which gave some nice pointers about what needs improving here! I do completely agree about what does not need improving here :)
Here's one somewhat low-level question: could we do this with a nested select instead of two SQL queries? SQLite supports expressions like item.id IN (SELECT ...). I haven't used this "subquery" functionality much so I don't know if it fits the bill, but it would be cool if we could avoid fetching all the IDs, stringifying them, and then feeding them back into the second-stage query syntax.
It may be possible - this would also offer an additional performance gain. I'll have a look how does it play together with having to query two flexible attribute tables! I guess those can be UNIONed. :)
@sampsyo see the new commits: aggregation ended up significantly simplifying the machinery behind the new queries - this now uses just one query.
We can now query multiple flex attributes!
The drawback is that we're losing the ability to query related flexible attributes - for example, the simplicity here does not allow beet list -a title:hello or beet list artpath:art_path. Though, I think for now this should be OK - we'd in any case need to have a wider discussion about how queries like beet list -a play_count:10 should work - do we here check play_count for every item, or do we sum/aggregate them?
There are some tests failing that have to do with the types - I'll have a deeper look at them later today/tomorrow.
That is awesome!!!!
Regarding the issue of "related" queries, it seems to me like these run in opposite directions:
beet list -a title:hello: Querying albums based on item metadata. I actually would have guessed that this currently would not work, unless there was an album out there with a flexattr calledtitle(which would be weird). But maybe I am misremembering what we support here?beet list artpath:art_path: Querying items based on album metadata. This actually seems more common, and we added support for doing stuff likebeet list albumartist:Beatlesin #2988. It would be great to avoid breaking this, although https://github.com/beetbox/beets/discussions/4742#discussioncomment-5556090 also points toward a larger change that would obviate this kind of "item-to-album fallback."
@sampsyo I apologize - I used completely wrong examples in my previous comment 🤦🏽♂️ see the clarification below
The drawback is that we're losing the ability to query related flexible attributes - for example, the simplicity here does not allow ~
beet list -a title:hello~beet list -a item_flex_attr:helloor ~beet list artpath:art_path~beet list album_flex_attr:hello.
In summary,
- ✅ related model field lookups are supported
- ✅
beet list -a title:hello - ✅
beet list artpath:hello
- ✅
- ⛔ related flex attr lookups are not supported
- ⛔
beet list -a item_flex_attr:hello - ⛔
beet list album_flex_attr:hello
- ⛔
Just had a couple of runs comparing master to this branch with various queries, see the results below

- Previously, the more filters were provided on the command line, the longer would the query take. Now, the behavior is the opposite - every argument either speeds up the query or keeps it the same.
- You can see
masterran-a title:avery quickly - that's because it didn't return anything.
The new queries have a more or less constant speed; the command speed is largely determined by the number of things that need printing. See the table below for the summary of timings and the numbers of matched results. You can see that albums query tends to be slightly quicker than items.

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@sampsyo would you still want this to be merged? I could try and rebase it off master
Improving flexible field queries would be huge... right now they are painfully slow for a large library.
Superseded by #5240