beets
Fetch work title from MusicBrainz
Hi! I'm new to beets. I have a big collection of classical music that is in the MB database. For classical music a straightforward way of ordering it is to order it first by composer, then by work and then by performer. MusicBrainz has a relation "recording of" that relates a recording to a work. A work can have a relation "part of" that refers either to another work or to a catalogue. So the works are organised as a tree with the work-work relation "part of" or "part" as a link. Could it be possible with beets to "climb up" the tree, so that I can create a folder with the name of the "parent work" that contains the recordings of parts of the work?
As an example, take this recording of Bach's Matthäus-Passion: https://musicbrainz.org/release/51d08afb-0d81-4617-8c33-979806910ddf There, every recording has a tag "recording of" followed by a work. This work is in turn part of the work "Matthäus-Passion, BWV 244: Teil ?", which is part of the work "Matthäus-Passion, BWV 244", which is part of the catalogue "Bach-Werke-Verzeichnis" with the number BWV 244. Would it be possible to go up the ladder to the uppermost work "Matthäus-Passion, BWV 244" and optionally (not every composer has a complete catalogue, and not every work on MB is linked appropriately to the corresponding catalogue) to the catalogue number? This would give:
Music/Bach, Johann Sebastian/Matthäus-Passion, BWV 244/Performer/St. Matthew Passion
(I choose sort names for artists, otherwise I get stuck with Russian, Japanese and other names that I can't read for Suzuki, Prokofiev etc.) Here the next problem appears: who do we choose as the performer? An obvious choice would be the recording artist, but this has two problems. In many cases, the recording artist is just the composer because no one updated it. In other cases, as in this release, the recording artist changes with every track because, for example, the choir doesn't sing in every track. A solution might be to list all performers linked to those recordings in the album that are linked to the work. This seems to be the best solution to me. However, it doesn't work if we have, for example, a "best of" with the best performances of a given work, but that is pretty rare (even if I have an example in my collection). Maybe there is a better solution.
Is there a way to implement this into beets?
Dorian
maybe if a given performer has made several recordings of the same work, add the date of the recording.
and if there is no performer or recording artist, use the release artist
Ok, sounds intriguing! How would you propose to expose the "parent work" information? Would we just record the title, or other stuff too? Do you have ideas for the names of the fields that should hold this information?
Hi! Thanks for your reply. I would say the title and the disambiguation (for example for arrangements like https://musicbrainz.org/work/51bb8773-8492-3773-ab88-73a89c922c3d ). The composer information would be saved on a different tag. I don't know how beets is made, but I would imagine adding 2 tags, for "work" and "composer" and then have some routine that can call from the musicbrainz database the "parent works" and the "performers". Or maybe even add an additional tag for "parent work". This kind of data organisation is relevant especially for classical music, for more modern stuff as far as I can tell the music is more rarely organised in several movements of a bigger piece.
OK, so just to summarize, you'd like to be able to access these fields in path templates, right?
$work: This would be the title of the directly associated work.
$composer: I think we already have this one, as of https://github.com/beetbox/beets/pull/2333.
$parentwork: The title of the parent work?
$parentwork_composer: Would this be relevant too?
Ideally, $parentwork_composer would be relevant. Practically, I expect it to always be the $composer but for the sake of completeness, if it doesn't include a lot of work, I would take it as well.
Another question: is there a way to use sort names for artists (and therefore composers) with beets?
OK, thanks. But just to be clear, $composer already does what you want, right?
If so, I'll make this ticket into a request to get the work title field. Then, as a second stage, we can consider doing the "parent work" thing to get copies of the relevant fields reflecting the parent work.
Yes, we do fetch artist_sort and albumartist_sort from MusicBrainz. If you're ever wondering about this kind of a thing, you can type beet fields to get a complete list.
I never tried it out, but from what I can read on the conversation, yes. It even uses the "arranger" tag, which can be helpful as well. As I said, I'm new to beets, I'm just importing my collection right now.
OK, cool. Marked this as a feature request for that first stage.
Thanks!
Hi! I have seen that you added "composer" as a field, could it be possible to also make a field "composer_sort" (and "arranger_sort" by the way) or does this field contain it already?
Hi, @dosoe—that sounds like a separate feature request. Maybe it deserves a separate GitHub thread?
Hi! After having fun with 'composer_sort', 'arranger_sort' and 'lyricist_sort' (to be submitted), I'm trying to get the 'work' and 'parent_work' settled. For this I tried something out: I added this to the track_info function in beets/autotag/mb.py
for work_relation in recording.get('work-relation-list', ()):
    if work_relation['type'] != 'performance':
        continue
    # gives the title of the work
    work.append(work_relation['work']['title'])
    for parent_work_relation_1 in work_relation['work'].get('work-relation-list', ()):
        if parent_work_relation_1['type'] != 'parts':
            continue
        # gives the title of the work the initial one is part of
        parent_work_1.append(parent_work_relation_1['work']['title'])
Now however when I try to do the same to get the parent work of parent_work_1, I get nothing: parent_work_relation_1['work'].get('work-relation-list',()) gives an empty list, even if the parent_work_1 effectively is part of a work (tested with the work being https://musicbrainz.org/work/2fb76aa1-b37f-3e05-a185-f7e607efaf80 and choosing a recording of the work I have in my collection).
What seems to be happening is that beets goes to the page of the recording and takes all the information it can out of it. In this case (https://musicbrainz.org/recording/546c4659-96c0-46ee-9b31-cfe3e78a1c48 for testing) that is: the work, the parent_work_1 and other tags such as composer etc. So would there be a way to go not to the page of the recording but to the page of the work, using something similar to the track_url and album_url functions that are defined (but I don't know how and where they are used)? It is easy to fetch the id of a work (work_relation['work']['id'] with everything defined as above) and from there to get to its url: def work_url(workid): return urljoin(BASE_URL, 'work/' + workid)
Now I don't know where and how you use the album_url and track_url functions to get to the actual recording infos but you probably do.
Right now I'm going up the ladder of parent works by hand. Once I get this working, I would rather use a 'while' loop to climb up to the top of the ladder.
Other question: How can I submit a merge (for adding the 'arranger_sort' and 'lyricist_sort' tags) while continuing to advance on this 'work' stuff? I also have some issues with the 'arranger' and 'lyricist' tags exposed in https://github.com/beetbox/beets/pull/2333
Hmm… to summarize, it sounds like you're interested in how to query the MusicBrainz web service for a specific work ID? For that, you go through the client library and use, for example, get_work_by_id. In general, you might want to read a little bit about the MusicBrainz Web service. For example, here's the URL for the recording you mentioned with its work relations included: https://musicbrainz.org/ws/2/recording/546c4659-96c0-46ee-9b31-cfe3e78a1c48?inc=work-rels
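As a rough illustration of how such lookup URLs are put together, here is a small sketch that builds them without making any network request. The helper name ws_lookup_url is made up for this example and is not part of beets or musicbrainzngs; the two MBIDs are the recording and the work mentioned earlier in this thread.

```python
# Build MusicBrainz web-service (ws/2) lookup URLs without touching the network.
# ws_lookup_url is a hypothetical helper written only for this illustration.
WS_ROOT = "https://musicbrainz.org/ws/2/"


def ws_lookup_url(entity, mbid, includes=()):
    """Return the ws/2 lookup URL for one entity, e.g. a recording or a work."""
    url = WS_ROOT + entity + "/" + mbid
    if includes:
        # Several "inc" values are joined with "+" in the query string.
        url += "?inc=" + "+".join(includes)
    return url


recording_url = ws_lookup_url(
    "recording", "546c4659-96c0-46ee-9b31-cfe3e78a1c48", ["work-rels"]
)
work_url = ws_lookup_url(
    "work", "2fb76aa1-b37f-3e05-a185-f7e607efaf80", ["work-rels", "artist-rels"]
)
print(recording_url)
print(work_url)
```

With the client library, the corresponding call for a work would be along the lines of musicbrainzngs.get_work_by_id(work_id, includes=["work-rels", "artist-rels"]), which returns the parsed response as a dict rather than a URL.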
About the other question: to submit a new PR, the thing to do is to put your work in a branch and push it to your fork. Then you can open several PRs at once; one for each branch.
I will read about the MB web service; it sounds like it could answer some of my questions (but not today). I wonder if it could even be better to make a work_info function like the track_info function in beets/autotag/mb.py. This could also fetch the composer, lyricist and more generally the tags that MB associates with works.
About the other question: So the idea is that I have more than one fork of beets on my repo, right?
Sure, a work_info function would be OK—but it would need to look different from the track_info function. The latter produces a complete TrackInfo object, and there would not be a corresponding WorkInfo object (because there is no such thing as a "work" in the beets database). It would need to build up information to put on the TrackInfo object.
Furthermore, it would be something of a problem if we needed to issue a series of new MusicBrainz API requests to get the work data. Is it possible to pull all the information out of the work-rels included data? If not, we may need to make this data an optional feature to avoid making metadata fetching take much longer than it does currently.
No, there's no need to fork the repo twice—you can just create different branches within one git repository. (The GitHub help pages can be useful for this.)
But can't we make a WorkInfo object and just not put it into the library? Just use it as a temporary variable.
Sure! But I'd argue that you'd probably be better off with just a plain dict instead.
Yes, I would be very satisfied with that.
Ok, thanks to get_work_by_id I did what was needed to fetch the work title, the work disambiguation, the parent work title, the parent work disambiguation, the parent work composer name and the parent work composer sort name. However, I already have a pull request (https://github.com/beetbox/beets/pull/2563), so if I just push it to my repo it will be added as a commit to that one. Additionally, I only implemented the fetching part (in beets/autotag/mb.py) and not all the stuff around it. However, I can show you what the code looks like. Just insert it into the track_info function:
lyricist = []
composer = []
composer_sort = []
work = []
work_disambig = []
parent_work = []
parent_work_disambig = []
parent_composer = []
parent_composer_sort = []
for work_relation in recording.get('work-relation-list', ()):
    if work_relation['type'] != 'performance':
        continue
    work_id = work_relation['work']['id']
    work_info = musicbrainzngs.get_work_by_id(work_id, includes=["work-rels", "artist-rels"])
    work.append(work_info['work']['title'])
    try:
        work_disambig.append(work_info['work']['disambiguation'])
        parent_disambig_tmp = work_info['work']['disambiguation']
    except KeyError:
        work_disambig.append('')
        parent_disambig_tmp = ''
    partof = True
    parent_work_tmp = work_info['work']['title']
    while partof:
        partof = False
        for work_father in work_info['work']['work-relation-list']:
            if work_father['type'] == 'parts':
                try:
                    if work_father['direction'] == 'backward':
                        father_id = work_father['work']['id']
                        partof = True
                        work_info = musicbrainzngs.get_work_by_id(father_id, includes=["work-rels", "artist-rels"])
                        parent_work_tmp = work_info['work']['title']
                        try:
                            parent_disambig_tmp = work_info['work']['disambiguation']
                        except KeyError:
                            parent_disambig_tmp = ''
                except KeyError:
                    pass
    for artist in work_info['work']['artist-relation-list']:
        if artist['type'] == 'composer':
            parent_composer.append(artist['artist']['name'])
            parent_composer_sort.append(artist['artist']['sort-name'])
    parent_work.append(parent_work_tmp)
    parent_work_disambig.append(parent_disambig_tmp)
instead of
lyricist = []
composer = []
composer_sort = []
for work_relation in recording.get('work-relation-list', ()):
    if work_relation['type'] != 'performance':
        continue
I guess there should also be 'parent_lyricist' and 'parent_lyricist_sort' tags, but that is easy and quick to do. If the work is not part of a bigger one, the parent_work is the work itself (same for all the 'parent_' tags). What I assume here is that a work has only one parent, which might not always be the case. There are probably style errors, but it works for me.
I will need more time to sort out how to choose the performer correctly.
This calls the musicbrainzngs.get_work_by_id function multiple times, so there might be an issue because we can only go on the server once a second. Additionally, I lately had the problem that a significant proportion of my test runs gave a 503 error (service unavailable).
Hmm; that's interesting! I notice that this seems to have gotten quite a bit more complicated. It seems like it would be a worthy goal to see if this can be done in a more generic way: that is, maybe we can write one function to get all the information for the "parent work," and then a separate function that pulls out all the work-related information from any work? Then, we can just join "parent_" onto the front of all the stuff from the parent work in one fell swoop, rather than needing to duplicate logic for every field.
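One possible shape for that split, as a sketch only: one function extracts the work-level fields from a single work dict, and a second one adds a prefix to every key. The function names work_fields and with_prefix are invented for this illustration, the dict layout follows what the code above reads from musicbrainzngs responses, and the sample works are hand-written rather than fetched from MusicBrainz.

```python
def work_fields(work):
    """Pull the work-level fields out of one musicbrainzngs-style work dict."""
    fields = {
        "work": work["title"],
        # Not every work has a disambiguation, so fall back to ''.
        "work_disambig": work.get("disambiguation", ""),
        "composer": [],
        "composer_sort": [],
    }
    for rel in work.get("artist-relation-list", ()):
        if rel["type"] == "composer":
            fields["composer"].append(rel["artist"]["name"])
            fields["composer_sort"].append(rel["artist"]["sort-name"])
    return fields


def with_prefix(fields, prefix):
    """Return a copy of `fields` with `prefix` put in front of every key."""
    return {prefix + key: value for key, value in fields.items()}


# Hand-written sample data, only to exercise the two functions.
bach = {"name": "Johann Sebastian Bach", "sort-name": "Bach, Johann Sebastian"}
movement = {"title": "Example movement"}
parent = {
    "title": "Matthäus-Passion, BWV 244",
    "artist-relation-list": [{"type": "composer", "artist": bach}],
}

tags = dict(work_fields(movement))
tags.update(with_prefix(work_fields(parent), "parent_"))
print(sorted(tags))
```

With this layout, adding a lyricist only means touching work_fields; the 'parent_' variants follow automatically.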
Most of this code (the while loop) only aims to find the parent work
And then you get all the info about it once you have it
So basically what is happening is the following: I get the work id from the recording relationships. Then I get the work relationships using get_work_by_id. In the work relationships I look for a work with 'type': 'parts' and 'direction': 'backward'. If there is none, then this is the parent work; if there is one, I take its id. Then I repeat with the id I just got.
Once we have the parent work, we take its name, composer, composer_sort, lyricist, etc. The disambiguation needs a try/except because some works don't have a disambiguation. In the same way, only works with a parent have a work with a 'direction' tag in their work relationships. We should also watch out for dupes, so maybe append the tag to the list only if it's not already in the list, since a recording can very well contain more than one work while all of them are part of the same parent work.
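The climb described above can be sketched as a small loop. This is an illustration under assumptions, not the code from the attachment: the lookup is passed in as a function so the example runs on hand-written data, the names parent_id, find_top_work and add_unique are made up, and only the first backward 'parts' relation is followed (so a work with several parents is not handled). In practice the lookup could wrap musicbrainzngs.get_work_by_id(...)['work'].

```python
def parent_id(work):
    """Return the id of the work this one is a part of, or None at the top."""
    for rel in work.get("work-relation-list", ()):
        # Only relations pointing to a parent carry a 'direction' key.
        if rel["type"] == "parts" and rel.get("direction") == "backward":
            return rel["work"]["id"]
    return None


def find_top_work(work_id, fetch):
    """Follow backward 'parts' relations until no parent is left."""
    seen = set()
    work = fetch(work_id)
    while True:
        seen.add(work["id"])
        next_id = parent_id(work)
        # Stop at the top, or if the data loops back to a work already visited.
        if next_id is None or next_id in seen:
            return work
        work = fetch(next_id)


def add_unique(values, value):
    """Append `value` only if it is not in the list yet (avoids dupes)."""
    if value not in values:
        values.append(value)


# Hand-written stand-in for the MusicBrainz responses.
FAKE_WORKS = {
    "movement": {"id": "movement", "title": "Movement", "work-relation-list": [
        {"type": "parts", "direction": "backward", "work": {"id": "part"}}]},
    "part": {"id": "part", "title": "Part", "work-relation-list": [
        {"type": "parts", "direction": "backward", "work": {"id": "whole"}},
        {"type": "parts", "work": {"id": "movement"}}]},
    "whole": {"id": "whole", "title": "Whole work", "work-relation-list": [
        {"type": "parts", "work": {"id": "part"}}]},
}

parent_works = []
for wid in ["movement", "part"]:
    top = find_top_work(wid, FAKE_WORKS.__getitem__)
    add_unique(parent_works, top["title"])
print(parent_works)
```

Both sample works resolve to the same top-level work, so the list ends up with a single entry.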
It may be coded clumsily, but it works.
Now there are two issues:
- First, there might be several parents of one work. I believe that would be an error in the MB database, but maybe there are good reasons for this to happen. I don't know so far how to deal with this.
- Second, each call of get_work_by_id is a call to MB, and I can only do one a second. This would therefore substantially slow down the autotagger (I guess), so maybe it would be a good idea to make it optional or put it in a plugin (as far as I can tell, this is useful only for classical music, and if there were many classical lovers here this would already have been implemented). I have no idea how to do this.
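One way to soften the second issue, sketched here as an assumption rather than something beets does: since many tracks of an album share the same works and parent works, remembering each lookup by id means every work costs at most one request. The CachedLookup class and the fake fetch function are invented for this example; a real fetch function would call MusicBrainz.

```python
class CachedLookup:
    """Remember lookups by id so that each id costs at most one request."""

    def __init__(self, fetch):
        self._fetch = fetch      # the function that really asks the server
        self._cache = {}
        self.requests = 0        # how often the server was actually asked

    def __call__(self, work_id):
        if work_id not in self._cache:
            self.requests += 1
            self._cache[work_id] = self._fetch(work_id)
        return self._cache[work_id]


def fake_fetch(work_id):
    """Stand-in for a real MusicBrainz request."""
    return {"id": work_id, "title": "Work " + work_id}


lookup = CachedLookup(fake_fetch)
# Five tracks that refer to only two distinct works.
track_work_ids = ["a", "b", "a", "a", "b"]
titles = [lookup(work_id)["title"] for work_id in track_work_ids]
print(lookup.requests)
```

The cache only helps within one run; it does not remove the first request per work, so making the feature optional still seems sensible.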
Yeah, making a plugin would be a great way to make the extra queries optional and encapsulate the new code! It's actually fairly straightforward: the beets plugin system has an "import stage" API, where you can add arbitrary code to run on music that's been imported. So an easy way to get started would be to make a plugin that just runs this same code in an import stage, making calls into the beets.autotag.mb module.
Let me know if I can help more with pointing the way!
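As a rough sketch of what the body of such an import stage could look like: beets calls import stages with a session and a task, the imported items are available from task.imported_items(), and a value can be attached to an item under a new field name and saved to the library database with item.store(). The extra lookup argument, the function name parentwork_stage and the fake classes are assumptions made so that the example runs without beets installed; in a real plugin the stage would be a method added to self.import_stages in the plugin's constructor.

```python
def parentwork_stage(session, task, lookup):
    """Attach parent-work fields to every item of an import task.

    `lookup` is a hypothetical callable that returns a dict of
    field name -> value for one item (for example from MusicBrainz).
    """
    for item in task.imported_items():
        for field, value in lookup(item).items():
            item[field] = value   # set the field on the item
        item.store()              # save the change to the library database


# Minimal stand-ins for the beets objects, only for this illustration.
class FakeItem(dict):
    stored = False

    def store(self):
        self.stored = True


class FakeTask:
    def __init__(self, items):
        self._items = items

    def imported_items(self):
        return self._items


items = [FakeItem(title="Track 1"), FakeItem(title="Track 2")]
parentwork_stage(None, FakeTask(items), lambda item: {"parent_work": "Whole work"})
print(items[0]["parent_work"], items[0].stored)
```

Fields set this way live in the beets database and can be used in path templates; writing them into the files' own tags is a separate question.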
Yes, I would appreciate that if you could help me to set it up.
Sure! Here's the place to start: http://docs.beets.io/en/v1.4.3/dev/plugins.html
Feel free to post questions along the way if anything comes up.
Ok, now I have a start file using https://beets.readthedocs.io/en/v1.4.3/dev/plugins.html#add-path-format-functions-and-fields and the keyfinder plugin as a template. However, I don't know how to add a new tag: the keyfinder plugin just has a tag mapping and writes it directly to the file, but that doesn't work for me. Additionally, I don't know how to tell it to run also when importing and updating. I'm attaching the code as it is so far. The part about fetching the data from MB works afaict, even if it is a little ugly (I'm not a programmer).
Here is an updated version. It works (when I ask it to print the data, it is correct), I just don't get how to write the tags into the library. I made a branch for it, but I would like to commit this stuff without modifying my other pull request https://github.com/beetbox/beets/pull/2563 . parentwork.txt