Fetch work title from MusicBrainz

Open dosoe opened this issue 8 years ago • 34 comments

Hi! I'm new to beets. I have a big collection of classical music that is in the MB database. For classical music a straightforward way of ordering it is to order it first by composer, then by work and then by performer. MusicBrainz has a relation "recording of" that relates a recording to a work. A work can have a relation "part of" that refers either to another work or to a catalogue. So the works are organised as a tree with the work-work relation "part of" or "part" as a link. Could it be possible with beets to "climb up" the tree, so that I can create a folder with the name of the "parent work" that contains the recordings of parts of the work?

As an example, if we look at this recording of the Matthäus-Passion of Bach: https://musicbrainz.org/release/51d08afb-0d81-4617-8c33-979806910ddf every recording there has a relation "recording of" followed by a work. This work is in turn part of the work "Matthäus-Passion, BWV 244: Teil ?", which is part of the work "Matthäus-Passion, BWV 244", which is part of the catalogue "Bach-Werke-Verzeichnis" with the number BWV 244. Could it be possible to go up the ladder to the uppermost work "Matthäus-Passion, BWV 244" and optionally (not every composer has a complete catalogue, and not every work on MB is linked appropriately to the corresponding catalogue) to the catalogue number? This would give:

Music/Bach, Johann Sebastian/Matthäus-Passion, BWV 244/Performer/St. Matthew Passion

(I choose sort names for artists, otherwise I get stuck with Russian, Japanese and other names that I can't read for Suzuki, Prokofiev etc.) And here the next problems appear: who do we choose as the performer? An obvious choice would be the recording artist, but this has two problems. In many cases, the recording artist is just the composer because no one updated it. In other cases, as in this release, the recording artist changes with every track because, for example, the choir doesn't sing in every track. A solution might be to list all performers linked to recordings in this album that are linked to the work. This seems to be the best solution to me. However, this doesn't work if we have, for example, a "best of" of the best performances of a given work, but that is pretty rare (even if I have an example in my collection). Maybe there is a better solution.

Is there a way to implement this into beets?

Dorian

dosoe avatar Feb 23 '17 16:02 dosoe

maybe if a given performer has made several recordings of the same work, add the date of the recording.

dosoe avatar Feb 23 '17 16:02 dosoe

and if there is no performer or recording artist, use the release artist

dosoe avatar Feb 23 '17 16:02 dosoe

Ok, sounds intriguing! How would you propose to expose the "parent work" information? Would we just record the title, or other stuff too? Do you have ideas for the names of the fields that should hold this information?

sampsyo avatar Feb 24 '17 04:02 sampsyo

Hi! Thanks for your reply. I would say the title and the disambiguation (for example for arrangements like https://musicbrainz.org/work/51bb8773-8492-3773-ab88-73a89c922c3d ). The composer information would be saved on a different tag. I don't know how beets is made, but I would imagine adding 2 tags, for "work" and "composer" and then have some routine that can call from the musicbrainz database the "parent works" and the "performers". Or maybe even add an additional tag for "parent work". This kind of data organisation is relevant especially for classical music, for more modern stuff as far as I can tell the music is more rarely organised in several movements of a bigger piece.

dosoe avatar Feb 25 '17 20:02 dosoe

OK, so just to summarize, you'd like to be able to access these fields in path templates, right?

  • $work: This would be the title of the directly associated work.
  • $composer: I think we already have this one, as of https://github.com/beetbox/beets/pull/2333.
  • $parentwork: The title of the parent work?
  • $parentwork_composer: Would this be relevant too?

sampsyo avatar Feb 25 '17 20:02 sampsyo

Ideally, $parentwork_composer would be relevant. Practically, I expect it to always be the $composer but for the sake of completeness, if it doesn't include a lot of work, I would take it as well.

dosoe avatar Feb 25 '17 20:02 dosoe

Another question: is there a way to use sort names for artists (and therefore composers) with beets?

dosoe avatar Feb 25 '17 20:02 dosoe

OK, thanks. But just to be clear, $composer already does what you want, right?

If so, I'll make this ticket into a request to get the work title field. Then, as a second stage, we can consider doing the "parent work" thing to get copies of the relevant fields reflecting the parent work.

Yes, we do fetch artist_sort and albumartist_sort from MusicBrainz. If you're ever wondering about this kind of a thing, you can type beet fields to get a complete list.

sampsyo avatar Feb 25 '17 20:02 sampsyo

I never tried it out, but from what I can read on the conversation, yes. It even uses the "arranger" tag, which can be helpful as well. As I said, I'm new to beets, I'm just importing my collection right now.

dosoe avatar Feb 25 '17 20:02 dosoe

OK, cool. Marked this as a feature request for that first stage.

sampsyo avatar Feb 25 '17 20:02 sampsyo

Thanks!

dosoe avatar Feb 25 '17 21:02 dosoe

Hi! I have seen that you added "composer" as a field, could it be possible to also make a field "composer_sort" (and "arranger_sort" by the way) or does this field contain it already?

dosoe avatar Apr 14 '17 14:04 dosoe

Hi, @dosoe—that sounds like a separate feature request. Maybe it deserves a separate GitHub thread?

sampsyo avatar Apr 15 '17 15:04 sampsyo

Hi! After having fun with 'composer_sort', 'arranger_sort' and 'lyricist_sort' (to be submitted) I'm trying to get the 'work' and 'parent_work' settled. For this I tried out something: I added this in the track_info function of beets.beets.autotag.mb.py

    for work_relation in recording.get('work-relation-list', ()):
        if work_relation['type'] != 'performance':
            continue
        work.append(work_relation['work']['title'])

gives the title of the work

            for parent_work_relation_1 in work_relation['work'].get('work-relation-list',()):
                if parent_work_relation_1['type'] != 'parts':
                    continue
                parent_work_1.append(parent_work_relation_1['work']['title'])

gives the title of the work the initial one is part of

Now, however, when I try to do the same to get the parent work of parent_work_1, I get nothing: parent_work_relation_1['work'].get('work-relation-list',()) gives an empty list, even though parent_work_1 actually is part of a work (tested with the work https://musicbrainz.org/work/2fb76aa1-b37f-3e05-a185-f7e607efaf80 and a recording of that work from my collection).

What seems to be happening is that beets goes to the page of the recording and takes all the information it can from there. In this case (https://musicbrainz.org/recording/546c4659-96c0-46ee-9b31-cfe3e78a1c48 for testing) that is the work, the parent_work_1 and other tags such as composer etc. So would there be a way to go not to the page of the recording but to the page of the work, using something similar to the track_url and album_url functions that are defined (but I don't know how and where they are used)? It is easy to fetch the id of a work (work_relation['work']['id'] with everything defined as above) and from there to get to its url: def work_url(workid): return urljoin(BASE_URL, 'work/' + workid)

Now I don't know where and how you use the album_url and track_url functions to get to the actual recording infos but you probably do.

Right now I'm going up the ladder of parent works by hand. Once I get this working, I plan to use a 'while' loop to climb to the top of the ladder instead.

Other question: How can I submit a merge (for adding the 'arranger_sort' and 'lyricist_sort' tags) while continuing to advance on this 'work' stuff? I also have some issues with the 'arranger' and 'lyricist' tags exposed in https://github.com/beetbox/beets/pull/2333

dosoe avatar May 15 '17 22:05 dosoe

Hmm… to summarize, it sounds like you're interested in how to query the MusicBrainz web service for a specific work ID? For that, you go through the client library and use, for example, get_work_by_id. In general, you might want to read a little bit about the MusicBrainz Web service. For example, here's the URL for the recording you mentioned with its work relations included: https://musicbrainz.org/ws/2/recording/546c4659-96c0-46ee-9b31-cfe3e78a1c48?inc=work-rels
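As a small illustration of how such web service URLs are put together, here is a minimal sketch. The helper name ws_lookup_url and the WS_ROOT constant are made up for this example and are not part of beets or the client library; the only facts it relies on are the URL shape shown above and that several inc values are joined with "+".

```python
# Hypothetical helper (not part of beets or musicbrainzngs): builds a
# MusicBrainz web service lookup URL like the one quoted above.
WS_ROOT = "https://musicbrainz.org/ws/2"


def ws_lookup_url(entity, mbid, includes=()):
    """Return the lookup URL for one entity, optionally with `inc` values."""
    url = f"{WS_ROOT}/{entity}/{mbid}"
    if includes:
        # Several `inc` values are joined with a literal "+".
        url += "?inc=" + "+".join(includes)
    return url


recording_id = "546c4659-96c0-46ee-9b31-cfe3e78a1c48"
print(ws_lookup_url("recording", recording_id, ["work-rels"]))
```

In practice the client library builds these URLs itself, so a helper like this is mostly useful for inspecting a response in the browser.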

About the other question: to submit a new PR, the thing to do is to put your work in a branch and push it to your fork. Then you can open several PRs at once; one for each branch.

sampsyo avatar May 16 '17 15:05 sampsyo

I will read about the MB web service, it sounds like it could answer some of my questions (but not today). I wonder if it could even be better to make a work_info function like the track_info function in beets.autotag.mb.py. This could also fetch the composer, lyricist and more generally the tags that MB associates to works.

About the other question: So the idea is that I have more than one fork of beets on my repo, right?

dosoe avatar May 16 '17 16:05 dosoe

Sure, a work_info function would be OK—but it would need to look different from the track_info function. The latter produces a complete TrackInfo object, and there would not be a corresponding WorkInfo object (because there is no such thing as a "work" in the beets database). It would need to build up information to put on the TrackInfo object.

Furthermore, it would be something of a problem if we needed to issue a series of new MusicBrainz API requests to get the work data. Is it possible to pull all the information out of the work-rels included data? If not, we may need to make this data an optional feature to avoid making metadata fetching take much longer than it does currently.

No, there's no need to fork the repo twice—you can just create different branches within one git repository. (The GitHub help pages can be useful for this.)

sampsyo avatar May 16 '17 18:05 sampsyo

But can't we make a WorkInfo object and just not put it into the library? Just use it as a temporary variable.

dosoe avatar May 16 '17 19:05 dosoe

Sure! But I'd argue that you'd probably be better off with just a plain dict instead.

sampsyo avatar May 16 '17 19:05 sampsyo

Yes, I would be very satisfied with that.

dosoe avatar May 16 '17 19:05 dosoe

OK, thanks to get_work_by_id I did what was necessary to fetch the work title, the work disambiguation, the parent work title, the parent work disambiguation, the parent work composer name and the parent work composer sort name. However, I already have a pull request (https://github.com/beetbox/beets/pull/2563), so if I just push it to my repo it will be added as a commit to that one. Additionally, I only implemented the fetching part (in beets.autotag.mb.py) and not all the stuff around it. However, I can show you what the code looks like. Just insert it into the track_info function:

lyricist = []
composer = []
composer_sort = []
work = []
work_disambig = []
parent_work = []
parent_work_disambig = []
parent_composer = []
parent_composer_sort = []
for work_relation in recording.get('work-relation-list', ()):
    if work_relation['type'] != 'performance':
        continue
    # Fetch the work itself; the recording only carries a stub of it.
    work_id = work_relation['work']['id']
    work_info = musicbrainzngs.get_work_by_id(
        work_id, includes=['work-rels', 'artist-rels'])
    work.append(work_info['work']['title'])
    # Not every work has a disambiguation.
    work_disambig.append(work_info['work'].get('disambiguation', ''))
    # Climb the "parts" relations until a work has no parent left.
    partof = True
    while partof:
        partof = False
        for work_father in work_info['work'].get('work-relation-list', ()):
            # Only relations pointing to a parent carry a 'direction'.
            if (work_father['type'] == 'parts'
                    and work_father.get('direction') == 'backward'):
                work_info = musicbrainzngs.get_work_by_id(
                    work_father['work']['id'],
                    includes=['work-rels', 'artist-rels'])
                partof = True
                break  # continue the climb from the parent just fetched
    # work_info now holds the topmost work (or the work itself).
    parent_work.append(work_info['work']['title'])
    parent_work_disambig.append(
        work_info['work'].get('disambiguation', ''))
    for artist in work_info['work'].get('artist-relation-list', ()):
        if artist['type'] == 'composer':
            parent_composer.append(artist['artist']['name'])
            parent_composer_sort.append(artist['artist']['sort-name'])

instead of

lyricist = []
composer = []
composer_sort = []
for work_relation in recording.get('work-relation-list', ()):
    if work_relation['type'] != 'performance':
        continue

I guess there should also be 'parent_lyricist' and 'parent_lyricist_sort' tags, but that is easy and quick to do. If the work is not part of a bigger one, the parent_work is the work itself (same for all the 'parent_' tags). What I assume here is that a work only has one parent, which might not always be the case. There are probably style errors, but it works for me.

I will need more time to sort out how to choose the performer correctly.

This calls the musicbrainzngs.get_work_by_id function multiple times, so there might be an issue because we can only query the server once a second. Additionally, I recently had the problem that a significant proportion of my test runs gave a 503 error (service unavailable).

dosoe avatar May 25 '17 15:05 dosoe

Hmm; that's interesting! I notice that this seems to have gotten quite a bit more complicated. It seems like it would be a worthy goal to see if this can be done in a more generic way: that is, maybe we can write one function to get all the information for the "parent work," and then a separate function that pulls out all the work-related information from any work? Then, we can just join "parent_" onto the front of all the stuff from the parent work in one fell swoop, rather than needing to duplicate logic for every field.
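A minimal sketch of that split, assuming the work data has the dict shape used in the code earlier in this thread ('title', optional 'disambiguation', 'artist-relation-list'). The function names work_fields and with_prefix and the output keys are illustrative only, not an agreed field list.

```python
def work_fields(work):
    """Collect the fields of interest from one MusicBrainz work dict."""
    composers = [
        rel["artist"]
        for rel in work.get("artist-relation-list", ())
        if rel["type"] == "composer"
    ]
    return {
        "work": work["title"],
        # Not every work carries a disambiguation.
        "work_disambig": work.get("disambiguation", ""),
        "composer": ", ".join(a["name"] for a in composers),
        "composer_sort": ", ".join(a["sort-name"] for a in composers),
    }


def with_prefix(fields, prefix):
    """Return a copy of `fields` with `prefix` prepended to every key."""
    return {prefix + key: value for key, value in fields.items()}


# Example data shaped like a web service response (values for illustration).
parent = {
    "title": "Matthäus-Passion, BWV 244",
    "artist-relation-list": [
        {"type": "composer",
         "artist": {"name": "Johann Sebastian Bach",
                    "sort-name": "Bach, Johann Sebastian"}},
    ],
}
print(with_prefix(work_fields(parent), "parent_"))
```

The same work_fields call serves both the directly linked work and the parent, so adding a field later (a lyricist, say) would only need one change.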

sampsyo avatar May 25 '17 17:05 sampsyo

Most of this code (the while loop) only aims to find the parent work

dosoe avatar May 25 '17 19:05 dosoe

And then you get all the info about it once you have it

dosoe avatar May 25 '17 19:05 dosoe

So basically what is happening is the following: I get the work id from the recording relationships. Then I get the work relationships by using get_work_by_id. In the work relationships I look for a work that has 'type': 'parts' and 'direction': 'backward'. If there is none, then this is the parent work; if there is one, I take its id and repeat with the id I just got.

Once we have the parent work, we take its name, composer, composer_sort, lyricist, etc. The disambiguation needs a try/except because some works don't have a disambiguation. Likewise, only works with a parent have a relation with a 'direction' tag in their work relationships. We should also watch out for dupes, so maybe append the tag to the list only if it's not already in the list, since a recording can very well contain more than one work, all of which are part of the same parent work.
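The steps above can be sketched as follows. This is only an illustration under stated assumptions: fetch_work stands for any callable that maps a work id to a work dict (with the real client it could wrap get_work_by_id and return the inner 'work' dict), the ids in FAKE_WORKS are made up, and when several parents are listed the sketch simply follows the first one.

```python
def find_parent_work(work_id, fetch_work):
    """Follow backward 'parts' relations from `work_id` to the topmost work."""
    seen = set()
    while True:
        seen.add(work_id)
        work = fetch_work(work_id)
        parents = [
            rel["work"]["id"]
            for rel in work.get("work-relation-list", ())
            if rel["type"] == "parts" and rel.get("direction") == "backward"
        ]
        # Skip ids already visited so that bad data cannot loop forever.
        candidates = [pid for pid in parents if pid not in seen]
        if not candidates:
            return work
        # Assumption: with several parents, follow only the first one.
        work_id = candidates[0]


def append_unique(values, value):
    """Append `value` only if it is not already in `values`."""
    if value not in values:
        values.append(value)


# Made-up ids and trimmed-down work dicts, for illustration only.
FAKE_WORKS = {
    "aria": {"title": "Aria", "work-relation-list": [
        {"type": "parts", "direction": "backward", "work": {"id": "teil1"}}]},
    "teil1": {"title": "Teil I", "work-relation-list": [
        {"type": "parts", "direction": "backward", "work": {"id": "passion"}},
        {"type": "parts", "work": {"id": "aria"}}]},
    "passion": {"title": "Matthäus-Passion, BWV 244", "work-relation-list": [
        {"type": "parts", "work": {"id": "teil1"}}]},
}

parent_work = []
for track_work in ["aria", "teil1"]:
    top = find_parent_work(track_work, FAKE_WORKS.__getitem__)
    append_unique(parent_work, top["title"])
print(parent_work)  # ['Matthäus-Passion, BWV 244']
```

Passing the fetcher in as an argument keeps the climbing logic testable without any network access.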

It may be coded clumsily, but it works.

Now there are two issues:

  • First, there might be several parents of one work. I believe that would be an error in the MB database, but maybe there are good reasons for this to happen. So far I don't know how to deal with this.
  • Second, each call of get_work_by_id is a request to MB, and I can only make one per second. This would therefore substantially slow down the autotagger (I guess), so maybe it would be a good idea to make it optional or put it in a plugin (as far as I can tell, this is useful only for classical music, and if there were many classical lovers here this would already have been implemented). I have no idea how to do this.
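One way to soften the second issue is to remember works that were already fetched, since the tracks of one album usually share the same parents. The sketch below is an assumption-laden illustration: cached_fetcher and fake_fetch are hypothetical names, and the fake stands in for a real request. As far as I know the musicbrainzngs client already spaces out its own requests by default, so the cache only reduces how many requests are needed.

```python
import functools


def cached_fetcher(fetch_work):
    """Wrap `fetch_work` so that each work id is requested at most once."""
    return functools.lru_cache(maxsize=None)(fetch_work)


calls = []


def fake_fetch(work_id):
    # Stand-in for a real web service request; records every call made.
    calls.append(work_id)
    return {"id": work_id}


fetch = cached_fetcher(fake_fetch)
for work_id in ["passion", "passion", "teil1", "passion"]:
    fetch(work_id)
print(calls)  # ['passion', 'teil1']
```

The cached dicts are shared between callers, so they should be treated as read-only.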

dosoe avatar May 25 '17 22:05 dosoe

Yeah, making a plugin would be a great way to make the extra queries optional and encapsulate the new code! It's actually fairly straightforward: the beets plugin system has an "import stage" API, where you can add arbitrary code to run on music that's been imported. So an easy way to get started would be to make a plugin that just runs this same code in an import stage, making calls into the beets.autotag.mb module.
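A rough sketch of what such an import stage could do, written without importing beets so it can be tried in isolation. The names make_parentwork_stage and lookup are hypothetical; the sketch assumes that a stage is called with (session, task), that task.imported_items() yields the imported items, that assigning item[key] creates a flexible field when the key is unknown, and that item.store() saves the item to the library database.

```python
def make_parentwork_stage(lookup):
    """Build an import-stage callable.

    `lookup(item)` is a hypothetical function returning a dict of field
    names to values (for example {'parentwork': ...}), or an empty dict.
    """
    def stage(session, task):
        for item in task.imported_items():
            fields = lookup(item)
            if not fields:
                continue
            for key, value in fields.items():
                # Unknown keys are expected to become flexible attributes.
                item[key] = value
            # Persist the new values to the library database.
            item.store()
    return stage
```

Inside a plugin class, a method with the same (session, task) signature would be appended to self.import_stages in the constructor, following the plugin documentation.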

Let me know if I can help more with pointing the way!

sampsyo avatar May 26 '17 03:05 sampsyo

Yes, I would appreciate it if you could help me set it up.

dosoe avatar May 26 '17 10:05 dosoe

Sure! Here's the place to start: http://docs.beets.io/en/v1.4.3/dev/plugins.html

Feel free to post questions along the way if anything comes up.

sampsyo avatar May 26 '17 17:05 sampsyo

OK, now I have a starting file, using https://beets.readthedocs.io/en/v1.4.3/dev/plugins.html#add-path-format-functions-and-fields and the keyfinder plugin as a template. However, I don't know how to add a new tag: the keyfinder plugin just has a tag mapping and writes it directly to the file, but that doesn't work for me. Additionally, I don't know how to tell it to run when importing and updating as well. I'm attaching the code as it stands so far. The part about fetching the data from MB works afaict, even if it is a little ugly (I'm not a programmer).

parentwork.txt

dosoe avatar May 31 '17 13:05 dosoe

Here is an updated version. It works (when I ask it to print the data, it is correct); I just don't get how to write the tags into the library. I made a branch for it, but I would like to commit this stuff without modifying my other pull request (https://github.com/beetbox/beets/pull/2563). parentwork.txt

dosoe avatar May 31 '17 15:05 dosoe