Audible.com-Search-by-Album

Scrape audible author ID

Open buswedg opened this issue 2 years ago • 8 comments

Would it be possible/worth capturing the Audible author identifier? I don't see it in the current query scope at the title level.

For context -- I'm working towards a folder structure where the first level folders include the first author (full) name, but want to include an id alongside names to keep folders unique.

Will admit, I'm not overly familiar with Audible's metadata. But it does look like they have a 10-character alphanumeric string for all authors in their db.

For example, I see Michelle Obama has an id of B07B436TLF -- https://www.audible.com/author/Michelle-Obama/B07B436TLF
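As a rough illustration of pulling that identifier out of an author URL, here is a minimal Python sketch. It assumes the ID is always the last path segment of an `/author/...` URL and is always 10 uppercase alphanumeric characters, which is only inferred from the one example above; `author_id_from_url` is a hypothetical helper name, not part of the script.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumption: author IDs are 10 uppercase letters/digits (inferred from B07B436TLF).
_AUTHOR_ID = re.compile(r"^[A-Z0-9]{10}$")


def author_id_from_url(url: str) -> Optional[str]:
    """Return the trailing author ID of an Audible author URL, or None."""
    # Split the URL path into non-empty segments, e.g.
    # ["author", "Michelle-Obama", "B07B436TLF"].
    segments = [s for s in urlparse(url).path.split("/") if s]
    if len(segments) >= 2 and segments[0] == "author":
        candidate = segments[-1]
        if _AUTHOR_ID.match(candidate):
            return candidate
    return None
```

Anything that is not an `/author/` URL, or whose last segment does not look like an ID, returns `None` rather than a guess.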

buswedg, Nov 23 '22 18:11

Interesting. I did some digging in a few books' page source, and it looks like there is a datalayer section at the very bottom that includes the Author ID. I only checked two books and it was there in both, but I have no idea how consistent it is.

This shouldn't be hard to scrape. What ID3 tag should this ID go in? I see a couple of options: WWWARTIST, MUSICBRAINZ_ALBUMARTISTID, or a custom tag (AudibleAlbumArtistID, or AAAI). https://docs.mp3tag.de/mapping-table/

I hesitate to use the Mbz tag since the ID is from Audible not Mbz. The WWW tag is a good option. The custom tag would have the best description but no other program would know to read it (does that even matter?). Maybe we could consult with the Audiobookshelf team, but I don't know if having this ID tag would even be beneficial for them.

I'm also curious, how do you plan on handling books with multiple authors? Authors like J.N. Chaney frequently have co-authors, and sometimes his name is listed first, sometimes second or third. Do we just pull the first author listed?

seanap, Nov 23 '22 19:11

I made the executive decision to put the ID in a custom tag called "AUDIBLE_ALBUMARTISTID" to keep it consistent with the Mbz tag (still up for debate, let me know if anyone has a better idea). I've only tested this on a few books, but it seems to work pretty well.

Please test this on a variety of books and report back any issues. Once we're happy that it's good enough, I'll merge it into the main script.

Download the new .src script here https://github.com/seanap/Audible.com-Search-by-Album/blob/master/Audible.com%23Search%20by%20Album%20-%20BETA.src

seanap, Nov 23 '22 20:11

Here's a format string that does what you're looking for: Z:\temp\TEST\audiobooks\%albumartist%[ '['%audible_albumartistid%']']\%series%\%year% - %album% [ '['%series% %series-part%']']\%album% (%year%)[ '['%series% %series-part%']']$ifgreater(%_total_files%,1, - pt$num(%track%,2),)

seanap, Nov 24 '22 00:11

Amazing -- thanks for the quick response. I'll give this a shot tomorrow.

buswedg, Nov 24 '22 01:11

> Here's a format string that does what you're looking for: Z:\temp\TEST\audiobooks\%albumartist%[ '['%audible_albumartistid%']']\%series%\%year% - %album% [ '['%series% %series-part%']']\%album% (%year%)[ '['%series% %series-part%']']$ifgreater(%_total_files%,1, - pt$num(%track%,2),)

%albumartist% may include more than one author, though? I thought I saw some instances of that earlier today when I was playing with this. Is the albumartistID only for the first author in the comma-delimited list?

buswedg, Nov 24 '22 01:11

This will only grab the ID of whichever author Audible lists first. A folder would look like /Author1, Author2 [IDof1]/...
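To make the resulting folder naming concrete, here is a small Python sketch of the same rule outside Mp3tag. It mirrors the `%albumartist%[ '['%audible_albumartistid%']']` part of the format string: the bracketed ID is appended only when the tag is present. `author_folder` is a hypothetical helper, and the character replacement is an assumption about what a Windows path needs, not something the script does.

```python
import re


def author_folder(album_artist: str, first_author_id: str = "") -> str:
    """Build the first-level folder name: all listed authors, then the first author's ID."""
    # Assumption: replace characters that are invalid in Windows folder names.
    name = re.sub(r'[<>:"/\\|?*]', "_", album_artist).strip()
    if first_author_id:
        # The ID belongs to whichever author Audible lists first,
        # even when album_artist holds several names.
        return f"{name} [{first_author_id}]"
    # No ID tag: fall back to the bare author string, as the optional
    # [...] group in the Mp3tag format string does.
    return name
```

With the example above, `author_folder("Author1, Author2", "IDof1")` gives `Author1, Author2 [IDof1]`, so two books by the same pair of authors listed in a different order would still land in different folders.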

seanap, Nov 24 '22 01:11

ok, I just took a quick look at some page sources. It looks like the authors field is just a list of dicts, so the first entry should hold both the first author's name and their ID, which is where you're pulling the author ID from anyway. I'll make a new custom field, something like AUDIBLE_PRIMARYARTIST, and maybe rename your custom ID field to AUDIBLE_PRIMARYARTISTID.

On a separate note, I started taking a look at your beets.io fork this morning, as I'll need an automated solution here. I have some suggestions on search priority using ASIN (if already available in the tag or filename), which I think will also improve results. I'll open a separate issue there in time.

buswedg, Nov 24 '22 02:11

You might want to update to something like the below to pull both the first author's name and id. I tested on a bunch of books this morning, and all looks fine.

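# Jump to the datalayer line that holds the product block
findline "product:[{"
# Move past the first author's fullName key (first occurrence; the last argument tells Mp3tag not to abort if it is absent)
findinline "{\"fullName\":\"" 1 1
outputto "AUDIBLE_FIRSTARTIST"
sayuntil "\""
# The next "id" key on the same line belongs to that first author
findinline "\"id\":\""
outputto "AUDIBLE_FIRSTARTISTID"
sayuntil "\""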
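For anyone who wants to check the same logic outside Mp3tag, here is a rough Python equivalent of those steps. The layout of the `product:[{` line is only inferred from the strings the script searches for, so `SAMPLE` is a made-up stand-in for real page source, the second author and its ID are placeholders, and `first_author` is a hypothetical helper name.

```python
import re
from typing import Optional, Tuple

# Mirrors the script: the first {"fullName":"..."} and the next "id":"..." after it.
_FIRST_AUTHOR = re.compile(r'\{"fullName":"([^"]*)".*?"id":"([^"]*)"')

# Hypothetical page fragment; the real datalayer may contain other keys.
SAMPLE = (
    "<script>\n"
    '  product:[{"authors":[{"fullName":"Michelle Obama","id":"B07B436TLF"},'
    '{"fullName":"Second Author","id":"B000000000"}]}],\n'
    "</script>"
)


def first_author(page_source: str) -> Optional[Tuple[str, str]]:
    """Return (name, id) of the first listed author, or None if not found."""
    for line in page_source.splitlines():
        # Equivalent of: findline "product:[{"
        if "product:[{" not in line:
            continue
        match = _FIRST_AUTHOR.search(line)
        if match:
            return match.group(1), match.group(2)
        return None
    return None
```

Calling `first_author(SAMPLE)` returns only the first name/ID pair, which matches what the two custom fields would hold.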

That said, I think using the api.audnex.us endpoint would be the more sustainable solution, same as the beets audible plugin. It should remain more stable than the source of Audible's audiobook summary pages. I also see that the API includes the first author's ASIN as part of its spec.

buswedg, Nov 25 '22 03:11