audiobookshelf
[Bug]: Match & Out of Print Audible Titles
What happened?
If you match (or worse, quick match) an Audible title that has been reissued or discontinued, some "bad things" happen:
If it has been reissued: all information (including the reader, if it changed) will be updated to reflect the reissue.
If it hasn't been reissued and is no longer available: if you're lucky, nothing will happen, but if the title is similar enough to another book's, you'll get a completely different book, wiping out all of the information about the original.
If you're doing a match, you can catch these before the "damage" is done. With a quick match, you will end up doing a lot of fixing.
What did you expect to happen?
The software needs to make some better decisions.
When matching, if the ASIN of the result doesn't match the ASIN on the current record, do not change the record.
If the ASIN isn't present in the current audiobookshelf record, double-check title and author (and series, if applicable).
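The two rules above could be sketched roughly as follows. This is only an illustration in Python, not audiobookshelf's actual code: `BookRecord`, `should_apply_match`, and the exact-match comparison are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BookRecord:
    # Hypothetical, simplified record; not audiobookshelf's actual schema.
    title: str
    author: str
    asin: Optional[str] = None
    series: Optional[str] = None


def _norm(value: Optional[str]) -> str:
    # Case-insensitive, whitespace-normalized comparison key.
    return " ".join((value or "").lower().split())


def should_apply_match(current: BookRecord, candidate: BookRecord) -> bool:
    # Rule 1: both ASINs are known, so they must agree or the record is left alone.
    if current.asin and candidate.asin:
        return _norm(current.asin) == _norm(candidate.asin)
    # Rule 2: no ASIN pair to compare, so require title and author to agree.
    if _norm(current.title) != _norm(candidate.title):
        return False
    if _norm(current.author) != _norm(candidate.author):
        return False
    # Series is only checked when the current record has one.
    if current.series and _norm(current.series) != _norm(candidate.series):
        return False
    return True
```

With the ASINs from the examples in this report, both the "Saturn's Story" result and the "Cold Welcome" reissue would be rejected, because the stored ASIN differs from the result's ASIN. Exact string equality is stricter than a real implementation would probably want for titles and authors.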
Steps to reproduce the issue
This is a bit difficult to reproduce, as you would have to own a discontinued or revised title.
I will give examples for both cases.
In 2013 Audible released a science-fiction audiobook by Michael McCollum titled "The Clouds of Saturn" (ASIN: B00B89GKI8). Doing a quick match or match in audiobookshelf matches it with a book called "Saturn's Story" by Jill Logan (ASIN: B0DYFR8CYC).
In 2017 Audible released "Cold Welcome" by Elizabeth Moon (ASIN: B06Y2H1WCT) with narrator Brittany Pressley. In 2024 this book was reissued with a new recording (ASIN: B0D8517NWS) with narrator Carrie Coello. No matter what the ASIN and release year are set to in audiobookshelf, it will switch the metadata to the reissue.
Audiobookshelf version
v2.21.0
How are you running audiobookshelf?
Docker
What OS is your Audiobookshelf server hosted from?
Other (list in "Additional Notes" box)
If the issue is being seen in the UI, what browsers are you seeing the problem on?
None
Logs
Additional Notes
Synology host (Debian variant)
This is really only an issue with quick match because, as you said, the user can validate any changes using a normal match.
Quick Match uses the first result from the metadata provider you selected. If it doesn't exist, then you will get "random" data.
Related to https://github.com/advplyr/audiobookshelf/issues/841
Yeah, for some time I've been wanting to implement some match confidence score that estimates how likely it is that a match result is the right one. It doesn't seem like a hard thing to achieve - obviously, the best clue is the duration difference (which is why matches are ordered by duration diff), and you can of course add additional signals.
After you have that score, the question becomes what to do when you quick match and the match confidence score is below some threshold. There are a number of possible options, but at the minimum, you can just reject the match, which shouldn't be hard to implement.
Does this sound like a good approach? I can take a stab at implementing this.
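As a rough illustration of the idea in this comment, a confidence score could combine duration difference with a couple of text signals, and quick match could reject the result below a threshold. Everything here is an assumption for the sake of the sketch: the field names, the weights, the 10% duration tolerance, and the 0.8 threshold are placeholders, and `difflib.SequenceMatcher` is just a convenient standard-library stand-in for whatever string similarity would actually be used.

```python
from difflib import SequenceMatcher
from typing import Optional


def text_similarity(a: str, b: str) -> float:
    # Ratio in [0, 1] from the standard library's SequenceMatcher.
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def duration_similarity(local_sec: Optional[float], remote_sec: Optional[float]) -> float:
    # 1.0 for identical durations, falling linearly to 0.0 at a 10% difference.
    # The 10% tolerance is an arbitrary assumption.
    if not local_sec or not remote_sec:
        return 0.0
    relative_diff = abs(local_sec - remote_sec) / max(local_sec, remote_sec)
    return max(0.0, 1.0 - relative_diff / 0.10)


def match_confidence(local: dict, result: dict) -> float:
    # Weighted sum of signals; duration gets the largest (placeholder) weight.
    return (
        0.5 * duration_similarity(local.get("duration"), result.get("duration"))
        + 0.3 * text_similarity(local.get("title", ""), result.get("title", ""))
        + 0.2 * text_similarity(local.get("author", ""), result.get("author", ""))
    )


def quick_match(local: dict, results: list, threshold: float = 0.8) -> Optional[dict]:
    # Take the first provider result and reject it when confidence is too low.
    if not results:
        return None
    best = results[0]
    return best if match_confidence(local, best) >= threshold else None
```

Returning `None` corresponds to the minimum option mentioned above: reject the match and leave the item unchanged.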
I think match confidence is a good idea.
It would be easier to write the match confidence algorithm to work for any provider including custom providers if the providers were standardized. I've wanted to standardize providers for a while. It is a bigger project so it wouldn't need to be a part of the match confidence update.
Plenty of users organize their ebooks in Abs which is a lot harder to match confidently without duration.
As far as the UI/UX for this, I like the suggestion of allowing you to accept the results of the quick match before saving (in #841). I'm not sure what that would look like but my initial thought is a dialog opens up (similar to the series inner dialog) showing the results of the quick match, confidence score, and the list of values that will be updated.
The reason I like that better than updating the inputs in the edit modal is it will allow us to put information about the match, like match confidence score, and it will be easier to show the user which fields will be updated.
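The "list of values that will be updated" for such a dialog could be computed as a simple field-by-field diff. A minimal sketch, assuming items and match results are flat dictionaries (which is a simplification) and assuming empty provider values should never overwrite local data; `pending_changes` is a hypothetical helper name:

```python
def pending_changes(current: dict, matched: dict) -> dict:
    # Map each field that would change to an (old_value, new_value) pair.
    changes = {}
    for field, new_value in matched.items():
        if new_value in (None, "", []):
            continue  # assumption: empty provider values never overwrite local data
        old_value = current.get(field)
        if old_value != new_value:
            changes[field] = (old_value, new_value)
    return changes
```

For the "Cold Welcome" example from the report, the dialog would list the narrator and ASIN as the fields about to change, which makes the reissue mismatch visible before anything is saved.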
> I think match confidence is a good idea.
> It would be easier to write the match confidence algorithm to work for any provider including custom providers if the providers were standardized. I've wanted to standardize providers for a while. It is a bigger project so it wouldn't need to be a part of the match confidence update.
I can try to write a default one using existing metadata from providers. They are already mostly returning roughly the same fields, aren't they? I'll need to look at the code.
> Plenty of users organize their ebooks in Abs which is a lot harder to match confidently without duration.
Not sure if that's much harder. You can get a lot of metadata from the book itself (assuming it's an epub or pdf), especially the publisher which can distinguish between editions.
My thought was to at least initially separate between ebooks and audiobooks. The match confidence algorithm would be somewhat different.
> As far as the UI/UX for this, I like the suggestion of allowing you to accept the results of the quick match before saving (in #841). I'm not sure what that would look like but my initial thought is a dialog opens up (similar to the series inner dialog) showing the results of the quick match, confidence score, and the list of values that will be updated.
Sure. And we also need to deal with bulk matching, which makes this a little more complicated.
> The reason I like that better than updating the inputs in the edit modal is it will allow us to put information about the match, like match confidence score, and it will be easier to show the user which fields will be updated.
> I can try to write a default one using existing metadata from providers. They are already mostly returning roughly the same fields, aren't they? I'll need to look at the code.
Yeah mostly the same but it is messy.
> My thought was to at least initially separate between ebooks and audiobooks. The match confidence algorithm would be somewhat different.
Makes sense
> Sure. And we also need to deal with bulk matching, which makes this a little more complicated.
Yeah we should leave the confirmation step out of bulk matching for now.
There are 2 types of bulk matching: bulk matching the entire library and bulk matching selected items.
It won't make sense to build a confirmation step for bulk matching the entire library. Maybe we add a library setting in the future to set a cutoff on how low the confidence score can be for the match to be accepted.
There are feature requests open for locking items from being able to get changed, locking individual fields on items, and locking specific fields across the whole library. Those types of features could be helpful for bulk matching.
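To make the combination of a library-level cutoff and locked fields concrete, here is a hedged sketch of how a single item might be handled during a bulk match with no confirmation step. None of this exists in audiobookshelf as far as this thread states: `apply_bulk_match`, `min_confidence`, and `locked_fields` are hypothetical names, and the confidence value is assumed to be computed elsewhere.

```python
def apply_bulk_match(item: dict, matched: dict, confidence: float,
                     min_confidence: float,
                     locked_fields: frozenset = frozenset()) -> dict:
    # Below the library's cutoff: skip the match and return the item unchanged.
    if confidence < min_confidence:
        return dict(item)
    # Otherwise copy over every matched field that is not locked.
    updated = dict(item)
    for field, value in matched.items():
        if field not in locked_fields:
            updated[field] = value
    return updated
```

For example, locking `narrator` and `asin` on an item would let a bulk match refresh the description or cover while leaving the original recording's details alone.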
This all seems to be a step in the right direction.