audible-cli
Fix badly encoded characters in metadata
This is an attempt to fix badly encoded characters in the AAX/C metadata. I'm not sure which encoding the AAX/C format uses for metadata, nor which badly encoded characters occur throughout Audible's library. Hence this is currently a limited "find and replace" for the characters I've encountered. Please feel free to add more if you find them.
Specifically, this currently fixes:
- in "copyright":
  - the copyright character `©` from a doubly escaped HTML entity `&\#169\;`, potentially followed by `null`.
  - the producer string `(P)` (if that is what it means) from an unknown escape `\;(P)` that is preceded by either no whitespace or several whitespace characters.
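The find-and-replace described in the list above can be sketched as two regular expressions. This is a minimal sketch, not the PR's actual code: `fix_copyright` is a hypothetical name, and it assumes the sequences appear literally in the "copyright" value exactly as listed (backslashes included), as they do in an ffmetadata dump.

```python
import re

# Hypothetical helper; the patterns mirror the two cases listed above and
# assume the backslashes appear literally in the metadata text.
_COPYRIGHT_ENTITY = re.compile(r"&\\#169\\;(?:null)?")  # &\#169\; plus optional "null"
_PRODUCER_ESCAPE = re.compile(r"\s*\\;\(P\)")            # \;(P) plus any preceding whitespace


def fix_copyright(value: str) -> str:
    """Replace the known badly encoded sequences in a "copyright" value."""
    value = _COPYRIGHT_ENTITY.sub("©", value)
    # Collapse zero or more whitespace characters before the escape to one space.
    value = _PRODUCER_ESCAPE.sub(" (P)", value)
    # If the value began with the escape, don't leave a leading space behind.
    if value.startswith(" (P)"):
        value = value[1:]
    return value


if __name__ == "__main__":
    print(fix_copyright(r"&\#169\;null2019 Author\;(P)2019 Audible"))
```

For example, `&\#169\;null2019 Author\;(P)2019 Audible` becomes `©2019 Author (P)2019 Audible`. Further cases can be added as extra patterns in the same way.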
EDIT: This doesn't seem to work yet, and I'm not sure why. The updated metadata should be written to the temporary metadata file. I suspected that ffmpeg, the .m4b format or the file metadata doesn't support Unicode, but one of my audiobooks already has the correct copyright character, which suggests this is not the issue.
The metadata is written back using `utf-8`. Maybe this is the wrong encoding. I'll check this and report back.
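ffmpeg's format documentation describes the ffmetadata file as a UTF-8-encoded, INI-like text file in which `=`, `;`, `#`, `\` and newlines inside keys or values are escaped with a backslash. If that holds, `utf-8` should be the right encoding. One way to rule out the write step is to check that `©` actually reaches the temporary file as the UTF-8 bytes `C2 A9`. The sketch below does this. `escape_ffmetadata` and `write_ffmetadata` are hypothetical names, not audible-cli functions, and the writer only handles global tags (no chapters).

```python
import tempfile
from pathlib import Path


def escape_ffmetadata(value: str) -> str:
    """Backslash-escape the characters ffmetadata treats as special."""
    # Escape the backslash first so the escapes added afterwards are not doubled.
    for ch in ("\\", "=", ";", "#", "\n"):
        value = value.replace(ch, "\\" + ch)
    return value


def write_ffmetadata(path: Path, tags: dict) -> None:
    """Write global tags as an ffmetadata file, explicitly UTF-8 encoded."""
    lines = [";FFMETADATA1"]
    lines += [f"{escape_ffmetadata(k)}={escape_ffmetadata(v)}" for k, v in tags.items()]
    # newline="\n" keeps LF line endings regardless of platform.
    with path.open("w", encoding="utf-8", newline="\n") as f:
        f.write("\n".join(lines) + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        meta = Path(tmp) / "metadata.txt"
        write_ffmetadata(meta, {"copyright": "©2019 Author (P)2019 Audible"})
        # "©" (U+00A9) should be stored as the UTF-8 byte pair C2 A9.
        print(b"\xc2\xa9" in meta.read_bytes())
```

If the bytes are present in the file but the final .m4b still shows the wrong character, the problem is more likely in how the metadata file is passed to ffmpeg than in its encoding.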
FYI:
The metadata extracted using ffmpeg is incomplete. Compare the output of `ffmpeg -i {AAXC-FILE} -f ffmetadata meta-ffmetadata.txt` with that of `mediainfo {AAXC-FILE} > meta-mediainfo.txt` to see the difference.
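To compare the two outputs, it can help to read the ffmetadata dump back with its backslash escapes removed. ffmetadata escapes `=`, `;`, `#`, `\` and newlines. If that is the only escaping involved, the `&\#169\;` listed earlier would be the plain HTML entity `&#169;` in the tag itself. The sketch below is an assumption-laden illustration: `parse_ffmetadata_global` is a hypothetical name, and it only reads global tags, stopping at the first `[CHAPTER]` or `[STREAM]` section. It does not parse mediainfo's report, which still has to be compared by eye.

```python
def _logical_lines(text: str):
    """Yield lines as lists of (char, was_escaped) pairs, joining escaped newlines."""
    line = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            # The next character is taken literally, including a newline.
            line.append((next(chars, ""), True))
        elif ch == "\n":
            yield line
            line = []
        else:
            line.append((ch, False))
    if line:
        yield line


def parse_ffmetadata_global(text: str) -> dict:
    """Return the unescaped global key/value tags of an ffmetadata dump."""
    tags = {}
    for pairs in _logical_lines(text):
        if not pairs:
            continue
        first, escaped = pairs[0]
        if not escaped and first in ";#":
            continue  # ";FFMETADATA1" header or a comment line
        if not escaped and first == "[":
            break  # a section such as [CHAPTER] starts; global tags end here
        for idx, (ch, esc) in enumerate(pairs):
            if ch == "=" and not esc:  # split on the first unescaped "="
                key = "".join(c for c, _ in pairs[:idx])
                tags[key] = "".join(c for c, _ in pairs[idx + 1:])
                break
    return tags


if __name__ == "__main__":
    with open("meta-ffmetadata.txt", encoding="utf-8") as f:
        for key, value in parse_ffmetadata_global(f.read()).items():
            print(f"{key}: {value}")
```

Running it next to `meta-mediainfo.txt` lists which keys ffmpeg exported, which makes missing fields easier to spot.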
I haven’t figured out how to make it work yet.
Also, I’m thinking it may not be a good idea to start fixing Audible’s mistakes, as it will lead to ever-increasing complexity without fixing the root cause. Instead, the right thing is to report the mistakes to Audible until they fix them at the source. Alternatively, one might choose to accept these imperfections of Audible.