Fix badly encoded characters in metadata

Open vwkd opened this issue 1 year ago • 1 comments

This is an attempt to fix badly encoded characters in the AAX/C metadata. I'm not sure what encoding the AAX/C format uses for metadata and what badly encoded characters Audible has throughout its library. Hence this is currently a limited "find and replace" for those characters I've encountered. Please feel free to add more if you find them.

Specifically, this currently fixes:

  • in "copyright":
    • copyright character © from some doubly escaped HTML entity &\#169\; potentially followed by a null character.
    • producer string (P) (does it mean that?) from some unknown escape \;(P) with missing or multiple preceding whitespace characters.
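
A minimal sketch of the two replacements listed above, for anyone who wants to try them on a metadata string outside the PR. The function name `fix_copyright` and both pattern names are hypothetical and not part of audible-cli. Normalizing to exactly one space before (P) is an assumption about the desired output.

```python
import re

# Hypothetical patterns, not taken from audible-cli.
# Matches the literal text &\#169\; optionally followed by one null character.
COPYRIGHT_ENTITY = re.compile(r"&\\#169\\;\x00?")
# Matches the literal text \;(P) together with any whitespace before it.
PRODUCER_ESCAPE = re.compile(r"\s*\\;\(P\)")


def fix_copyright(value: str) -> str:
    """Apply the two "find and replace" fixes to a copyright string."""
    value = COPYRIGHT_ENTITY.sub("©", value)
    # Assumption: exactly one space before (P) is the wanted result.
    value = PRODUCER_ESCAPE.sub(" (P)", value)
    return value.strip()


if __name__ == "__main__":
    raw = "&\\#169\\;\x002020 Some Publisher   \\;(P)2021 Some Producer"
    print(fix_copyright(raw))
```

Anything that does not match either pattern is left as it is, so further badly encoded characters would each need their own pattern.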

EDIT: This doesn't seem to work yet. Not sure why. The updated metadata should be written to the temporary metadata file. I suspected that ffmpeg, the .m4b format or the file metadata doesn't support Unicode, but one of my audiobooks has the correct copyright character already, which suggests this should not be the issue.

vwkd avatar Jan 10 '24 16:01 vwkd

The metadata are written back using utf-8. Maybe this is the wrong encoding. I'll check these and report back.

FYI: The metadata extracted using ffmpeg does not contain the full metadata. If you compare the output from ffmpeg -i {AAXC-FILE} -f ffmetadata meta-ffmetadata.txt and mediainfo {AAXC-FILE} > meta-mediainfo.txt you can see the difference.
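
To make the comparison repeatable, the two commands above can be wrapped in a small script. This is only a sketch: the helper names (`ffmetadata_cmd`, `mediainfo_cmd`, `dump_metadata`) are made up for illustration, and it assumes `ffmpeg` and `mediainfo` are on the PATH and can read the file as given.

```python
import subprocess
from pathlib import Path


def ffmetadata_cmd(src, out):
    # Same arguments as the ffmpeg command quoted above.
    return ["ffmpeg", "-i", str(src), "-f", "ffmetadata", str(out)]


def mediainfo_cmd(src):
    # mediainfo prints its report to stdout; the caller saves it.
    return ["mediainfo", str(src)]


def dump_metadata(src, out_dir="."):
    """Write both metadata dumps next to each other for a manual diff."""
    out_dir = Path(out_dir)
    ff_out = out_dir / "meta-ffmetadata.txt"
    mi_out = out_dir / "meta-mediainfo.txt"
    # Note: ffmpeg asks before overwriting an existing output file.
    subprocess.run(ffmetadata_cmd(src, ff_out), check=True)
    result = subprocess.run(
        mediainfo_cmd(src), check=True, capture_output=True, text=True
    )
    mi_out.write_text(result.stdout, encoding="utf-8")
    return ff_out, mi_out
```

Calling `dump_metadata("book.aaxc")` (a placeholder file name) should leave both text files in the current directory, ready to be compared with any diff tool.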

mkb79 avatar Jan 10 '24 19:01 mkb79

I haven’t figured out how to make it work yet.

Also, I’m thinking it may not be a good idea to start fixing Audible’s mistakes as it will lead to ever-increasing complexity without fixing the root cause. Instead, the right thing is to report the mistakes to Audible until they implement a fix at the source. Alternatively, one might choose to accept these imperfections of Audible.

vwkd avatar Apr 09 '24 22:04 vwkd