
erroneous ID3 tag info

Open ejhumphrey opened this issue 5 years ago • 1 comment

I'm not sure if this relates to #4, but I've found that at least sox (on Debian!) tries to work out file duration from the reported bitrate. Unfortunately for me, the reported bitrate is way wrong for at least 90 or so tracks (of the 100k+), and probably wrong for another couple hundred. The particularly bad tracks claim bitrates in excess of "100M", which sox (at least) parses as bits per second. For comparison, uncompressed stereo 16-bit WAV at 44.1 kHz is only about 1.4 Mbps (44100 × 16 × 2 = 1,411,200 bits per second).

The list of suspicious file IDs is here, if anyone wants to double-check or confirm. The extension is txt, but the content is JSON, and the keys point to the sox-reported bitrates.
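For anyone who wants to reproduce the check, here is a minimal sketch of how such a list could be filtered. It assumes the JSON maps file IDs to bitrate strings in the style sox prints (for example "128k" or "100M"); the function names, the suffix table, and the example IDs are made up for illustration and are not taken from the actual list.

```python
import json

# Bitrate of uncompressed CD-quality PCM: 44100 Hz * 16 bit * 2 channels.
# Any compressed MP3 claiming more than this is almost certainly mislabeled.
PCM_CEILING_BPS = 44_100 * 16 * 2

# Assumed suffix convention for sox-style bitrate strings.
SUFFIXES = {"k": 1e3, "M": 1e6, "G": 1e9}


def parse_bitrate(text):
    """Convert a string such as '128k' or '100M' to bits per second."""
    text = text.strip()
    if text and text[-1] in SUFFIXES:
        return float(text[:-1]) * SUFFIXES[text[-1]]
    return float(text)


def suspicious_ids(report_json, ceiling=PCM_CEILING_BPS):
    """Return the sorted file IDs whose reported bitrate exceeds the ceiling."""
    report = json.loads(report_json)
    return sorted(
        file_id
        for file_id, bitrate in report.items()
        if parse_bitrate(str(bitrate)) > ceiling
    )


# Hypothetical input, not real entries from the collection.
example = '{"1": "128k", "2": "100M", "3": "320k"}'
flagged = suspicious_ids(example)
```

Using the PCM rate as the ceiling is a deliberately loose threshold: it only catches the absurd values, not the tracks whose bitrate is wrong by a smaller margin.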

More fortunately, removing all the ID3 tags fixes the issue. I'd propose perhaps exporting all ID3 tags to a static dump over the collection (per #4), and then removing all the ID3 tags to sanitize the collection.
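A rough sketch of what the export-then-strip step could look like, using only the standard library. This is not the tooling actually used on the collection: `split_id3` and `synchsafe_to_int` are hypothetical helpers, and the code only handles an ID3v2 tag at the start of the file and an ID3v1 tag in the last 128 bytes. Tags appended elsewhere are ignored, so a tested tagging library would be the safer choice for a real run.

```python
def synchsafe_to_int(b):
    """Decode a 4-byte ID3v2 synchsafe integer (7 significant bits per byte)."""
    return (b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]


def split_id3(data):
    """Split raw MP3 bytes into (tag_bytes, audio_bytes).

    tag_bytes is the leading ID3v2 tag followed by the trailing ID3v1 tag,
    if present, so it can be written to a static dump before the tags
    are dropped from the audio file.
    """
    start = 0
    # ID3v2 header: b"ID3", 2 version bytes, 1 flags byte, 4 size bytes.
    if len(data) >= 10 and data[:3] == b"ID3":
        size = synchsafe_to_int(data[6:10])
        # Flag bit 0x10 signals an extra 10-byte footer (ID3v2.4).
        footer = 10 if data[5] & 0x10 else 0
        start = min(10 + size + footer, len(data))

    end = len(data)
    # ID3v1 tag: the last 128 bytes, starting with b"TAG".
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128

    return data[:start] + data[end:], data[start:end]


# Synthetic example: 5-byte ID3v2 payload, fake audio, ID3v1 trailer.
v2_tag = b"ID3" + b"\x03\x00" + b"\x00" + b"\x00\x00\x00\x05" + b"abcde"
v1_tag = b"TAG" + b"\x00" * 125
tags, audio = split_id3(v2_tag + b"\xff\xfbAUDIO" + v1_tag)
```

Keeping the removed bytes alongside the file ID means the original tags stay recoverable from the dump even after the audio files are sanitized.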

ejhumphrey, Jul 18 '18 20:07

Thanks for the investigation @ejhumphrey.

> I'd propose perhaps exporting all ID3 tags to a static dump over the collection (per #4), and then removing all the ID3 tags to sanitize the collection.

Seems like a good solution. Do you know of any other metadata that should be cleaned or removed to sanitize such an audio collection?

mdeff, Jul 20 '20 21:07