
Documentation: Missing md5_signature attribute under the API Reference

Open tonurics opened this issue 5 years ago • 6 comments

API Reference / FLAC / mutagen.flac.StreamInfo should include an entry for md5_signature which is the MD5 fingerprint in the FLAC streaminfo.

It might also be helpful to note that the MD5 value returned by mutagen is an integer, which prints in Base10. If the user wants a value that matches the output of metaflac --show-md5sum and other tools, they will need to convert it to the more common Base16 representation, zero-padded to 32 digits [i.e. md5=format(*.md5_signature, '032x'); the shorter hex(*.md5_signature).split('x')[-1] drops any leading zeros].
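
A minimal sketch of the conversion described above, assuming md5_signature is a plain Python int holding the 128-bit digest. The helper name md5_hex is made up for illustration and is not part of mutagen. It pads to 32 hex digits, because stripping the prefix from hex() would lose any leading zeros and the result would no longer match metaflac.

```python
def md5_hex(signature: int) -> str:
    """Render a 128-bit integer MD5 as a 32-character lowercase hex string."""
    # '032x' = lowercase hex, left-padded with zeros to 32 digits (16 bytes).
    return format(signature, "032x")


# Hypothetical usage, assuming mutagen is installed and "song.flac" exists:
#   from mutagen.flac import FLAC
#   print(md5_hex(FLAC("song.flac").info.md5_signature))
```

The padding only matters for digests whose first byte is below 0x10, which is why the shorter form appears to work most of the time.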

tonurics avatar Jul 21 '20 22:07 tonurics

What do you need it for?

lazka avatar Jul 22 '20 05:07 lazka

There are numerous use cases for wanting to read the MD5 signature stored in the FLAC StreamInfo. The value is computed over the PCM samples prior to encoding.

For clarity: I'm not asking for a md5_signature attribute to be added to mutagen. It is already part of the mutagen FLAC reader and has been since at least 2005. I'm pointing out: the current documentation has omitted it.

tonurics avatar Jul 22 '20 06:07 tonurics

Yeah, asking since I left it undocumented on purpose because I considered it "internal" when I added the API docs. It's not going away though.

lazka avatar Jul 22 '20 06:07 lazka

If it gets officially exposed in some ways IMHO it should be done with Base16 representation

phw avatar Jul 22 '20 06:07 phw

Ah. The documentation omission created a bit of work for me earlier. I initially thought mutagen didn't/couldn't read it and set about writing my own solution. A lucky Google search then led me to the undocumented attribute.

I am using mutagen in a personal project where a database is populated with tag/metadata. Since all my files are encoded with the stock FLAC encoder and are well-formed, I'm using the MD5 signature to quickly assist in detecting duplicates. [I use FFmpeg to generate SHA160 hashes of the actual PCM data; not yet trusting the stored MD5 in destructive functions.]
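
As a rough sketch of that duplicate check, assuming the signatures have already been read into a dict of path to integer md5_signature (for example via mutagen). The function name group_duplicates and the input shape are hypothetical. It also assumes that a signature of 0 means the encoder left the MD5 unset, so those files are skipped rather than reported as duplicates of each other.

```python
from collections import defaultdict


def group_duplicates(signatures: dict[str, int]) -> dict[str, list[str]]:
    """Group file paths that share the same stored MD5 signature.

    `signatures` maps a file path to its integer md5_signature.
    Returns a dict of 32-digit hex digest -> sorted paths, for digests
    that occur more than once.
    """
    groups = defaultdict(list)
    for path, sig in signatures.items():
        if sig == 0:
            # Assumption: 0 means "MD5 not set" by the encoder; ignore it.
            continue
        groups[sig].append(path)
    return {
        format(sig, "032x"): sorted(paths)
        for sig, paths in groups.items()
        if len(paths) > 1
    }
```

A match here only says the stored digests agree; hashing the decoded PCM, as mentioned above, remains the safer check before deleting anything.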

I agree Base16 representation is much more useful [I've added my own logic to convert the value]. But perhaps, given the 15-year precedent, instead of changing the md5_signature attribute, it would cause less pain if a new attribute named md5_fingerprint was added. That could return Base16 and bring the naming convention in line with what Xiph uses.

tonurics avatar Jul 22 '20 07:07 tonurics

What do you need it for?

It's extremely useful for being able to maintain a database of md5 signatures which can be used to check file integrity and to detect duplicate files and albums independent of metadata. Please re-include it in the documentation.

audiomuze avatar Mar 25 '23 14:03 audiomuze