Documentation: Missing md5_signature attribute under the API Reference
API Reference / FLAC / mutagen.flac.StreamInfo should include an entry for md5_signature which is the MD5 fingerprint in the FLAC streaminfo.
It might also be helpful to note that the MD5 value returned by mutagen is an integer, so it prints in Base10. The user will need to convert it to the more common Base16 representation [i.e. md5=format(*.md5_signature, '032x'); note that hex(*.md5_signature).split('x')[-1] drops leading zeros] if they want a value that matches the output of metaflac --show-md5sum and other tools.
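As a rough sketch of that conversion: the helper names below (md5_int_to_hex, flac_md5_hex) are made up for illustration and are not part of mutagen. The only mutagen pieces assumed are mutagen.flac.FLAC and the info.md5_signature attribute discussed in this issue. Zero-padding to 32 digits matters because a plain hex() call drops leading zeros.

```python
def md5_int_to_hex(signature: int) -> str:
    """Render a 128-bit MD5 held as an integer as 32 lowercase hex digits."""
    if not 0 <= signature < 1 << 128:
        raise ValueError("MD5 signature must fit in 128 bits")
    # '032x' pads with leading zeros, which hex() would silently drop.
    return format(signature, "032x")


def flac_md5_hex(path: str) -> str:
    """Hypothetical convenience wrapper: read the STREAMINFO MD5 via mutagen."""
    # Imported lazily so the pure helper above works without mutagen installed.
    from mutagen.flac import FLAC

    return md5_int_to_hex(FLAC(path).info.md5_signature)
```

Calling flac_md5_hex("some_file.flac") (a placeholder path) should then give a string in the same 32-digit form that metaflac --show-md5sum prints.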
What do you need it for?
There are numerous use cases for wanting to read the MD5 signature stored in the FLAC StreamInfo. The value is computed over the PCM samples prior to encoding.
For clarity: I'm not asking for an md5_signature attribute to be added to mutagen. It is already part of the mutagen FLAC reader and has been since at least 2005. I'm pointing out that the current documentation has omitted it.
Yeah, asking since I left it undocumented on purpose because I considered it "internal" when I added the API docs. It's not going away though.
If it gets officially exposed in some way, IMHO it should be done with a Base16 representation.
Ah. The documentation omission created a bit of work for me earlier. I initially thought mutagen didn't/couldn't read it and set about writing my own solution. A lucky Google search then led me to the undocumented attribute.
I am using mutagen in a personal project where a database is populated with tag/metadata. Since all my files are well-formed and encoded with the stock FLAC encoder, I'm using the MD5 signature to quickly assist in detecting duplicates. [I use FFmpeg to generate SHA160 hashes of the actual PCM data, as I don't yet trust the stored MD5 in destructive functions.]
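To illustrate the duplicate check described above, here is a minimal sketch that groups files by their stored MD5. The function name find_duplicates and the (path, signature) input shape are assumptions for the example, not anything from mutagen or from the project mentioned. It also assumes an all-zero signature means the encoder did not store an MD5, so those files are skipped rather than treated as duplicates of each other.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_duplicates(entries: Iterable[Tuple[str, int]]) -> Dict[str, List[str]]:
    """Group file paths by STREAMINFO MD5 and keep groups with 2+ members.

    `entries` yields (path, md5_signature) pairs, where md5_signature is the
    integer value as exposed by mutagen's StreamInfo.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for path, signature in entries:
        if signature == 0:
            # Assumption: zero means "MD5 not set", so it identifies nothing.
            continue
        groups[format(signature, "032x")].append(path)
    return {md5: paths for md5, paths in groups.items() if len(paths) > 1}
```

The pairs could be collected with something like (p, FLAC(p).info.md5_signature) for each file. A match only says the stored signatures agree, so a separate hash of the decoded audio is still a sensible check before deleting anything.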
I agree that a Base16 representation is much more useful [I've added my own logic to convert the value]. But perhaps, given the 15-year precedent, it would cause less pain to add a new attribute named md5_fingerprint instead of changing the md5_signature attribute. That could return Base16 and bring the naming convention in line with what Xiph uses.
What do you need it for?
It's extremely useful for being able to maintain a database of md5 signatures which can be used to check file integrity and to detect duplicate files and albums independent of metadata. Please re-include it in the documentation.