
Consider moving fingerprinting to different crate licensed as MIT

Open: qarmin opened this issue 4 years ago • 1 comment

Hi, I recently searched for a music hash/fingerprint crate, but I couldn't find any.

It looks like src/fingerprinting/algorithm.rs is more or less a solution to my problem, but I can't use it directly because of licensing (my project is licensed under MIT, but this code is GPL).

It also looks like this code only returns the data needed by Shazam, whereas I need a fixed-size hash, e.g. [u8; 8], so that I can check the Hamming distance between two hashes to tell whether two pieces of music are similar or not.
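For illustration, the comparison step described above could look like the following minimal sketch. It only covers comparing two 64-bit hashes that already exist; how to derive such a hash from audio is the open question of this issue. The function names and the `max_distance` threshold are hypothetical.

```rust
/// Number of differing bits between two 64-bit hashes stored as `[u8; 8]`.
fn hamming_distance(a: &[u8; 8], b: &[u8; 8]) -> u32 {
    a.iter()
        .zip(b.iter())
        // XOR leaves a 1 bit wherever the two bytes differ.
        .map(|(x, y)| (x ^ y).count_ones())
        .sum()
}

/// Hypothetical similarity check: `max_distance` is an arbitrary threshold
/// that would have to be tuned for the hash function actually used.
fn is_similar(a: &[u8; 8], b: &[u8; 8], max_distance: u32) -> bool {
    hamming_distance(a, b) <= max_distance
}
```

A distance of 0 means identical hashes, and 64 means every bit differs.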

I think it should look similar to the img_hash crate, which produces 64-bit (or 256-bit, etc.) hashes from images - https://crates.io/crates/img_hash

qarmin, Dec 03 '20 07:12

Hello,

Consider moving fingerprinting to different crate

The purpose of this project is to provide interoperability between the Shazam services and Linux systems. French law and case law specifically authorize importing foreign algorithms into free software for that purpose (it is used as a ground for developing VLC, etc.).

However, if I exported it into a separate crate, I think that would amount to repurposing it for virtually anything.

I can't use it directly because of licensing (my project is licensed under MIT, but this code is GPL).

I think that you may not necessarily want to use it directly, as it is very specific to being compatible with Shazam. However, you can observe a few traits that are common to most audio fingerprinting algorithms in the wild:

  • A sliding-window Fast Fourier Transform (i.e. a short-time Fourier transform, STFT), which is basically the most common way to compute an audio spectrogram.
  • A very basic algorithm to find audio peaks in the spectrogram by iterating over neighboring values (a spectrogram being a chart with time on the X axis, frequency on the Y axis, and amplitude at each intersection).
  • Some basic arithmetic (division, multiplication, logarithm...) to fit the (time, frequency, amplitude) chunks into serializable values, here 16-bit integers.
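The three steps above can be sketched roughly as follows. This is a simplified illustration, not the code from src/fingerprinting/algorithm.rs: it uses a naive DFT instead of an FFT to stay dependency-free, and the window function, peak criterion, and scaling constant are all assumptions chosen for readability.

```rust
use std::f32::consts::PI;

/// Magnitude spectrum of one Hann-windowed frame, via a naive O(n^2) DFT.
/// A real implementation would use an FFT; this keeps the sketch self-contained.
fn magnitude_spectrum(window: &[f32]) -> Vec<f32> {
    let n = window.len();
    (0..n / 2)
        .map(|k| {
            let (mut re, mut im) = (0.0f32, 0.0f32);
            for (t, &x) in window.iter().enumerate() {
                let hann = 0.5 - 0.5 * (2.0 * PI * t as f32 / n as f32).cos();
                let angle = 2.0 * PI * ((k * t) % n) as f32 / n as f32;
                re += x * hann * angle.cos();
                im -= x * hann * angle.sin();
            }
            (re * re + im * im).sqrt()
        })
        .collect()
}

/// Sliding-window transform: one spectrum every `hop` samples.
/// `window_size` and `hop` must both be non-zero.
fn spectrogram(samples: &[f32], window_size: usize, hop: usize) -> Vec<Vec<f32>> {
    samples
        .windows(window_size)
        .step_by(hop)
        .map(magnitude_spectrum)
        .collect()
}

/// A spectrogram cell that dominates its neighborhood.
struct Peak {
    frame: usize,
    bin: usize,
    amplitude: f32,
}

/// Keeps cells strictly greater than their 8 neighbors and at least `min_amplitude`.
/// Assumes every row of `spec` has the same length.
fn find_peaks(spec: &[Vec<f32>], min_amplitude: f32) -> Vec<Peak> {
    let mut peaks = Vec::new();
    for t in 1..spec.len().saturating_sub(1) {
        for f in 1..spec[t].len().saturating_sub(1) {
            let v = spec[t][f];
            let is_max = (t - 1..=t + 1).all(|tt| {
                (f - 1..=f + 1).all(|ff| (tt, ff) == (t, f) || spec[tt][ff] < v)
            });
            if v >= min_amplitude && is_max {
                peaks.push(Peak { frame: t, bin: f, amplitude: v });
            }
        }
    }
    peaks
}

/// Packs a peak into three 16-bit values. The log scaling and the 1024.0
/// factor are illustrative assumptions, not Shazam-compatible values.
fn quantize(peak: &Peak) -> (u16, u16, u16) {
    let log_amp = peak.amplitude.max(1.0).ln() * 1024.0;
    (
        peak.frame.min(u16::MAX as usize) as u16,
        peak.bin.min(u16::MAX as usize) as u16,
        log_amp.min(u16::MAX as f32) as u16,
    )
}
```

Note that a perfectly stationary tone produces no peaks with this strict criterion, since neighboring frames have equal amplitude; real signals vary enough over time for peaks to appear.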

I think it should look similar to the img_hash crate, which produces 64-bit (or 256-bit, etc.) hashes from images - https://crates.io/crates/img_hash

I think that there's a misconception here about what an audio hash is: it is generally not a fixed-size value that can be compared to other values with a Hamming distance, but rather a variable-length sequence of peaks. Mainly because:

  • Audio can be cut to various lengths in time (contrary to images, which can be downsized to a single size),
  • Audio can be heavily distorted in frequency, which will affect the value of features,
  • Audio can be heavily distorted in amplitude, which will affect the value of features, as well as possibly the number of detected peaks.

Except maybe if you just want to detect whole files that have been transcoded, by hashing a fixed window at their center...

Hence the interest of storing peaks, or relative pairs of peaks, individually in a database with a graph/tree structure and comparing them individually, as I think most audio fingerprinting algorithms do.
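As a rough sketch of that idea, pairs of peaks can be keyed by their two frequency bins and time gap, stored in an ordered map, and matched by voting on a consistent time offset. Everything below is an assumption made for illustration: the `BTreeMap` merely stands in for a real database, and the type names, key layout, and `fan_out` parameter are hypothetical, not taken from SongRec or Shazam.

```rust
use std::collections::{BTreeMap, HashMap};

/// (frame index, frequency bin) of a spectrogram peak; slices are assumed sorted by frame.
type TimeFreq = (u32, u16);
/// Key for a pair of peaks: both frequency bins plus their time gap in frames.
type PairKey = (u16, u16, u32);

/// Pairs each peak with the next `fan_out` peaks, yielding (key, anchor frame).
fn peak_pairs(peaks: &[TimeFreq], fan_out: usize) -> Vec<(PairKey, u32)> {
    let mut pairs = Vec::new();
    for (i, &(t1, f1)) in peaks.iter().enumerate() {
        for &(t2, f2) in peaks.iter().skip(i + 1).take(fan_out) {
            // saturating_sub avoids a panic if the input is not sorted as assumed.
            pairs.push(((f1, f2, t2.saturating_sub(t1)), t1));
        }
    }
    pairs
}

/// In-memory stand-in for the fingerprint database.
#[derive(Default)]
struct Index {
    /// Pair key -> list of (track id, anchor frame) where that pair occurs.
    entries: BTreeMap<PairKey, Vec<(u32, u32)>>,
}

impl Index {
    fn add_track(&mut self, track_id: u32, peaks: &[TimeFreq], fan_out: usize) {
        for (key, t) in peak_pairs(peaks, fan_out) {
            self.entries.entry(key).or_default().push((track_id, t));
        }
    }

    /// Returns the best (track id, vote count). Each matching pair votes for a
    /// (track, time offset) combination, so an excerpt taken from anywhere in
    /// a track accumulates votes on a single offset.
    fn best_match(&self, peaks: &[TimeFreq], fan_out: usize) -> Option<(u32, usize)> {
        let mut votes: HashMap<(u32, i64), usize> = HashMap::new();
        for (key, t_query) in peak_pairs(peaks, fan_out) {
            for &(track_id, t_track) in self.entries.get(&key).into_iter().flatten() {
                let offset = t_track as i64 - t_query as i64;
                *votes.entry((track_id, offset)).or_insert(0) += 1;
            }
        }
        votes
            .into_iter()
            .max_by_key(|&(_, count)| count)
            .map(|((track_id, _), count)| (track_id, count))
    }
}
```

Because each pair is compared individually, a query that is shorter than the indexed track, or that has lost some peaks to distortion, can still collect enough votes to match, which a single fixed-size hash would not allow.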

Regards,

marin-m, Dec 03 '20 08:12