echoprint-codegen
Recognition accuracy when "listening" to recorded samples.
I made a simple website on top of my red5 media server installation, with a Flash audio recorder that captures the client's microphone as the frontend, and started running some tests.
On YouTube there is a Red Hot Chili Peppers song that was really popular back in the day, and I think it has a distinctive musical signature, so it should be generally well known.
I tried to recognize the full FLV file from YouTube and got some awesome results:
./lookup.py rhcp-youtube-all.flv
Got result: [<song - By The Way>]
Artist: Red Hot Chili Peppers (ARE8GLF1187FB52532)
Song: By The Way (SOVTWDI12A67020460)
This looks great, so I tried to record the same song (full as well) from my online flash recorder and let the resulting sound be recognized with Echoprint again:
./lookup.py rhcp-recorded-all.flv
Got result: []
No match. This track may not be in the database yet.
Snap... Maybe the quality isn't sufficient? Well, I opened both files in Audacity and compared the amplitude plots. Here is a picture of the two recordings. I think the similarities and patterns are visible to the naked eye.
RHCP - original
RHCP - recorded
So my question: why does recognition fail on the recorded version? Or, how accurate is Echoprint actually?
From what I understand of the Echoprint matching so far, it works on a fairly granular level of audio comparison. High-level features such as the overall song structure are not used, so it's hard to say exactly why the match is failing. You can get a lot more information by running your own Echoprint server and adding debug logging to it. I'm planning on releasing an alternative Echoprint server soon that should be much easier to deploy and debug if you are still working on this issue.
Although waveforms look very similar at this high level of visual inspection, this would not necessarily indicate a positive match by Echoprint.
Yes, Echoprint analyzes the signal on a short-time level. It splits the signal into blocks whose lengths are of the order of tens of milliseconds, finding onsets at this level of granularity, and then generating fingerprint codes by comparing the relative timing differences between neighboring onsets. It does not consider overall, longer time structure.
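To make that description concrete, here is a toy sketch of the general idea only. It is not Echoprint's actual code: the energy-jump onset detector, the frame length, the `fanout` pairing, and all function names are assumptions chosen for illustration. It shows why codes built from onset-to-onset timing survive a constant time shift, but change as soon as a single onset is detected one block late, which is the kind of thing a re-recording could plausibly cause.

```python
def frame_energies(samples, frame_len):
    """Sum of squared samples for each consecutive, non-overlapping block."""
    return [
        sum(s * s for s in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def find_onsets(energies, ratio=2.0, floor=1e-6):
    """Frame indices where energy jumps sharply over the previous frame.

    Hypothetical detector: a simple energy-ratio threshold.
    """
    return [
        i for i in range(1, len(energies))
        if energies[i] > ratio * max(energies[i - 1], floor)
    ]


def codes_from_onsets(onsets, fanout=2):
    """Pair each onset with the next `fanout` onsets.

    Each code is (time_delta_in_frames, frame_of_first_onset).
    """
    codes = []
    for i, start in enumerate(onsets):
        for later in onsets[i + 1:i + 1 + fanout]:
            codes.append((later - start, start))
    return codes


def toy_signal(burst_frames, frame_len=100, n_frames=40, lead_in_frames=0):
    """Silence with one-frame bursts of amplitude 1.0 at the given frames."""
    samples = [0.0] * ((n_frames + lead_in_frames) * frame_len)
    for f in burst_frames:
        start = (f + lead_in_frames) * frame_len
        for k in range(start, start + frame_len):
            samples[k] = 1.0
    return samples


def fingerprint(samples, frame_len=100):
    return codes_from_onsets(find_onsets(frame_energies(samples, frame_len)))


def deltas(codes):
    """Keep only the relative-timing part of each code."""
    return [d for d, _ in codes]


original = fingerprint(toy_signal([5, 12, 20, 33]))
shifted = fingerprint(toy_signal([5, 12, 20, 33], lead_in_frames=3))
smeared = fingerprint(toy_signal([5, 13, 20, 33]))  # one onset lands a frame late

print(deltas(original) == deltas(shifted))  # True: a constant shift keeps the deltas
print(deltas(original) == deltas(smeared))  # False: one moved onset changes several codes
```

In this toy setup a single displaced onset alters four of the five codes, so two recordings can look alike at the waveform level and still share few codes.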
There are several reasons why the match could be failing. My first thought is that this could be due to several known, interrelated bugs in the way the matching algorithm works, which I am working on. Recently I started working with The Echo Nest to help them understand and fix these bugs and to get Echoprint working properly again. Fixing the matching code is the first priority; after that, our plans include improving over-the-air (OTA) performance. We are hard at work addressing these issues and will have something for the community very soon.
I will start publishing changes to the source code shortly, once they have been tested properly. For example, we want to maintain backwards compatibility with already-generated fingerprint codes and ingested databases wherever possible.
Let me know if you are having any difficulties installing echoprint-server. Have you tried the local=True mode of operation? This obviates the requirement for Solr and TTyrant, but it is really only suitable for a relatively small number of audio files (several thousand), for testing purposes, and to help get people started with Echoprint. Also, let me know if you have any particular features for logging or debugging which you would like to see added, or any other ideas you would like to share. I can then add them to our list of things to consider for the upcoming official release of Echoprint.
Andrew
@alnesbit Glad to hear you are actively working on this! I started working on an alternative implementation of the server (https://github.com/jhurliman/node-echoprint-server) that better fits our specific needs, but am happy to collaborate on improvements to the official echoprint-codegen / echoprint-server projects and contribute any improvements I make back to the official projects. One piece that I've found really helpful was a browser-friendly debugging endpoint that visualizes the fingerprint matching.
@xdnny if you upload both audio files somewhere I can try matching them in node-echoprint-server and post the results here. The output will look like: