
If I have 40K songs, how should I fingerprint them?

Open monyoudom opened this issue 7 years ago • 5 comments

Should I fingerprint 1,000 songs at a time, or fingerprint all 40K in one go?

monyoudom avatar Nov 22 '17 14:11 monyoudom

@monyoudom , did you ever come up with a good plan of attack on this? I too have about 35k songs I'd like to fingerprint. Thanks!

chefboyrc9 avatar Feb 12 '19 19:02 chefboyrc9

I have added Flask to the project and implemented background tasks with Celery and Redis. For fingerprinting I am using the fingerprint_file method. Flask -> A route that receives a FormData POST with the audio location and the song metadata. The route starts the Celery task and returns an endpoint for querying the task's status.


from flask import jsonify, request, url_for

@bp.route('/fingerprint', methods=['POST'])
def fingerprint():
  # All four form fields are required; 'audio' holds the location of the file
  required = ('audio', 'title', 'songId', 'artist')
  if any(field not in request.form for field in required):
    return bad_request('audio, title, songId and artist are all required.')

  audio = request.form['audio']
  title = request.form['title']
  songId = request.form['songId']
  artist = request.form['artist']

  # Queue the Celery task; remote=True makes the worker download the audio first
  task = fingerprint_audio.delay(audio, title, artist, songId, True)
  # Respond with 202 Accepted and put the status endpoint in the Location header
  return jsonify({}), 202, {'Location': url_for('api.taskstatus', task_id=task.id)}
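The `api.taskstatus` endpoint that the route points to is not shown in this comment. As a rough sketch only, the body it returns could be built like this, assuming the usual Celery convention that a task's `info` is the `meta` dict while it runs, the return value on success, and the raised exception on failure. The function name `task_status_payload` is hypothetical; in a real route, `state` and `info` would presumably come from `fingerprint_audio.AsyncResult(task_id)`.

```python
from typing import Any, Dict


def task_status_payload(state: str, info: Any) -> Dict[str, Any]:
    """Build a JSON-serializable status body from a task's state and info."""
    if state == "PENDING":
        # Celery reports PENDING for tasks that are queued or unknown.
        return {"state": state, "status": "Pending"}
    if state == "FAILURE":
        # On failure, info is the exception raised inside the task.
        return {"state": state, "status": str(info)}
    # Other states (STARTED, SUCCESS) carry the dict passed to update_state
    # or returned by the task.
    meta = info if isinstance(info, dict) else {}
    payload: Dict[str, Any] = {"state": state, "status": meta.get("status", "")}
    for key in ("title", "artist", "songId"):
        if key in meta:
            payload[key] = meta[key]
    return payload
```

Keeping this as a plain function means the mapping can be checked without a running broker; the Flask route would only need to wrap the result in `jsonify`.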

Celery Task -> Calls the fingerprinting method with the file supplied by the Flask endpoint and returns success or an error when it finishes.


import json
import os
from pathlib import Path

import requests
from dejavu import Dejavu

@celery.task(bind=True)
def fingerprint_audio(self, file_path, title, artist, songId, remote=False):
  with open("dejavu.cnf.SAMPLE") as f:
    config = json.load(f)

  # create a Dejavu instance
  djv = Dejavu(config)
  fpath = ''
  
  # If the audio is remote, download it to a local file first
  if remote:
    self.update_state(state='STARTED',
                    meta={'title': title, 'artist': artist,
                          'songId': songId, 'status': "Downloading Audio"})

    # Fail the task on an HTTP error instead of saving an error page as audio
    r = requests.get(file_path)
    r.raise_for_status()
    fpath = os.path.join(str(Path.home()), f"echoprint/test/{songId}.mp3")
    with open(fpath, 'wb') as f:
      f.write(r.content)
  else:
    fpath = file_path


  self.update_state(state='STARTED',
                    meta={'title': title, 'artist': artist,
                          'songId': songId, 'status': "Fingerprinting started"})
  djv.fingerprint_file(fpath, title, artist, songId)
  
  requests.post(f"https://pg-app-54z4vdmoj6h9q4d1ux0vecfdjw7lc2.scalabl.cloud/webhook/dejavu/{songId}")
  # Clean up the audio file; ignore the error if it is already gone
  try:
    os.remove(fpath)
  except OSError:
    pass
  return {'title': title, 'artist': artist, 'status': 'Fingerprinting completed',
          'songId': songId}

Using it: My songs are hosted on S3, so I call the fingerprinting endpoint in a loop from Node, sending 50 song URLs each time.
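The batching loop described above runs in Node and is not shown. Purely as an illustration, here is the same idea in Python: split the full list of URLs into groups of 50 and submit one group at a time. The names `batched` and `submit_all` are hypothetical, and `submit` stands in for whatever performs the POST to the `/fingerprint` route.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def submit_all(urls: Iterable[str], submit: Callable[[str], None],
               size: int = 50) -> int:
    """Pass every URL to `submit`, one batch at a time; return the batch count."""
    count = 0
    for batch in batched(urls, size):
        for url in batch:
            submit(url)  # assumption: this POSTs the URL to the /fingerprint route
        count += 1
        # Assumption: a real loop would wait here (for example by polling the
        # status endpoint) until the batch has finished before sending the next.
    return count
```

With 40K songs and a batch size of 50 this comes to 800 batches, so the queue never holds more than a small slice of the catalogue at once.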

My DigitalOcean Droplet has 4 GB of memory and 2 vCPUs. My Celery config uses --concurrency=4 --time-limit=420 (these were the optimal settings in my case).

Out of every 50 tasks, about 5 fail on average with an ffmpeg "could not decode" error.
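The thread does not say whether these decode failures are transient. If some of them are (a truncated download, for instance), one possible way to handle them is to collect the failed items and retry them a few times with a growing delay. This is only a sketch under that assumption: `run_with_retries` and its parameters are hypothetical, and `work` stands in for whatever fingerprints a single song. A genuinely corrupt file will fail on every attempt and end up in the returned dict.

```python
import time
from typing import Callable, Dict, Iterable


def run_with_retries(items: Iterable[str], work: Callable[[str], None],
                     max_attempts: int = 3, base_delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> Dict[str, str]:
    """Run `work` on each item, retrying failures; return {item: last error}."""
    failures: Dict[str, str] = {}
    pending = list(items)
    for attempt in range(1, max_attempts + 1):
        still_failing = []
        for item in pending:
            try:
                work(item)
                failures.pop(item, None)  # succeeded, clear any earlier error
            except Exception as exc:
                failures[item] = str(exc)
                still_failing.append(item)
        if not still_failing:
            break
        pending = still_failing
        if attempt < max_attempts:
            # Exponential backoff between rounds: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
    return failures
```

The returned dict doubles as a report of which songs never decoded, which is useful for inspecting those files by hand.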

gitteraz avatar Oct 09 '19 10:10 gitteraz

@gitteraz I am interested in how you achieved this.

Stashare avatar Jul 25 '20 16:07 Stashare

@gitteraz I am interested in how you achieved this.

Yes, he did a good job.

monyoudom avatar Aug 07 '20 08:08 monyoudom

@gitteraz did you try matching against a large database?

raedatoui avatar Aug 13 '20 03:08 raedatoui