dejavu
I have 40K songs. How should I fingerprint them?
Should I fingerprint them in batches of 1,000 over a period of time, or fingerprint all 40K in one go?
@monyoudom, did you ever come up with a good plan of attack for this? I too have about 35k songs I'd like to fingerprint. Thanks!
I have added Flask to the project and implemented background tasks with Celery and Redis. For fingerprinting I am using the fingerprint_file method.
Flask -> A route that receives a FormData POST containing the song's URL and metadata. The route starts the Celery task and returns an endpoint for querying the task's status.
from flask import jsonify, request, url_for

# `bp` (the Blueprint) and `bad_request` are defined elsewhere in the project.
@bp.route('/fingerprint', methods=['POST'])
def fingerprint():
    # Check that the POST request has every required form field.
    required = ('audio', 'title', 'songId', 'artist')
    if any(field not in request.form for field in required):
        return bad_request('Audio to fingerprint is missing.')
    audio = request.form['audio']  # URL of the audio file
    title = request.form['title']
    songId = request.form['songId']
    artist = request.form['artist']
    # Queue the Celery task; remote=True makes the worker download the file.
    task = fingerprint_audio.delay(audio, title, artist, songId, True)
    # Respond 202 Accepted with the status endpoint in the Location header.
    return jsonify({}), 202, {'Location': url_for('api.taskstatus', task_id=task.id)}
Celery Task -> Calls the fingerprinting method with the file supplied by the Flask endpoint and returns Success or Error when it finishes.
import json
import os
from pathlib import Path

import requests
from dejavu import Dejavu

# `celery` is the Celery app instance, defined elsewhere in the project.
@celery.task(bind=True)
def fingerprint_audio(self, file_path, title, artist, songId, remote=False):
    with open("dejavu.cnf.SAMPLE") as f:
        config = json.load(f)
    # Create a Dejavu instance.
    djv = Dejavu(config)
    if remote:
        # file_path is a URL: download the audio to a local file first.
        self.update_state(state='STARTED',
                          meta={'title': title, 'artist': artist,
                                'songId': songId, 'status': "Downloading Audio"})
        r = requests.get(file_path)
        fpath = os.path.join(str(Path.home()), f"echoprint/test/{songId}.mp3")
        with open(fpath, 'wb') as f:
            f.write(r.content)
    else:
        fpath = file_path
    self.update_state(state='STARTED',
                      meta={'title': title, 'artist': artist,
                            'songId': songId, 'status': "Fingerprinting started"})
    djv.fingerprint_file(fpath, title, artist, songId)
    # Notify the webhook that this song has been fingerprinted.
    requests.post(f"https://pg-app-54z4vdmoj6h9q4d1ux0vecfdjw7lc2.scalabl.cloud/webhook/dejavu/{songId}")
    # Remove the audio file; ignore the error if it is already gone.
    try:
        os.remove(fpath)
    except OSError:
        pass
    return {'title': title, 'artist': artist, 'status': 'Fingerprinting completed',
            'songId': songId}
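The `api.taskstatus` endpoint named in the `Location` header is not shown in the post. As a rough sketch of what its response-building part might look like, the hypothetical helper below turns a task's state and metadata into a JSON-ready dict. It assumes the values come from Celery's `AsyncResult` (`result.state` and `result.info`); the function name and the wording of the messages are illustrative, not taken from the project.

```python
from typing import Any, Dict


def status_payload(state: str, info: Any) -> Dict[str, Any]:
    """Build the JSON body for a task-status response (hypothetical helper).

    `state` and `info` are expected to come from a Celery AsyncResult,
    e.g. result = fingerprint_audio.AsyncResult(task_id).
    """
    if state == "PENDING":
        # Celery reports PENDING for tasks that are queued or unknown.
        return {"state": state, "status": "Waiting for a worker"}
    if state == "FAILURE":
        # On failure, `info` holds the exception raised by the task.
        return {"state": state, "status": str(info)}
    # STARTED carries the meta dict passed to update_state();
    # SUCCESS carries the dict returned by the task.
    payload: Dict[str, Any] = {"state": state}
    if isinstance(info, dict):
        payload.update(info)
    return payload
```

Inside the Flask route this would be used roughly as `jsonify(status_payload(result.state, result.info))`.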
Using it: My songs are hosted on S3, so I call the fingerprinting endpoint in a loop from Node, sending 50 song URLs each time.
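The batching loop itself is not shown (the poster ran it from Node). A minimal Python sketch of the same idea follows, assuming each song is a dict with the four form fields the route expects. `batches`, `submit_all` and the `post` callable are hypothetical names; `post` stands in for whatever HTTP client sends one form POST to `/fingerprint` and returns the `Location` header.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Song = Dict[str, str]  # keys: audio (URL), title, artist, songId


def batches(items: Iterable[Song], size: int = 50) -> Iterator[List[Song]]:
    """Yield consecutive lists of at most `size` songs."""
    batch: List[Song] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Final, possibly shorter, batch.
        yield batch


def submit_all(songs: Iterable[Song],
               post: Callable[[Song], str],
               size: int = 50) -> List[List[str]]:
    """Submit songs batch by batch; return the status URLs of each batch.

    `post` is an assumed callable that POSTs one song to /fingerprint
    and returns the status URL from the 202 response.
    """
    submitted: List[List[str]] = []
    for batch in batches(songs, size):
        submitted.append([post(song) for song in batch])
        # A real client would poll these status URLs here and wait for
        # the batch to finish before sending the next one.
    return submitted
```

Waiting for one batch to finish before sending the next keeps the queue short, which matters on a small server.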
My DigitalOcean Droplet has 4 GB of memory and 2 vCPUs. My Celery config uses --concurrency=4 --time-limit=420 (these were the optimal settings in my case).
Out of every 50 tasks, about 5 fail on average (with an ffmpeg "could not decode" error).
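With roughly one task in ten failing, it helps to track which songs failed so they can be retried or inspected later. The sketch below is one possible way to do that, not something from the post: `fingerprint_with_retries` and its `run` callable are hypothetical, and `run` stands in for whatever fingerprints a single song and raises on error. Whether a retry helps depends on the cause; a file that genuinely cannot be decoded will fail every time and ends up in the returned dict.

```python
from typing import Callable, Dict, Iterable, List


def fingerprint_with_retries(song_ids: Iterable[str],
                             run: Callable[[str], None],
                             max_attempts: int = 3) -> Dict[str, str]:
    """Call run(song_id) for each id, retrying the ids that raise.

    Returns a dict mapping every id that still failed after
    `max_attempts` passes to its last error message.
    """
    pending: List[str] = list(song_ids)
    errors: Dict[str, str] = {}
    for _ in range(max_attempts):
        if not pending:
            break
        still_failing: List[str] = []
        for song_id in pending:
            try:
                run(song_id)
                # Succeeded (possibly on a retry): clear any earlier error.
                errors.pop(song_id, None)
            except Exception as exc:
                errors[song_id] = str(exc)
                still_failing.append(song_id)
        # Only the failures go into the next pass.
        pending = still_failing
    return errors
```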
@gitteraz I am interested in how you achieved this.
Yes, he did a good job.
@gitteraz did you try matching against a large database?