dejavu
Split fingerprinting
test_fingerprint_by_splitting.py creates a long file from the existing files in mp3/.
To fingerprint with a length check, use djv.fingerprint_with_duration_check(long_song, song_name="Concatenates_test"), as shown in the split-test file.
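The concatenation step can be sketched with Python's standard wave module. This is a stand-in under assumptions: wave reads only uncompressed WAV, not the mp3/ files the test uses, and concatenate_wavs is a hypothetical helper. It also records where each input starts in the long file, which is the amount a fingerprint from that input has to be shifted by:

```python
import wave
from typing import List, Sequence


def concatenate_wavs(paths: Sequence[str], out_path: str, chunk_frames: int = 65536) -> List[float]:
    """Append WAV files end to end and return each file's start time in seconds.

    Hypothetical helper: every input must share channels, sample width and rate.
    Audio is copied in chunks so the long output never has to fit in memory.
    """
    if not paths:
        raise ValueError("need at least one input file")
    starts: List[float] = []
    fmt = None
    written = 0  # frames written so far = total length of the previous files
    with wave.open(out_path, "wb") as out:
        for path in paths:
            with wave.open(path, "rb") as src:
                src_fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
                if fmt is None:
                    # The first file defines the output format.
                    fmt = src_fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif src_fmt != fmt:
                    raise ValueError(f"{path} does not match the first file's format")
                starts.append(written / fmt[2])
                while True:
                    frames = src.readframes(chunk_frames)
                    if not frames:
                        break
                    out.writeframes(frames)
                written += src.getnframes()
    return starts
```

Each returned start time is the sum of the lengths of all earlier inputs.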
This is an updated version of the previous pull request. It should take the verification checks into account for the wavio case, but I have no way to test that.
Repost of PR https://github.com/worldveil/dejavu/pull/75 for issue https://github.com/worldveil/dejavu/issues/18
This doesn't take offset_seconds into account.
@wangzhengyi :+1: I'd welcome code comments and recommendations. Edit 1: ~~the new function is based on all existing~~ ~~For me it is enough the use of offset_seconds to happen in there~~ Got it. What's needed is to calculate the lengths of the previous pieces and add them to the offsets. Any suitable suggestions are welcome.
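Adding the previous lengths amounts to a running sum: each piece starts where all earlier pieces end. A sketch for fixed-length pieces, with hypothetical helper names and a naming scheme modelled on the start_sec60_end_sec120.mp3 example mentioned later in this thread:

```python
from typing import List, Tuple


def split_bounds(total_seconds: int, chunk_seconds: int) -> List[Tuple[int, int]]:
    """Return (start, end) pairs covering [0, total_seconds) in chunks of chunk_seconds.

    Each chunk starts at the sum of the lengths of all previous chunks.
    """
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    bounds = []
    start = 0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)  # last chunk may be shorter
        bounds.append((start, end))
        start = end  # next chunk begins after all previous chunk lengths
    return bounds


def split_name(base: str, start: int, end: int) -> str:
    # Hypothetical naming scheme modelled on "start_sec60_end_sec120.mp3".
    return f"{base}_start_sec{start}_end_sec{end}"
```

Encoding the start time in the file name keeps it available to the fingerprint worker without any extra bookkeeping.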
Any news on merging this into the master branch? dejavu is almost unusable on low-memory machines: even the example mp3 files give out-of-memory errors when fingerprinting on a 512 MB machine :( (and relying on swap is a disaster here, since the machine's only storage is a memory card). Thanks
The solution that worked for me to get the offsets right is to (A) extract the split's start offset (in seconds) from the split file's name (e.g. start_sec60_end_sec120.mp3), (B) convert that seconds value to the equivalent offset in fingerprint units (spectrogram frames), and (C) add that derived offset to every offset produced by fingerprinting the file.
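Steps (B) and (C) reduce to a little arithmetic. The sketch below assumes the stock dejavu defaults (DEFAULT_FS = 44100, DEFAULT_WINDOW_SIZE = 4096, DEFAULT_OVERLAP_RATIO = 0.5); the function names are hypothetical. It uses the same factor as the code in this comment, which is the inverse of the conversion dejavu applies when it reports offset_seconds for a match, so a 60 s split start becomes 1292 offset units and maps back to roughly 60 s:

```python
# Stock dejavu defaults from fingerprint.py; a fork may use different values.
DEFAULT_FS = 44100
DEFAULT_WINDOW_SIZE = 4096
DEFAULT_OVERLAP_RATIO = 0.5


def seconds_to_offset(seconds: float) -> int:
    """(B) Convert a time in seconds to fingerprint offset units (spectrogram frames)."""
    return int(round(seconds * DEFAULT_FS / DEFAULT_WINDOW_SIZE / DEFAULT_OVERLAP_RATIO))


def apply_split_offset(local_offset: int, split_start_seconds: float) -> int:
    """(C) Shift an offset measured inside a split to its position in the full song."""
    return local_offset + seconds_to_offset(split_start_seconds)


def offset_to_seconds(offset: int) -> float:
    """Inverse conversion, in the form dejavu uses to report offset_seconds."""
    return round(offset / DEFAULT_FS * DEFAULT_WINDOW_SIZE * DEFAULT_OVERLAP_RATIO, 5)
```

Dividing by DEFAULT_OVERLAP_RATIO matches the real spectrogram hop of DEFAULT_WINDOW_SIZE * (1 - DEFAULT_OVERLAP_RATIO) samples only because the ratio is 0.5. If you change the ratio, revisit this factor and the matcher's conversion together so the reported seconds stay consistent.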
Note: I am using a different fork, so some smaller details may differ (e.g. naming in database.py).
(A) Extract Offset Data & (B) Convert to Fingerprint Offset
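```python
# __init__.py
def _fingerprint_worker(filename, limit=None, song_name=None):
    ...
    channel_amount = len(channels)

    # Get the split's start time (in seconds) from a name such as
    # "start_sec60_end_sec120" and convert it to fingerprint offset units.
    try:
        first_split = song_name.split("start_sec", 1)
        select_second = first_split[1]
        second_split = select_second.split("_end_sec", 1)
        # Inverse of dejavu's offset-to-seconds conversion; rounded to a whole
        # offset so that adding it to the integer hash offsets stays exact.
        split_offset = int(round(
            int(second_split[0]) * fingerprint.DEFAULT_FS /
            fingerprint.DEFAULT_WINDOW_SIZE / fingerprint.DEFAULT_OVERLAP_RATIO
        ))
    except (AttributeError, IndexError, ValueError):
        # Not a split file (no start_sec/_end_sec markers): nothing to add.
        split_offset = 0
    ...
    return song_name, result, file_hash, split_offset
```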
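The chained split() calls above work, but a bare except also hides unrelated errors. A sketch of a stricter parser, assuming names that contain start_sec<N>_end_sec<M> as in the example file name; parse_split_start is a hypothetical helper, and its result would still go through the seconds-to-offset conversion:

```python
import re
from typing import Optional

# Hypothetical pattern for names like "song_start_sec60_end_sec120".
_SPLIT_RE = re.compile(r"start_sec(\d+)_end_sec(\d+)")


def parse_split_start(song_name: Optional[str]) -> int:
    """Return the split's start time in seconds, or 0 if the name carries none."""
    if not song_name:
        return 0
    match = _SPLIT_RE.search(song_name)
    # Only names carrying both markers count as splits.
    return int(match.group(1)) if match else 0
```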
Iterate and Pass the Value
```python
# __init__.py
while True:
    try:
        song_name, hashes, file_hash, split_offset = next(iterator)
    ...
    else:
        # sid = self.db.insert_song(song_name, file_hash)  # REMOVE: replaced below
        if treat_as_split and song_name_for_the_split:
            # Store every split under the name of the original song.
            sid = self.db.insert_song(song_name_for_the_split, file_hash)
        else:
            sid = self.db.insert_song(song_name, file_hash)
        self.db.insert_hashes(sid, hashes, split_offset)
        ...
```
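With two independent if statements, a split run whose song_name_for_the_split is empty assigns sid in neither branch, so the insert either fails with UnboundLocalError on the first file or silently reuses the previous file's sid. The sketch below models the loop with an in-memory stand-in for the database (InMemoryDB and store_results are hypothetical, not part of dejavu or the fork) and uses if/else so exactly one song row is created per result:

```python
from itertools import count
from typing import Dict, Iterable, List, Optional, Set, Tuple

Hashes = Set[Tuple[str, int]]


class InMemoryDB:
    """Hypothetical stand-in for the fork's database object."""

    def __init__(self) -> None:
        self._ids = count(1)
        self.songs: Dict[int, Tuple[str, str]] = {}
        self.hashes: List[Tuple[int, str, int]] = []

    def insert_song(self, song_name: str, file_hash: str) -> int:
        sid = next(self._ids)
        self.songs[sid] = (song_name, file_hash)
        return sid

    def insert_hashes(self, sid: int, hashes: Hashes, split_offset: int = 0) -> None:
        for hsh, offset in hashes:
            self.hashes.append((sid, hsh, int(offset + split_offset)))


def store_results(db: InMemoryDB,
                  results: Iterable[Tuple[str, Hashes, str, int]],
                  treat_as_split: bool = False,
                  song_name_for_the_split: Optional[str] = None) -> List[int]:
    sids = []
    for song_name, hashes, file_hash, split_offset in results:
        # Exactly one branch runs, so sid is always bound for this result.
        if treat_as_split and song_name_for_the_split:
            sid = db.insert_song(song_name_for_the_split, file_hash)
        else:
            sid = db.insert_song(song_name, file_hash)
        db.insert_hashes(sid, hashes, split_offset)
        sids.append(sid)
    return sids
```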
(C) Apply Offset
```python
# database.py
import binascii


def insert_hashes(self, sid, hashes, split_offset=0):
    ...
    for hash_, offset in set(hashes):
        fingerprints.append(
            Fingerprint(
                hash=binascii.unhexlify(hash_),
                song_id=sid,
                # Shift the split-local offset to its position in the full song.
                offset=int(offset + split_offset)
            )
        )
    self.session.bulk_save_objects(fingerprints)
```
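Once split_offset is folded into the stored offsets, recognition itself needs no changes: dejavu's matcher votes on the difference between a hash's stored offset and its offset in the query, so the winning difference, and therefore offset_seconds, is measured from the start of the full song rather than from the start of the split. A simplified model of that voting step, assuming the stock defaults (best_alignment is a hypothetical name, and the real matcher does more bookkeeping):

```python
from collections import Counter
from typing import Iterable, Tuple

# Stock dejavu defaults; a fork may differ.
DEFAULT_FS = 44100
DEFAULT_WINDOW_SIZE = 4096
DEFAULT_OVERLAP_RATIO = 0.5


def best_alignment(matches: Iterable[Tuple[int, int, int]]) -> Tuple[int, float]:
    """Vote on (song_id, stored_offset - query_offset) and return the winner.

    Each match is (song_id, stored_offset, query_offset). Stored offsets already
    include split_offset, so the winning difference is relative to the full song.
    """
    votes = Counter((sid, stored - query) for sid, stored, query in matches)
    if not votes:
        raise ValueError("no matches to align")
    (sid, diff), _ = votes.most_common(1)[0]
    seconds = round(diff / DEFAULT_FS * DEFAULT_WINDOW_SIZE * DEFAULT_OVERLAP_RATIO, 5)
    return sid, seconds
```

For hashes stored from the start_sec60 split (split_offset 1292), the winning difference is 1292 plus the query's position within that split, which converts back to about 60 s into the original song.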