Split fingerprinting

Open thesunlover opened this issue 9 years ago • 6 comments

test_fingerprint_by_splitting.py creates a long file from the existing files in mp3/.

To fingerprint with a length check, use djv.fingerprint_with_duration_check(long_song, song_name="Concatenates_test"), as shown in the split-test file.

thesunlover avatar Aug 05 '15 17:08 thesunlover

This is the updated version of the previous pull request. It should take the verification checks for the wavio case into account, but I have no way to test that.

thesunlover avatar Aug 05 '15 17:08 thesunlover

Repost of PR https://github.com/worldveil/dejavu/pull/75 for issue https://github.com/worldveil/dejavu/issues/18

thesunlover avatar Aug 05 '15 17:08 thesunlover

You don't consider the offset_seconds.

wangzhengyi avatar Aug 19 '15 08:08 wangzhengyi

@wangzhengyi :+1: I would welcome code comments and recommendations. Edit 1: ~~the new function is based on all existing~~ ~~For me it is enough the use of offset_seconds to happen in there~~ Got it. What is needed is to calculate the lengths of the preceding segments and add them to the offsets. Any suggestions are welcome.
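A minimal sketch of that bookkeeping, assuming the duration of each segment is already known in seconds. The helper name chunk_start_offsets is hypothetical and is not part of dejavu or of this pull request:

```python
def chunk_start_offsets(chunk_durations_sec):
    """Return the start time (in seconds) of each chunk within the full file.

    The start of chunk i is the sum of the durations of chunks 0..i-1.
    """
    starts = []
    elapsed = 0
    for duration in chunk_durations_sec:
        starts.append(elapsed)
        elapsed += duration
    return starts


# Three chunks of 60 s, 60 s and 45 s start at 0 s, 60 s and 120 s.
print(chunk_start_offsets([60, 60, 45]))  # [0, 60, 120]
```

Each start time would then have to be converted to fingerprint offset units before being added to the offsets produced for that chunk.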

thesunlover avatar Sep 01 '15 13:09 thesunlover

Any news on merging this into the master branch? dejavu is almost unusable on low-memory machines: even the example mp3 files give out-of-memory errors when fingerprinting on a 512MB machine :( (and relying on swap is a disaster on this machine, since its only storage is a memory card). Thanks

sheffieldnikki avatar Aug 03 '16 20:08 sheffieldnikki

The solution that worked for me to get the offsets correct is to (A) extract the offset (in seconds) from the split file name (e.g. start_sec60_end_sec120.mp3), (B) convert the seconds value to the equivalent sampling offset value, and (C) add the derived sampling offset to the offset that the fingerprinting process determines for the given file.

Note: I am using a different fork, so some of the smaller details may differ, e.g. the naming in database.py.

(A) Extract Offset Data & (B) Convert to Sampling Offset

# __init__.py

def _fingerprint_worker(filename, limit=None, song_name=None):
    ...
    channel_amount = len(channels)

    # Get the split offset from the name, e.g. "start_sec60_end_sec120".
    try:
        first_split = song_name.split("start_sec", 1)
        select_second = first_split[1]
        second_split = select_second.split("_end_sec", 1)

        # Convert the start second (second_split[0]) to a sampling offset.
        split_offset = round(
            int(second_split[0]) * fingerprint.DEFAULT_FS /
            fingerprint.DEFAULT_WINDOW_SIZE / fingerprint.DEFAULT_OVERLAP_RATIO,
            5
        )
    except (AttributeError, IndexError, ValueError):
        # No name given, or the name carries no usable split marker.
        split_offset = 0
    ...
    return song_name, result, file_hash, split_offset
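For anyone who wants to try steps (A) and (B) outside of dejavu, here is a standalone sketch. It swaps the chained split() calls for a regular expression, and the three constants are assumptions standing in for fingerprint.DEFAULT_FS, fingerprint.DEFAULT_WINDOW_SIZE and fingerprint.DEFAULT_OVERLAP_RATIO (check the values in your fork). The function name split_offset_from_name is hypothetical.

```python
import re

# Assumed stand-ins for the constants in dejavu's fingerprint module.
DEFAULT_FS = 44100
DEFAULT_WINDOW_SIZE = 4096
DEFAULT_OVERLAP_RATIO = 0.5

# Matches the start second in names such as "start_sec60_end_sec120.mp3".
_SPLIT_RE = re.compile(r"start_sec(\d+)_end_sec")


def split_offset_from_name(song_name):
    """Return the sampling offset encoded in a split file name, or 0."""
    match = _SPLIT_RE.search(song_name or "")
    if match is None:
        return 0
    start_sec = int(match.group(1))
    # Seconds -> fingerprint offset units (inverse of offset -> seconds).
    return round(
        start_sec * DEFAULT_FS / DEFAULT_WINDOW_SIZE / DEFAULT_OVERLAP_RATIO,
        5,
    )


print(split_offset_from_name("start_sec60_end_sec120.mp3"))
print(split_offset_from_name("plain_song.mp3"))
```

Names without the marker (or a missing name) fall back to an offset of 0, which matches the behaviour of the except branch above.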

Iterate and Pass the Value

# __init__.py

while True:
    try:
        song_name, hashes, file_hash, split_offset = next(iterator)
    ...
    else:
        # sid = self.db.insert_song(song_name, file_hash)  # REMOVE
        if treat_as_split and song_name_for_the_split:
            sid = self.db.insert_song(song_name_for_the_split, file_hash)
        if not treat_as_split:
            sid = self.db.insert_song(song_name, file_hash)

        self.db.insert_hashes(sid, hashes, split_offset)
        ...

(C) Apply Offset

# database.py

    def insert_hashes(self, sid, hashes, split_offset=0):
        ...
        for hash, offset in set(hashes):
            fingerprints.append(
                Fingerprint(
                    hash=binascii.unhexlify(hash),
                    song_id=sid,
                    offset=int(offset+split_offset)
                )
            )
        self.session.bulk_save_objects(fingerprints)
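To sanity-check step (C) without a database, the shift can be sketched on plain (hash, offset) tuples and converted back to seconds. The constants are again assumed stand-ins for the fingerprint module defaults, and shift_hashes and offset_to_seconds are hypothetical helper names, not functions from dejavu:

```python
# Assumed stand-ins for the constants in dejavu's fingerprint module.
DEFAULT_FS = 44100
DEFAULT_WINDOW_SIZE = 4096
DEFAULT_OVERLAP_RATIO = 0.5


def shift_hashes(hashes, split_offset=0):
    """Add the split offset to every (hash, offset) pair, dropping duplicates."""
    return sorted((h, int(offset + split_offset)) for h, offset in set(hashes))


def offset_to_seconds(offset):
    """Convert a fingerprint offset back to seconds."""
    return round(
        float(offset) / DEFAULT_FS * DEFAULT_WINDOW_SIZE * DEFAULT_OVERLAP_RATIO,
        5,
    )


# A hash found 10 offset units into a split that starts at second 60.
shifted = shift_hashes([("ab12", 10), ("ab12", 10)], split_offset=1291.99219)
print(shifted)
print(offset_to_seconds(shifted[0][1]))
```

Because int() truncates, the stored offset can land up to one offset unit early, so the recovered time falls just over 60 seconds into the original file rather than exactly on the expected position.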

NathanielCustom avatar Mar 31 '19 18:03 NathanielCustom