Performance Tweaks, Iterators, and Lazy Evaluations
A new function, `dejavu.logic.decoder.find_files_g`, has been added as an iterator-based replacement for `find_files` in the same file. It lets the updated `Dejavu.fingerprint_directory` method use `concurrent.futures.ProcessPoolExecutor` together with `concurrent.futures.as_completed`: each file is submitted to the executor as soon as it is yielded, so processing starts immediately. Once every file has been submitted, the results are consumed through `as_completed` in the order in which they finish.
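The submit-as-yielded pattern described above can be sketched roughly as follows. This is a simplified sketch, not the code in this PR: the `file_size` worker is a hypothetical stand-in for the real fingerprinting step, and the optional `executor` parameter is an assumption added so a different executor can be injected.

```python
import fnmatch
import os
from concurrent.futures import Executor, ProcessPoolExecutor, as_completed
from typing import Callable, Dict, Iterator, List, Optional, Tuple


def find_files_g(path: str, extensions: List[str]) -> Iterator[Tuple[str, str]]:
    """Lazily yield (file_path, extension) pairs instead of building a list."""
    # Accept extensions written both as ".mp3" and "mp3".
    extensions = [e.lstrip(".") for e in extensions]
    for dirpath, _dirnames, filenames in os.walk(path):
        for extension in extensions:
            for name in fnmatch.filter(filenames, f"*.{extension}"):
                yield os.path.join(dirpath, name), extension


def file_size(file_path: str) -> int:
    """Hypothetical stand-in for the real per-file fingerprinting work."""
    return os.path.getsize(file_path)


def fingerprint_directory(
    path: str,
    extensions: List[str],
    worker: Callable[[str], int] = file_size,
    executor: Optional[Executor] = None,
) -> Dict[str, int]:
    owns_executor = executor is None
    pool = executor if executor is not None else ProcessPoolExecutor()
    try:
        # Each file is submitted the moment find_files_g yields it, so work
        # starts before the directory walk has finished.
        futures = {
            pool.submit(worker, file_path): file_path
            for file_path, _ext in find_files_g(path, extensions)
        }
        results: Dict[str, int] = {}
        # Results are handled in completion order, not submission order.
        for future in as_completed(futures):
            results[futures[future]] = future.result()
        return results
    finally:
        if owns_executor:
            pool.shutdown()
```

With the default `ProcessPoolExecutor`, the worker must be a picklable module-level function, and on platforms that use the spawn start method the calling script should be guarded by `if __name__ == "__main__":`.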
Together, these two changes should give considerable speed improvements over the existing methods. If anyone knows where to find a large number of openly licensed (Creative Commons) audio files to download and test with, I would be glad to post comparison results. I tried to find some today, but every website was either broken, required creating an account, or was so heavily rate-limited as to be nearly useless.
Other minor improvements:
- Adding placeholder `songs` and `songhashes_set` attributes in `Dejavu.__init__`, so that they take advantage of the key-sharing instance `__dict__` that CPython uses for attributes assigned in `__init__`
- Changing `counts` and `songs_matches` in `Dejavu.align_matches` to generator expressions
- Adding `song_hash` directly to `self.songhashes_set` instead of assigning it to a variable first, which saves a lookup per iteration
- Waiting to call `Dejavu.__load_fingerprinted_audio_hashes()` until after all files have been processed
- Changing both `channels` lists in `dejavu.logic.decoder.read` to use list comprehensions
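To illustrate the generator-expression change to `align_matches`, here is a hedged sketch of the same kind of pipeline. It assumes matches arrive as `(song_id, offset_diff)` pairs; `best_offsets` and its `topn` default are hypothetical names, and this is not the actual Dejavu implementation. Only the final `sorted()` call materialises a list; the intermediate `counts` and `songs_matches` stages are evaluated lazily, one group at a time.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, List, Tuple


def best_offsets(matches: Iterable[Tuple[int, int]], topn: int = 2) -> List[Tuple[int, int, int]]:
    """Return (song_id, offset_diff, count) for each song's most frequent
    offset difference, highest count first, truncated to `topn`."""
    # groupby only groups adjacent equal keys, so sort first.
    sorted_matches = sorted(matches, key=itemgetter(0, 1))
    # Generator expression: one (song_id, offset_diff, count) per group, on demand.
    counts = (
        (song_id, offset_diff, sum(1 for _ in group))
        for (song_id, offset_diff), group in groupby(sorted_matches, key=itemgetter(0, 1))
    )
    # Generator expression: each song's most frequent offset, still lazy.
    songs_matches = (
        max(group, key=itemgetter(2))
        for _song_id, group in groupby(counts, key=itemgetter(0))
    )
    # sorted() is the first step that pulls everything through the pipeline.
    return sorted(songs_matches, key=itemgetter(2), reverse=True)[:topn]
```

For example, `best_offsets([(1, 10), (1, 10), (1, 3), (2, 5), (2, 5), (2, 5), (3, 7)])` returns `[(2, 5, 3), (1, 10, 2)]`.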
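The `channels` change in `dejavu.logic.decoder.read` replaces an append loop with a list comprehension. Below is a minimal sketch of that idea. It assumes the samples are interleaved frame by frame (`[L0, R0, L1, R1, ...]`), and `split_channels` is a hypothetical helper, not part of Dejavu. Extended slicing works the same way on NumPy arrays, where each slice is a view rather than a copy.

```python
from typing import List, Sequence


def split_channels(samples: Sequence[int], n_channels: int) -> List[Sequence[int]]:
    """De-interleave frame-ordered samples into one sequence per channel."""
    if n_channels < 1:
        raise ValueError("n_channels must be at least 1")
    # One slice per channel: start at the channel index and step by the
    # channel count, replacing an explicit loop with repeated .append() calls.
    return [samples[ch::n_channels] for ch in range(n_channels)]
```

For stereo input, `split_channels([1, 2, 3, 4, 5, 6], 2)` gives `[[1, 3, 5], [2, 4, 6]]`.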
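The placeholder attributes, the direct `set.add`, and the deferred hash reload fit together roughly as in the sketch below. Everything here is hypothetical: `FingerprintIndex`, `fingerprint_songs`, the in-memory `db` list, the `file_sha1` key, and the `load_calls` counter are illustrative stand-ins. The loader uses a single leading underscore rather than Dejavu's name-mangled `__load_fingerprinted_audio_hashes`.

```python
from typing import Dict, Iterable, List, Set


class FingerprintIndex:
    """Hypothetical stand-in for the parts of Dejavu touched by this PR."""

    def __init__(self, db: List[Dict[str, str]]) -> None:
        self.db = db  # in-memory stand-in for the real database
        # Placeholders declared up front, so every instance has the same
        # attribute keys before any loading happens.
        self.songs: List[Dict[str, str]] = []
        self.songhashes_set: Set[str] = set()
        self.load_calls = 0  # only here to make the deferral observable

    def _load_fingerprinted_audio_hashes(self) -> None:
        self.load_calls += 1
        self.songs = list(self.db)
        for song in self.songs:
            # Add the hash straight to the set; no temporary variable.
            self.songhashes_set.add(song["file_sha1"])

    def fingerprint_songs(self, new_songs: Iterable[Dict[str, str]]) -> None:
        for song in new_songs:
            if song["file_sha1"] in self.songhashes_set:
                continue  # already fingerprinted in an earlier batch
            self.db.append(song)  # stands in for the expensive fingerprinting step
        # Refresh the cached hashes once, after every file has been handled,
        # rather than once per file.
        self._load_fingerprinted_audio_hashes()
```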