logohunter
Optimizing Algorithm
- There are 871 brands and about 25,000 brand images in total. Computing the cosine similarities (with a 0.95 similarity threshold) takes roughly 10 minutes for 10-12 input images, and it takes about the same time for a single input image.
- The Inception feature extractor fails with a memory error on 25,000 brand images (on 16 GB RAM), but it works on 12,000 brand images.
- The VGG feature extractor works with all 25,000 brand images.

All experiments were run on an AWS c2.2xlarge instance with 16 GB RAM.

The question is: how can we optimize or minimize the computation cost?
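For reference, the cosine similarity step itself can be fully vectorized with NumPy, which is usually much faster than looping over image pairs. This is a sketch, not the repo's implementation; the array names are illustrative:

```python
import numpy as np

def cosine_similarity_matrix(input_feats, brand_feats):
    """Cosine similarity between every input feature vector and every
    brand feature vector, computed in one vectorized step.

    input_feats: (n_inputs, d) array
    brand_feats: (n_brands, d) array
    returns:     (n_inputs, n_brands) similarity matrix
    """
    a = input_feats / np.linalg.norm(input_feats, axis=1, keepdims=True)
    b = brand_feats / np.linalg.norm(brand_feats, axis=1, keepdims=True)
    return a @ b.T

# example: 3 input logos vs 5 brand images, 128-dim features
rng = np.random.default_rng(0)
sims = cosine_similarity_matrix(rng.normal(size=(3, 128)),
                                rng.normal(size=(5, 128)))
matches = sims > 0.95  # apply the 0.95 similarity threshold
```

For 25k brand images the feature matrix fits easily in memory once extracted, so the per-image cost should be dominated by feature extraction, not by the similarity computation itself.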
hey, sorry I haven't been around this repo a lot in the past years:

- the goal of the cosine similarity step is to get a threshold above which to declare a match between a candidate logo and your input brand logo, so you're correct that it doesn't matter how many images you're testing. My idea was that this is a step you only do once for a given brand (e.g. you can save that threshold somewhere else and pass it to the `match_logo` function), so it didn't need to be optimized. Still, to speed it up you could sample the input brands, e.g. 5k instead of 25k: you would get higher variance in the cosine similarities, but it might be OK.
- in theory, any CNN / image-classification type of model could work at this step. If you have memory constraints, a smaller model like VGG might still be good enough (you'll have to evaluate your own false positive / false negative rates on your test data). Or you could truncate Inception a bit more to effectively load a smaller model; for example, check out the `flavor` arg to `load_extractor_model` in src/utils.py#L36.
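The compute-once-and-cache idea above could be sketched as follows: estimate the threshold from a random sample of the brand features, save it to disk, and reuse it on later runs. The function name, cache format, and the use of a similarity quantile as the threshold are all assumptions for illustration, not the repo's actual API:

```python
import json
import os
import numpy as np

def brand_threshold(brand_feats, sample_size=5000, quantile=0.95, seed=0,
                    cache_path="threshold_cache.json"):
    """Estimate a similarity threshold from a random sample of brand
    features, caching the result so the expensive step runs only once."""
    # reuse a previously computed threshold if one exists
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f)["threshold"]

    # sample e.g. 5k of the 25k brand features to cut the O(n^2) cost
    rng = np.random.default_rng(seed)
    n = len(brand_feats)
    idx = rng.choice(n, size=min(sample_size, n), replace=False)
    sample = brand_feats[idx]
    sample = sample / np.linalg.norm(sample, axis=1, keepdims=True)

    # pairwise cosine similarities; keep the upper triangle (unique pairs)
    sims = sample @ sample.T
    upper = sims[np.triu_indices(len(sample), k=1)]
    threshold = float(np.quantile(upper, quantile))

    with open(cache_path, "w") as f:
        json.dump({"threshold": threshold}, f)
    return threshold
```

As noted above, sampling raises the variance of the estimated threshold, so it is worth checking the resulting false positive / false negative rates on your own test data before committing to a smaller sample.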