logohunter
Optimizing Algorithm
- There are 871 brands and about 25,000 brand images in total. Computing the cosine similarities (with a 0.95 similarity threshold) takes roughly 10 minutes for 10-12 input images, and it takes about the same time for a single input image.
- The Inception feature extractor fails with a memory error on 25,000 brand images (on 16 GB RAM), but it works on 12,000 brand images.
- The VGG feature extractor works with all 25,000 brand images.

All experiments were run on an AWS c2.2xlarge instance with 16 GB RAM.

The question is: how can we optimize or minimize the computation cost?
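For reference, the cosine similarity step itself can be fully vectorized with NumPy, which is usually much faster than looping over image pairs. This is a sketch, not the repo's implementation; the array names are illustrative:

```python
import numpy as np

def cosine_similarity_matrix(input_feats, brand_feats):
    """Cosine similarity between every input feature vector and every
    brand feature vector, computed in one vectorized step.

    input_feats: (n_inputs, d) array
    brand_feats: (n_brands, d) array
    returns:     (n_inputs, n_brands) similarity matrix
    """
    a = input_feats / np.linalg.norm(input_feats, axis=1, keepdims=True)
    b = brand_feats / np.linalg.norm(brand_feats, axis=1, keepdims=True)
    return a @ b.T

# example: 3 input logos vs 5 brand images, 128-dim features
rng = np.random.default_rng(0)
sims = cosine_similarity_matrix(rng.normal(size=(3, 128)),
                                rng.normal(size=(5, 128)))
matches = sims > 0.95  # apply the 0.95 similarity threshold
```

For 25k brand images the feature matrix fits easily in memory once extracted, so the per-image cost should be dominated by feature extraction, not by the similarity computation itself.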
hey, sorry I haven't been around this repo a lot in the past years:

- the goal of the cosine similarity step is to get a threshold above which to declare a match between a candidate logo and your input brand logo, so you're correct that it doesn't matter how many images you're testing. My idea was that this is a step you only do once for a given brand (e.g. you can save that threshold somewhere else and pass it to the `match_logo` function), so it didn't need to be optimized. Still, to speed it up you could sample the input brands, e.g. 5k instead of 25k: you would get higher variance in the cosine similarities, but it might be OK.
- in theory, any CNN / image-classification type of model could work at this step. If you have memory constraints, a smaller model like VGG might still be good enough (you'll have to evaluate your own false positive / false negative rates on your test data). Or you could truncate Inception a bit more to effectively load a smaller model; for example, check out the `flavor` arg to `load_extractor_model` in src/utils.py#L36.
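The compute-once-and-cache idea above could be sketched as follows: estimate the threshold from a random sample of the brand features, save it to disk, and reuse it on later runs. The function name, cache format, and the use of a similarity quantile as the threshold are all assumptions for illustration, not the repo's actual API:

```python
import json
import os
import numpy as np

def brand_threshold(brand_feats, sample_size=5000, quantile=0.95, seed=0,
                    cache_path="threshold_cache.json"):
    """Estimate a similarity threshold from a random sample of brand
    features, caching the result so the expensive step runs only once."""
    # reuse a previously computed threshold if one exists
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f)["threshold"]

    # sample e.g. 5k of the 25k brand features to cut the O(n^2) cost
    rng = np.random.default_rng(seed)
    n = len(brand_feats)
    idx = rng.choice(n, size=min(sample_size, n), replace=False)
    sample = brand_feats[idx]
    sample = sample / np.linalg.norm(sample, axis=1, keepdims=True)

    # pairwise cosine similarities; keep the upper triangle (unique pairs)
    sims = sample @ sample.T
    upper = sims[np.triu_indices(len(sample), k=1)]
    threshold = float(np.quantile(upper, quantile))

    with open(cache_path, "w") as f:
        json.dump({"threshold": threshold}, f)
    return threshold
```

As noted above, sampling raises the variance of the estimated threshold, so it is worth checking the resulting false positive / false negative rates on your own test data before committing to a smaller sample.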