
Speed up model.generate() with coca?

Open Pclanglais opened this issue 1 year ago • 4 comments

I am building an image classification workflow on top of CoCa captions and embeddings. The only downside is that this is slow (about 100 images per minute on a Google Colab).

So two related questions:

  • Is it possible to extract the embeddings calculated within model.generate()? Currently I call encode_image on top, which is essentially a duplicate computation.
  • Are there settings that may speed up model.generate at the expense of accuracy? In my current workflow I only need the top characteristic words from the captions of images that belong to the same cluster. I'm not entirely clear on how beam search works.
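Until the embeddings can be returned from model.generate() directly, one workaround for the duplicate computation in the first point is to memoize the image encoding so the caption path and the classification path share a single forward pass. A minimal sketch, where encode_image is a hypothetical stand-in (not the open_clip call itself):

```python
# Sketch: cache image embeddings so caption generation and
# classification reuse one encoding instead of computing it twice.
# `encode_image` is a placeholder for the real forward pass.

call_count = 0

def encode_image(image):
    """Pretend forward pass; returns a fake 'embedding'."""
    global call_count
    call_count += 1
    return [hash(image) % 997]

_cache = {}

def encode_image_cached(image):
    # Key by image identity; real code might key by file path instead.
    if image not in _cache:
        _cache[image] = encode_image(image)
    return _cache[image]

# Both the caption path and the cluster path request the embedding,
# but only one forward pass actually runs.
emb_for_caption = encode_image_cached("img_001.jpg")
emb_for_cluster = encode_image_cached("img_001.jpg")
```

The same idea applies inside a fork of generate(): compute the embedding once, return it alongside the caption, and feed it to the clustering step.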

Pclanglais avatar Mar 26 '23 19:03 Pclanglais

@Pclanglais Hi, I will work on 1 as soon as I can; it is not possible right now. For 2, did you try setting generation_type="top_p" inside .generate? That should be faster and also allow more control over the generation by setting the "top_p" argument appropriately.
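For context on why generation_type="top_p" is faster: nucleus (top-p) sampling keeps only the smallest set of tokens whose cumulative probability reaches p and samples from that set, avoiding the multiple parallel hypotheses that beam search maintains. A rough illustration of the filtering step in plain Python (not the open_clip implementation):

```python
import math

def top_p_filter(logits, p=0.9):
    """Return the (index, probability) pairs kept by nucleus sampling:
    the smallest set of tokens whose cumulative probability >= p."""
    # Softmax over the logits (shifted by the max for stability).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank tokens by probability and keep until the mass reaches p.
    ranked = sorted(enumerate(probs), key=lambda t: t[1], reverse=True)
    kept, mass = [], 0.0
    for idx, prob in ranked:
        kept.append((idx, prob))
        mass += prob
        if mass >= p:
            break
    return kept

# With one dominant token and a small p, only that token survives,
# so the next-token choice is nearly greedy (and cheap).
kept = top_p_filter([5.0, 1.0, 0.5, 0.1], p=0.5)
```

Lowering top_p shrinks the candidate set (trading diversity/accuracy for speed), which matches the accuracy-for-speed trade-off asked about above.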

gpucce avatar Mar 27 '23 14:03 gpucce

Hello @gpucce, thanks a lot. For 1, I just wanted to be sure that I hadn't missed any option, but I could fork it on my side. That's a very good idea for 2: I'm going to test it right away.

Pclanglais avatar Mar 27 '23 14:03 Pclanglais

Duplicate of https://github.com/mlfoundations/open_clip/issues/409, but let's keep both.

This is an important issue to fix for usability.

rom1504 avatar Apr 10 '23 08:04 rom1504

@Pclanglais Maybe a bit late, but if you aren't batching yet, you can try #498. When I try replicating your findings, assuming a GPU, I get around 100 images processed in about 40 seconds with a batch size of 1. You can already batch with model.generate(); in the PR I hoped to make that easier for future use.
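Batching helps because the fixed per-call overhead of model.generate() is amortized across many images. A simple chunking pattern in plain Python, where generate_captions is a hypothetical placeholder for the batched model.generate() call:

```python
def batched(items, batch_size):
    """Yield successive fixed-size batches from a list."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def generate_captions(batch):
    # Placeholder for something like model.generate(image_batch);
    # here it just returns one fake caption per image.
    return [f"caption for {img}" for img in batch]

images = [f"img_{i:03d}.jpg" for i in range(10)]
captions = []
for batch in batched(images, batch_size=4):
    captions.extend(generate_captions(batch))
```

With a real model, the images in each batch would be preprocessed and stacked into one tensor before the single generate call; the batch size is then bounded by GPU memory rather than by per-image latency.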

sramshetty avatar Apr 18 '23 04:04 sramshetty