Speed up model.generate() with CoCa?
I am building an image classification workflow on top of CoCa captions and embeddings. The only downside is that this is slow (about 100 images per minute on a Google Colab).
So two related questions:
- Is it possible to extract the embeddings calculated within `model.generate()`? Currently I call `encode_image` on top, which is basically duplicated computation (see the sketch after this list).
- Are there settings that may speed up `model.generate()` at the expense of accuracy? In my current workflow I only need the top characteristic words from the captions of images that belong to the same cluster. I'm not entirely clear on how beam search works.
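For context, a minimal sketch of the duplicated work being described, assuming a standard CoCa checkpoint (the model name, pretrained tag, and file path are illustrative):

```python
import torch
import open_clip
from PIL import Image

# Illustrative checkpoint; any CoCa model behaves the same way
model, _, transform = open_clip.create_model_and_transforms(
    "coca_ViT-L-14", pretrained="mscoco_finetuned_laion2B-s13B-b90k"
)
model.eval()

image = transform(Image.open("example.jpg").convert("RGB")).unsqueeze(0)

with torch.no_grad():
    # generate() encodes the image internally to produce the caption...
    tokens = model.generate(image)
    # ...and this call encodes the same image a second time
    embedding = model.encode_image(image)

caption = open_clip.decode(tokens[0]).split("<end_of_text>")[0].replace("<start_of_text>", "")
```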
@Pclanglais Hi, I will get to work on 1 as soon as I can, since it is not possible right away. For 2, did you try setting `generation_type="top_p"` inside `.generate()`? That should be faster and also give you more control over the generation by setting the `top_p` argument appropriately.
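A minimal sketch of that call, reusing `model` and `image` from the sketch above (the `top_p` and `temperature` values are illustrative, not recommendations):

```python
with torch.no_grad():
    # Nucleus sampling instead of the default beam search:
    # cheaper per image, at the cost of some caption quality
    tokens = model.generate(
        image,
        generation_type="top_p",
        top_p=0.9,        # sample from the smallest token set covering 90% of probability mass
        temperature=1.0,
    )
```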
Hello @gpucce, thanks a lot. For 1, I just wanted to be sure that I hadn't missed an option, but I could fork it on my side. 2 is a very good idea: I'm going to test it right away.
Duplicate of https://github.com/mlfoundations/open_clip/issues/409, but let's keep both open.
This is an important issue to fix for usability
@Pclanglais Maybe a bit late, but if you aren't batching yet you can try #498. When I try replicating your findings, assuming GPU, I get around 100 images processed in around 40 seconds even with batch size 1. You can already batch with `model.generate()`; the PR just aims to make that easier going forward.
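For reference, a minimal batching sketch reusing `model` and `transform` from the first sketch (file names are illustrative):

```python
# Stack several preprocessed images into a single batch tensor
paths = ["img_0.jpg", "img_1.jpg", "img_2.jpg"]
batch = torch.stack([transform(Image.open(p).convert("RGB")) for p in paths])

with torch.no_grad():
    # One generate() call captions the whole batch
    tokens = model.generate(batch)

captions = [
    open_clip.decode(t).split("<end_of_text>")[0].replace("<start_of_text>", "")
    for t in tokens
]
```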